Notes on migrating MongoDB from Amazon Web Services to Microsoft Azure
程序员文章站
2024-03-17 21:17:28
After moving our MongoDB cluster from AWS to Microsoft Azure, mongos started logging a stream of errors:
2016-07-14T16:42:10.779+0800 I NETWORK [LockPinger] Socket recv() timeout 10.0.0.6:30001
2016-07-14T16:42:10.779+0800 I NETWORK [LockPinger] SocketException: remote: 10.0.0.6:30001 error: 9001 socket exception [RECV_TIMEOUT] server [10.0.0.6:30001]
2016-07-14T16:42:10.779+0800 I NETWORK [LockPinger] DBClientCursor::init call() failed
2016-07-14T16:42:12.595+0800 I NETWORK [LockPinger] scoped connection to realsightback4:30001,realsightback3:30001,realsightback2:30001 not being returned to the pool
2016-07-14T16:42:12.595+0800 W SHARDING [LockPinger] distributed lock pinger'realsightback4:30001,realsightback3:30001,realsightback2:30001/realsightback1:30000:1468133855:1804289383' detected an exception while pinging. :: caused by :: SyncClusterConnection::update prepare failed: realsightback4:30001 (10.0.0.6) failed:10276 DBClientBase::findN: transport error: realsightback4:30001 ns: admin.$cmd query: { getlasterror: 1, fsync: 1 }
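To see whether timeouts like the ones in the log above come from a single server or from several, the log can be tallied by host. The following is a minimal sketch, not part of the original troubleshooting; the function name `count_recv_timeouts` and the `SAMPLE` list are illustrative, and the regular expression assumes the `[RECV_TIMEOUT] server [host:port]` format seen in the excerpt.

```python
import re
from collections import Counter

# Matches the "[RECV_TIMEOUT] server [host:port]" tail of a SocketException line.
RECV_TIMEOUT_RE = re.compile(r"\[RECV_TIMEOUT\] server \[([^\]]+)\]")


def count_recv_timeouts(lines):
    """Return a Counter mapping 'host:port' to its number of RECV_TIMEOUT errors."""
    counts = Counter()
    for line in lines:
        match = RECV_TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two lines copied from the log excerpt; only the second carries RECV_TIMEOUT.
SAMPLE = [
    "2016-07-14T16:42:10.779+0800 I NETWORK [LockPinger] Socket recv() timeout 10.0.0.6:30001",
    "2016-07-14T16:42:10.779+0800 I NETWORK [LockPinger] SocketException: remote: 10.0.0.6:30001 "
    "error: 9001 socket exception [RECV_TIMEOUT] server [10.0.0.6:30001]",
]

print(count_recv_timeouts(SAMPLE))
```

On a real log, passing an open file object instead of `SAMPLE` works the same way, since the function only iterates over lines.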
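At first the socket timeouts looked like a network problem, but testing with iperf turned up nothing wrong. I then read the MongoDB source code and suspected that the { getlasterror: 1, fsync: 1 } command on the config server was timing out. I connected directly to the config server and ran the command by hand, and it took thirty to forty seconds to return. iostat confirmed that the I/O load was very high. I attached an additional disk and separated the data directories of the different shards, which solved the problem.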
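The manual check described above, running the command directly and seeing how long it takes, can be repeated with a small timing helper. This is only a sketch under assumptions: `time_call` and the 10-second default threshold are made up for illustration and are not from the original investigation.

```python
import time


def time_call(fn, slow_threshold_s=10.0):
    """Run fn() once and return (elapsed_seconds, is_slow, result).

    slow_threshold_s is an arbitrary illustrative cutoff, not a MongoDB setting.
    """
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return elapsed, elapsed > slow_threshold_s, result


# Stand-in workload; replace the lambda with the real command to measure.
elapsed, is_slow, _ = time_call(lambda: time.sleep(0.05), slow_threshold_s=0.01)
print(f"elapsed={elapsed:.3f}s slow={is_slow}")
```

With pymongo installed, and on a MongoDB version that still supports `getlasterror`, the lambda could wrap something like `MongoClient("realsightback4", 30001).admin.command("getlasterror", fsync=True)`. A result in the tens of seconds, as seen here, points to slow disk flushes on the config server and not to the network.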
On AWS the whole setup had run on a single disk, so it looks like Azure still falls short of AWS.