Troubleshooting an Incident: MQ Could Not Produce Messages
程序员文章站 (Programmer Article Station)
2023-08-29 19:14:50
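When I got to work in the morning, I learned that the service fees had not been synced to the agent system. The production logs of the draft_server system showed an exception when it pushed data to RabbitMQ: No route to host.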
2019-07-29 01:30:00,136 INFO [pool-13-thread-30] 201154611 (AgentProfitProducer.java:32) - Agent service fee enqueued
2019-07-29 01:31:01,713 INFO [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2021] 201216188 (SimpleMessageListenerContainer.java:1453) - Restarting Consumer: tags=[{}], channel=null, acknowledgeMode=AUTO local queue size=0
2019-07-29 01:31:02,150 INFO [pool-13-thread-30] 201216625 (AgentProfitServiceImpl.java:182) - [Agent service fee push]-exception
org.springframework.amqp.AmqpIOException: java.net.NoRouteToHostException: No route to host (Host unreachable)
    at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:309) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:547) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:90) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:140) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:76) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1374) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1367) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:699) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
--
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_191]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_191]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_191]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_191]
    at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_191]
    at com.rabbitmq.client.impl.FrameHandlerFactory.create(FrameHandlerFactory.java:32) ~[amqp-client-3.6.3.jar:?]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:811) ~[amqp-client-3.6.3.jar:?]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:725) ~[amqp-client-3.6.3.jar:?]
    at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:296) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    ... 21 more
2019-07-29 01:31:02,150 INFO [pool-13-thread-30] 201216625 (AgentProfitServiceImpl.java:184) - Agent service fee push finished 2019-07-29T01:31:02.150+0800
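I connected to the production environment over VPN and used a local test program to push data to the production MQ. That worked fine. Next, I called the production service-fee push service via RPC and checked the production log again. MQ was still failing, but this time with a SocketTimeoutException.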
2019-07-29 13:57:23,514 INFO [pool-13-thread-38] 245997989 (AgentProfitProducer.java:32) - Agent service fee enqueued
2019-07-29 13:57:47,563 WARN [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2621] 246022038 (SimpleMessageListenerContainer.java:1462) - Consumer raised exception, processing can restartif the connection factory supports it
2019-07-29 13:57:47,564 INFO [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2621] 246022039 (SimpleMessageListenerContainer.java:1453) - Restarting Consumer: tags=[{}], channel=null, acknowledgeMode=AUTO local queue size=0
2019-07-29 14:00:23,636 INFO [pool-13-thread-38] 246178111 (AgentProfitServiceImpl.java:182) - [Agent service fee push]-exception
org.springframework.amqp.AmqpIOException: java.net.SocketTimeoutException: connect timed out
    at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:309) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:547) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:90) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:140) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:76) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1374) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1367) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:699) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
--
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_191]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_191]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_191]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_191]
    at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_191]
    at com.rabbitmq.client.impl.FrameHandlerFactory.create(FrameHandlerFactory.java:32) ~[amqp-client-3.6.3.jar:?]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:811) ~[amqp-client-3.6.3.jar:?]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:725) ~[amqp-client-3.6.3.jar:?]
    at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:296) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    ... 21 more
2019-07-29 14:00:23,636 INFO [pool-13-thread-38] 246178111 (AgentProfitServiceImpl.java:184) - Agent service fee push finished 2019-07-29T14:00:23.636+0800
2019-07-29 14:00:47,648 WARN [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2622] 246202123 (SimpleMessageListenerContainer.java:1462) - Consumer raised exception, processing can restartif the connection factory supports it
org.springframework.amqp.AmqpIOException: java.net.SocketTimeoutException: connect timed out
    at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:309) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:547) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:90) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:140) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:76) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:472) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
    at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1280) [spring-rabbit-1.6.1.RELEASE.jar:?]
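Both publish failures above are thrown inside java.net.Socket.connect, which the RabbitMQ client calls from FrameHandlerFactory.create. In other words, they happen while the TCP connection to the broker is being opened, before any AMQP traffic is exchanged. A NoRouteToHostException usually means the OS received a "host unreachable" reply: a routing or firewall problem, or the host is down. "SocketTimeoutException: connect timed out" means no reply arrived within the connect timeout at all.

The sketch below is not part of the original investigation. It classifies a plain TCP connect attempt into these outcomes. The class name ConnectProbe and the default port 5672 (RabbitMQ's standard AMQP port) are illustrative assumptions; the real broker address is not given in this post.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical helper: tries a bare TCP connect and names the failure mode,
// mirroring the exceptions seen in the stack traces above.
public class ConnectProbe {

    static String probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return "OK";
        } catch (NoRouteToHostException e) {
            // The OS reported the host as unreachable (routing, firewall, host down)
            return "NO_ROUTE";
        } catch (SocketTimeoutException e) {
            // No answer within timeoutMs: packets are probably being dropped silently
            return "TIMEOUT";
        } catch (ConnectException e) {
            // Host answered, but nothing is listening on that port (connection refused)
            return "REFUSED";
        } catch (UnknownHostException e) {
            // The host name could not be resolved
            return "UNKNOWN_HOST";
        } catch (IOException e) {
            return "OTHER: " + e;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real broker host and port as arguments
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5672;
        System.out.println(probe(host, port, 3000));
    }
}
```

You could run it once from the draft_server host and once from the VPN-connected workstation. That would show whether the two machines see the broker differently, which the successful local test hinted at.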
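Digging further into the log, I noticed something odd: around each of these two attempts to publish to MQ, there was a suspicious "Restarting Consumer" entry.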
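The same correlation can also be checked mechanically instead of by eye. The helper below is hypothetical; RestartCorrelation is an illustrative name. For every publish-failure line, it reports how many seconds have passed since the most recent "Restarting Consumer" line.

It assumes that each log entry starts with a timestamp in the yyyy-MM-dd HH:mm:ss,SSS format shown above, and that stack-trace continuation lines do not. The failure marker is passed in as a parameter, because the production log contains the original (untranslated) message text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-analysis helper, not part of the original investigation.
public class RestartCorrelation {

    // Log entries start with e.g. "2019-07-29 01:31:01,713 "
    private static final Pattern HEADER =
            Pattern.compile("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) ");
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** For each line containing failureMarker: whole seconds since the last restart, or -1 if none. */
    static List<Long> secondsSinceRestart(List<String> lines, String failureMarker) {
        List<Long> gaps = new ArrayList<>();
        LocalDateTime lastRestart = null;
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (!m.find()) {
                continue; // stack frames and other continuation lines have no timestamp
            }
            LocalDateTime ts = LocalDateTime.parse(m.group(1), TS);
            if (line.contains("Restarting Consumer")) {
                lastRestart = ts;
            } else if (line.contains(failureMarker)) {
                gaps.add(lastRestart == null ? -1L : Duration.between(lastRestart, ts).getSeconds());
            }
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RestartCorrelation <log file> <failure marker>
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(secondsSinceRestart(lines, args[1]));
    }
}
```

On the two excerpts above, using the push-exception line as the marker, it would report gaps of 0 s (01:31:01,713 to 01:31:02,150) and 156 s (13:57:47,564 to 14:00:23,636).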
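draft_server is not only an MQ producer but also an MQ consumer. When I logged in to the RabbitMQ management console, the queue showed... "no consumers". So the problem might lie here. The consumer should register itself automatically when the service starts. It looks like last Friday's release was not deployed properly (it is unlikely that someone deleted the consumer or created the queue by hand).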
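The "no consumers" state seen in the console can also be checked from a script. The RabbitMQ management HTTP API endpoint GET /api/queues/{vhost}/{name} returns the queue as JSON, including a top-level "consumers" count.

Below is a minimal JDK-only sketch. It assumes the management plugin is enabled on its default port 15672. The host, credentials and queue name in main are placeholders, not values from this incident. To avoid a library dependency, the JSON is read with a regular expression; a production check should use a real JSON parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical consumer-count check against the RabbitMQ management HTTP API.
public class QueueConsumerCheck {

    // Matches the exact key "consumers" (not "consumer_details" etc.)
    private static final Pattern CONSUMERS = Pattern.compile("\"consumers\"\\s*:\\s*(\\d+)");

    /** Builds /api/queues/{vhost}/{name}; the default vhost "/" must be sent as "%2F". */
    static String queueUrl(String baseUrl, String vhost, String queue) throws UnsupportedEncodingException {
        return baseUrl + "/api/queues/" + URLEncoder.encode(vhost, "UTF-8") + "/" + URLEncoder.encode(queue, "UTF-8");
    }

    /** Returns the first "consumers" value found in the JSON, or -1 if the key is absent. */
    static int consumerCount(String json) {
        Matcher m = CONSUMERS.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Plain GET with HTTP Basic auth, as the management API expects. */
    static String get(String url, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        String token = Base64.getEncoder().encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + token);
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values: adjust the management URL, credentials and queue name
        String url = queueUrl("http://localhost:15672", "/", "agent.profit.queue");
        int consumers = consumerCount(get(url, "guest", "guest"));
        System.out.println(consumers == 0 ? "WARNING: queue has no consumers" : "consumers = " + consumers);
    }
}
```

Running a check like this after each release, and alerting when the count is 0, could have surfaced the missing consumer before the nightly push failed.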
So I asked a colleague in ops to redeploy. Once the Jenkins build finished and the service restarted, the queue had a consumer again.
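Then I called that service on the server via RPC from my local machine again. Everything worked, and messages were being published to MQ normally again.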