Notes on Troubleshooting an Incident Where Messages Could Not Be Produced to MQ

程序员文章站 (Programmer Article Site), 2022-05-27 08:22:15

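When I got to work in the morning, I learned that the service fees had not been synced to the agent system. The production log of the draft_server system showed an exception while pushing data to RabbitMQ: No route to host.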

2019-07-29 01:30:00,136 INFO  [pool-13-thread-30] 201154611 (AgentProfitProducer.java:32) - Agent service fee enqueued
2019-07-29 01:31:01,713 INFO  [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2021] 201216188 (SimpleMessageListenerContainer.java:1453) - Restarting Consumer: tags=[{}], channel=null, acknowledgeMode=AUTO local queue size=0
2019-07-29 01:31:02,150 INFO  [pool-13-thread-30] 201216625 (AgentProfitServiceImpl.java:182) - [Agent service fee push]-exception
org.springframework.amqp.AmqpIOException: java.net.NoRouteToHostException: No route to host (Host unreachable)
        at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:309) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:547) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:90) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:140) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:76) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1374) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1367) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:699) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
--
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_191]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_191]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_191]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_191]
        at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_191]
        at com.rabbitmq.client.impl.FrameHandlerFactory.create(FrameHandlerFactory.java:32) ~[amqp-client-3.6.3.jar:?]
        at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:811) ~[amqp-client-3.6.3.jar:?]
        at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:725) ~[amqp-client-3.6.3.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:296) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        ... 21 more
2019-07-29 01:31:02,150 INFO  [pool-13-thread-30] 201216625 (AgentProfitServiceImpl.java:184) - Agent service fee push finished 2019-07-29T01:31:02.150+0800
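
The Spring exception in this trace is only a wrapper: AmqpIOException carries the underlying java.net exception as its cause, and that root cause is what says how the TCP connect to the broker failed. A minimal sketch, assuming you want a compact verdict to log next to the stack trace; the class name ConnectFailureClassifier and its return codes are hypothetical and not part of Spring AMQP. The comments paraphrase the JDK's own descriptions of each exception.

```java
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

/** Hypothetical helper: names the network failure hidden inside a wrapper such as AmqpIOException. */
final class ConnectFailureClassifier {
    private ConnectFailureClassifier() {}

    static String classify(Throwable error) {
        Throwable t = error;
        // Walk the cause chain; the depth limit guards against pathological cyclic chains.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof NoRouteToHostException) {
                // Typically an intervening firewall, or a router or host that is down.
                return "NO_ROUTE_TO_HOST";
            }
            if (t instanceof SocketTimeoutException) {
                // A socket timeout; with the message "connect timed out" nothing answered the connect.
                return "CONNECT_TIMED_OUT";
            }
            if (t instanceof ConnectException) {
                // Typically refused remotely: nothing is listening on that address/port.
                return "CONNECTION_REFUSED";
            }
        }
        return "OTHER";
    }
}
```

Logged from the producer's catch block, this would report NO_ROUTE_TO_HOST for the 01:31 failure above and CONNECT_TIMED_OUT for the 14:00 failure that follows.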

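I connected to the production environment over VPN and used a local test program to push data to the production MQ, and that worked. Next, I called the production service-fee push service over RPC and checked the production log again: MQ still failed, but this time with a SocketTimeoutException.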

2019-07-29 13:57:23,514 INFO  [pool-13-thread-38] 245997989 (AgentProfitProducer.java:32) - Agent service fee enqueued
2019-07-29 13:57:47,563 WARN  [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2621] 246022038 (SimpleMessageListenerContainer.java:1462) - Consumer raised exception, processing can restart if the connection factory supports it
2019-07-29 13:57:47,564 INFO  [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2621] 246022039 (SimpleMessageListenerContainer.java:1453) - Restarting Consumer: tags=[{}], channel=null, acknowledgeMode=AUTO local queue size=0
2019-07-29 14:00:23,636 INFO  [pool-13-thread-38] 246178111 (AgentProfitServiceImpl.java:182) - [Agent service fee push]-exception
org.springframework.amqp.AmqpIOException: java.net.SocketTimeoutException: connect timed out
        at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:309) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:547) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:90) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:140) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:76) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1374) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1367) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:699) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
--
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_191]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_191]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_191]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_191]
        at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_191]
        at com.rabbitmq.client.impl.FrameHandlerFactory.create(FrameHandlerFactory.java:32) ~[amqp-client-3.6.3.jar:?]
        at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:811) ~[amqp-client-3.6.3.jar:?]
        at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:725) ~[amqp-client-3.6.3.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:296) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        ... 21 more
2019-07-29 14:00:23,636 INFO  [pool-13-thread-38] 246178111 (AgentProfitServiceImpl.java:184) - Agent service fee push finished 2019-07-29T14:00:23.636+0800
2019-07-29 14:00:47,648 WARN  [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2622] 246202123 (SimpleMessageListenerContainer.java:1462) - Consumer raised exception, processing can restart if the connection factory supports it
org.springframework.amqp.AmqpIOException: java.net.SocketTimeoutException: connect timed out
        at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:309) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:547) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:90) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:140) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:76) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:472) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1280) [spring-rabbit-1.6.1.RELEASE.jar:?]
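
The local test over VPN succeeded, while the server's own connection attempts failed (both RabbitTemplate.send and the listener container's BlockingQueueConsumer.start end in the same connect failure). One way to check whether the broker port is reachable from a particular machine is a bare TCP probe run on that machine. A minimal sketch using only java.net.Socket, which the traces above also end in; the class name TcpProbe is hypothetical, and the host and port are whatever you pass in (5672 is RabbitMQ's default AMQP port).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether host:port accepts a TCP connection within a timeout. */
final class TcpProbe {
    private TcpProbe() {}

    /** Returns "REACHABLE", or the simple name of the exception raised while connecting. */
    static String probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Same kind of call that fails in the traces above, but with an explicit connect timeout.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return "REACHABLE";
        } catch (IOException e) {
            // e.g. NoRouteToHostException, SocketTimeoutException, ConnectException, UnknownHostException
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java TcpProbe <host> <port> [timeoutMillis]");
            return;
        }
        int timeout = args.length > 2 ? Integer.parseInt(args[2]) : 3000;
        System.out.println(args[0] + ":" + args[1] + " -> "
                + probe(args[0], Integer.parseInt(args[1]), timeout));
    }
}
```

Running it both on the workstation and on the draft_server host (for example `java TcpProbe <broker-host> 5672`, host left as a placeholder) compares the two network paths directly. Returning a string instead of throwing keeps it usable from a one-off command or a health endpoint.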

Reading further through the logs, I noticed something odd: before both attempts to put data into MQ, there was an unexpected Restarting Consumer entry.
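
The pattern noticed here (a Restarting Consumer entry shortly before each failed push) can also be checked mechanically across a whole log file. A minimal sketch, assuming the timestamp prefix seen above (yyyy-MM-dd HH:mm:ss,SSS) and a caller-supplied marker string that identifies the producer's failure line; RestartCorrelator and its method names are hypothetical. For every failure line it reports how long before it the most recent Restarting Consumer line was logged.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: relates producer failures to the most recent consumer restart in a log. */
final class RestartCorrelator {
    // Matches the prefix of lines such as "2019-07-29 01:31:01,713 INFO ..."
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final String RESTART = "restarting consumer";

    private RestartCorrelator() {}

    /** Parses the 23-character timestamp prefix; returns null for stack-trace and other continuation lines. */
    static LocalDateTime timestamp(String line) {
        if (line.length() < 23) {
            return null;
        }
        try {
            return LocalDateTime.parse(line.substring(0, 23), TS);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    /**
     * For each timestamped line containing failureMarker, returns the time elapsed since the most
     * recent "Restarting Consumer" line, or null if no restart was logged before it.
     * Matching is case-insensitive.
     */
    static List<Duration> gapsSinceRestart(List<String> lines, String failureMarker) {
        String marker = failureMarker.toLowerCase(Locale.ROOT);
        List<Duration> gaps = new ArrayList<>();
        LocalDateTime lastRestart = null;
        for (String line : lines) {
            LocalDateTime ts = timestamp(line);
            if (ts == null) {
                continue;
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.contains(RESTART)) {
                lastRestart = ts;
            } else if (lower.contains(marker)) {
                gaps.add(lastRestart == null ? null : Duration.between(lastRestart, ts));
            }
        }
        return gaps;
    }
}
```

For the two incidents quoted above the gaps are 437 ms (01:31:01,713 to 01:31:02,150) and 156.072 s (13:57:47,564 to 14:00:23,636); in both cases the consumer restart falls between the enqueue and the exception.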

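draft_server is not only an MQ producer but also an MQ consumer. When I logged into the RabbitMQ management console, the queue actually showed "... no consumers ...". So the problem might lie here. The consumer should register automatically when the service starts, so it looks like last Friday's release was not deployed properly (it is unlikely that someone deleted the consumer or created the queue by hand).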
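
The consumer count shown in the management console is also exposed by the RabbitMQ management plugin's HTTP API (GET /api/queues/{vhost}/{queue}, by default on port 15672), so a post-release check could assert that the queue has at least one consumer. A minimal Java 8 sketch, matching the JDK in the traces; the class name, base URL, credentials and queue name are placeholders, and pulling the count out with a regular expression is a simplification where a JSON library would normally be used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads a queue's consumer count from the management HTTP API. */
final class QueueConsumerCheck {
    // Matches the top-level "consumers": N field; "consumer_details" etc. do not match.
    private static final Pattern CONSUMERS = Pattern.compile("\"consumers\"\\s*:\\s*(\\d+)");

    private QueueConsumerCheck() {}

    /** Percent-encodes one path segment; URLEncoder does form encoding, so '+' is turned back into %20. */
    static String segment(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds e.g. http://host:15672/api/queues/%2F/my.queue for the default vhost "/". */
    static String queueUrl(String baseUrl, String vhost, String queue) {
        return baseUrl + "/api/queues/" + segment(vhost) + "/" + segment(queue);
    }

    /** Extracts the consumer count from the queue JSON; returns -1 if the field is missing. */
    static int consumerCount(String json) {
        Matcher m = CONSUMERS.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Fetches the queue JSON with HTTP Basic auth and returns its consumer count. */
    static int fetchConsumerCount(String url, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + token);
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            int status = conn.getResponseCode();
            if (status != 200) {
                throw new IOException("HTTP " + status + " from " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                for (int n; (n = in.read(chunk)) != -1; ) {
                    buf.write(chunk, 0, n);
                }
                return consumerCount(new String(buf.toByteArray(), StandardCharsets.UTF_8));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A result of 0 from `fetchConsumerCount(queueUrl("http://<mq-host>:15672", "/", "<queue>"), "<user>", "<password>")` would correspond to the "no consumers" state seen in the console.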

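So I asked the ops team to redeploy. Once the Jenkins build finished and the service restarted, the queue had a consumer again.
Then I called that service on the server over RPC from my machine once more. Everything worked, and messages could be produced to MQ normally again.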