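[Production issue] Spring Boot app mysteriously hangs after going live, every front-end request stuck in pending; the cause turned out to be a Jedis connection leak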
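Background
The problem occurred in a Spring Boot + Shiro + shiro-redis setup with a separated front end and back end.
Every front-end request hung in pending until it timed out. The logs were complete and showed no errors. Spring Boot recorded no incoming requests, yet scheduled tasks kept running and their log output was still visible. How do you fix this fast?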
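Approach
I frantically searched Baidu for pending, springboot, and hang, and found nothing that actually solved the problem.
Out of options, I followed this fellow's troubleshooting steps:
https://blog.csdn.net/mifffy_java/article/details/97900888
The very first step already differed from his case. On my side, repeated requests made the number of CLOSE_WAIT sockets climb, which told me the TCP connections were being established and the problem was on the application side.
I pressed on and checked CPU and memory for an infinite loop or a memory leak, but the process looked stable and nothing was maxed out.
Morale was getting low. Next I used jstack to take a thread dump and analyzed it.
Finally a lead: a large number of threads were waiting, all inside shiro-redis, trying to obtain a Redis connection and never getting one. I promptly added a timeout and assumed that was the end of it.
At this point it is worth checking the dump for deadlocks first:
Deadlock
https://blog.csdn.net/zxh87/article/details/52137335
Then search the dump for your own code's package name: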
"http-nio-9011-exec-174" #2368 daemon prio=5 os_prio=0 tid=0x00007fb9840d9000 nid=0x942 waiting on condition [0x00007fb850bc9000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f1d8af10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:590)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:441)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:362)
at redis.clients.util.Pool.getResource(Pool.java:49)
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226)
at org.crazycake.shiro.RedisManager.getJedis(RedisManager.java:35)
at org.crazycake.shiro.WorkAloneRedisManager.get(WorkAloneRedisManager.java:50)
at org.crazycake.shiro.RedisSessionDAO.doReadSession(RedisSessionDAO.java:152)
at org.apache.shiro.session.mgt.eis.AbstractSessionDAO.readSession(AbstractSessionDAO.java:168)
at org.apache.shiro.session.mgt.DefaultSessionManager.retrieveSessionFromDataSource(DefaultSessionManager.java:236)
at org.apache.shiro.session.mgt.DefaultSessionManager.retrieveSession(DefaultSessionManager.java:222)
at org.apache.shiro.session.mgt.AbstractValidatingSessionManager.doGetSession(AbstractValidatingSessionManager.java:118)
at org.apache.shiro.session.mgt.AbstractNativeSessionManager.lookupSession(AbstractNativeSessionManager.java:148)
at org.apache.shiro.session.mgt.AbstractNativeSessionManager.getSession(AbstractNativeSessionManager.java:140)
at org.apache.shiro.mgt.SessionsSecurityManager.getSession(SessionsSecurityManager.java:156)
at org.apache.shiro.mgt.DefaultSecurityManager.resolveContextSession(DefaultSecurityManager.java:460)
at org.apache.shiro.mgt.DefaultSecurityManager.resolveSession(DefaultSecurityManager.java:446)
at org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:342)
at org.apache.shiro.subject.Subject$Builder.buildSubject(Subject.java:845)
at org.apache.shiro.web.subject.WebSubject$Builder.buildWebSubject(WebSubject.java:148)
at org.apache.shiro.web.servlet.AbstractShiroFilter.createSubject(AbstractShiroFilter.java:292)
at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:359)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at com.xxxxxxxx.config.Filter.CorsFilter.doFilter(CorsFilter.java:40)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.session.web.http.SessionRepositoryFilter.doFilterInternal(SessionRepositoryFilter.java:151)
at org.springframework.session.web.http.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:81)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:800)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:806)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
- locked <0x00000000e99b14f0> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
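The dump above was searched by hand. As a rough sketch of the same check done from inside the JVM, the snippet below counts live threads that have a given class and method somewhere on their stack. The class name PoolWaiterCount and the method countThreadsIn are made up for this illustration; only Thread.getAllStackTraces() is a real JDK call.

```java
import java.util.Map;

public class PoolWaiterCount {

    /** Counts live threads whose stack contains a frame of the given class and method. */
    public static long countThreadsIn(String className, String methodName) {
        long count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Frame taken from the dump above: threads parked while borrowing from the pool.
        long waiting = countThreadsIn(
                "org.apache.commons.pool2.impl.GenericObjectPool", "borrowObject");
        System.out.println("threads inside borrowObject: " + waiting);
    }
}
```

A count that keeps growing while traffic stays flat points the same way as the dump: request threads are piling up waiting for a connection.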
What actually happened: a few hours later the endpoints no longer hung. Instead, every one of them failed with this error:
java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:448)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:362)
at redis.clients.util.Pool.getResource(Pool.java:49)
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226)
at org.crazycake.shiro.RedisManager.getJedis(RedisManager.java:38)
at org.crazycake.shiro.common.WorkAloneRedisManager.get(WorkAloneRedisManager.java:51)
at org.crazycake.shiro.RedisSessionDAO.doReadSession(RedisSessionDAO.java:202)
at org.apache.shiro.session.mgt.eis.AbstractSessionDAO.readSession(AbstractSessionDAO.java:168)
at org.apache.shiro.session.mgt.DefaultSessionManager.retrieveSessionFromDataSource(DefaultSessionManager.java:236)
at org.apache.shiro.session.mgt.DefaultSessionManager.retrieveSession(DefaultSessionManager.java:222)
at org.apache.shiro.session.mgt.AbstractValidatingSessionManager.doGetSession(AbstractValidatingSessionManager.java:118)
at org.apache.shiro.session.mgt.AbstractNativeSessionManager.lookupSession(AbstractNativeSessionManager.java:148)
at org.apache.shiro.session.mgt.AbstractNativeSessionManager.getSession(AbstractNativeSessionManager.java:140)
at org.apache.shiro.mgt.SessionsSecurityManager.getSession(SessionsSecurityManager.java:156)
at org.apache.shiro.mgt.DefaultSecurityManager.resolveContextSession(DefaultSecurityManager.java:460)
at org.apache.shiro.mgt.DefaultSecurityManager.resolveSession(DefaultSecurityManager.java:446)
at org.apache.shiro.mgt.DefaultSecurityManager.createSubject(DefaultSecurityManager.java:342)
at org.apache.shiro.subject.Subject$Builder.buildSubject(Subject.java:845)
at org.apache.shiro.web.subject.WebSubject$Builder.buildWebSubject(WebSubject.java:148)
at org.apache.shiro.web.servlet.AbstractShiroFilter.createSubject(AbstractShiroFilter.java:292)
at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:359)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at com.xxxxxxxx.config.Filter.CorsFilter.doFilter(CorsFilter.java:40)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.session.web.http.SessionRepositoryFilter.doFilterInternal(SessionRepositoryFilter.java:151)
at org.springframework.session.web.http.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:81)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:800)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:806)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
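Why the hang turned into this exception can be shown with a toy pool. This is only a sketch using JDK classes, not commons-pool2 itself: BorrowTimeoutSketch and its methods are invented for the illustration. It assumes what the two traces suggest, namely that with no maximum wait the borrower parks in takeFirst() indefinitely, and with a maximum wait it gives up and throws.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class BorrowTimeoutSketch {
    private final LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();

    public BorrowTimeoutSketch(int size) {
        for (int i = 0; i < size; i++) {
            idle.add("conn-" + i);
        }
    }

    /** A negative maxWaitMillis blocks until something is returned; otherwise gives up after the wait. */
    public String borrow(long maxWaitMillis) throws InterruptedException {
        if (maxWaitMillis < 0) {
            return idle.takeFirst(); // the "pending forever" case from the first dump
        }
        String conn = idle.pollFirst(maxWaitMillis, TimeUnit.MILLISECONDS);
        if (conn == null) {
            throw new NoSuchElementException("Timeout waiting for idle object");
        }
        return conn;
    }

    /** Borrows the only connection, never returns it, then tries to borrow again. */
    public static boolean exhaustedPoolTimesOut() {
        BorrowTimeoutSketch pool = new BorrowTimeoutSketch(1);
        try {
            pool.borrow(50); // simulated leak: taken and never given back
            pool.borrow(50); // nothing idle, so this waits 50 ms and throws
            return false;
        } catch (NoSuchElementException e) {
            return "Timeout waiting for idle object".equals(e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Either way the pool is empty. The timeout only changes how the caller finds out.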
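Searching for NoSuchElementException: Timeout waiting for idle object
turns up many answers that all say the same thing: the pool has run out of connections, so raise the maximum.
But that is only the symptom. This was clearly a connection leak. The feature was about to go live and a problem like this showed up, which did nothing for my morale. There was no time left to verify anything, so as a stopgap I raised the maximum connection limit and shipped the feature.
Then I kept digging. I added a Spring scheduled task that prints the connection-pool statistics of shiro-redis's redisManager, and at the same time shrank the pool to try to force the problem to appear: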
    // config is the JedisPoolConfig handed to shiro-redis.
    // With a pool of one, a single leaked connection is enough to exhaust it.
    config.setMaxTotal(1);
    config.setMaxIdle(1);
    redisManager.setJedisPoolConfig(config);

    @Autowired
    private RedisManager redisManager;

    // Every 10 seconds, log the pool statistics.
    @Scheduled(cron = "*/10 * * * * ?")
    public void test() {
        JedisPool jedisPool = redisManager.getJedisPool();
        if (jedisPool == null) {
            logger.info("jedis pool is null");
        } else {
            logger.info("getNumWaiters: {}, getNumIdle: {}, getNumActive: {}, getMeanBorrowWaitTimeMillis: {}, getMaxBorrowWaitTimeMillis: {}",
                    jedisPool.getNumWaiters(), jedisPool.getNumIdle(), jedisPool.getNumActive(),
                    jedisPool.getMeanBorrowWaitTimeMillis(), jedisPool.getMaxBorrowWaitTimeMillis());
        }
    }
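I then ran a JMeter concurrency test with 20 threads sending requests at the same time, and not a single problem showed up.
WTF?
I deployed to the test environment and let it run under real usage. Sure enough, this time the problem appeared after just one hour: connections could not be obtained.
With no traffic and everything normal, the log looked like this
After the problem appeared, the log looked like this
Clearly, with no more requests coming in, there was still an active count of 1. A textbook connection leak.
Even knowing all this, I still had no way of telling what was causing the connection leak.
I searched for "jedis connection leak"
and struck gold: someone had reported this problem long ago
https://github.com/redis/jedis/issues/1920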
In Jedis 2.9.x, Jedis.close() looks like this:
    @Override
    public void close() {
        if (dataSource != null) {
            if (client.isBroken()) {
                this.dataSource.returnBrokenResource(this);
            } else {
                this.dataSource.returnResource(this);
            }
            this.dataSource = null; // this line: the field is cleared only after the instance is back in the pool
        } else {
            super.close();
        }
    }
And in JedisPool:
    public Jedis getResource() {
        Jedis jedis = (Jedis) super.getResource();
        jedis.setDataSource(this);
        return jedis;
    }
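Under high concurrency, two threads A and B can leak a pooled connection like this:
A finishes with the Jedis and returns it to the pool, but has not yet set dataSource to null.
B gets the time slice and, because A has already returned it, borrows that same Jedis. getResource() sets its dataSource.
A sets dataSource to null.
B finishes and calls close(). dataSource is null, so the Jedis cannot be returned to the pool and only the parent-class method runs, which closes the connection. The pool still tracks this Jedis as borrowed, so it shows up as Active. Because the pool is capped by its maximum size, these leaked entries accumulate over time until the application hangs. The advice found online to raise the connection limit only postpones this: once the leaked connections reach the maximum, requests either hang or fail to obtain an object.
This matches every symptom seen earlier. To trigger it, two threads must take the same Jedis one after the other: the second must borrow it right after the first has returned it, and the first must null the field after the second has set it. The scheduler has to hand out time slices to these two threads in exactly that order, which is why the problem shows up so rarely.
The final fix was to upgrade Jedis to 2.10.2.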
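The interleaving described above can be replayed deterministically in a single thread. The sketch below does not use Jedis. FakePool and FakeConn are stand-ins invented for this illustration, and close() from 2.9.x is split into its two steps so the unlucky ordering can be forced by hand.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LeakInterleavingSketch {

    static class FakePool {
        final Deque<FakeConn> idle = new ArrayDeque<>();
        int active = 0;

        FakeConn borrow() {
            FakeConn conn = idle.pop();
            active++;
            conn.dataSource = this; // mirrors jedis.setDataSource(this) in getResource()
            return conn;
        }

        void giveBack(FakeConn conn) {
            active--;
            idle.push(conn);
        }
    }

    static class FakeConn {
        FakePool dataSource;
        boolean socketClosed = false;

        void closeStep1ReturnToPool() { dataSource.giveBack(this); }
        void closeStep2ClearField()   { dataSource = null; }

        void close() {
            if (dataSource != null) {
                closeStep1ReturnToPool();
                closeStep2ClearField();
            } else {
                socketClosed = true; // like super.close(): the pool is never told
            }
        }
    }

    /** Replays the A/B ordering and returns the pool's active count afterwards. */
    public static int activeAfterBadInterleaving() {
        FakePool pool = new FakePool();
        pool.idle.push(new FakeConn());

        FakeConn a = pool.borrow();   // thread A borrows
        a.closeStep1ReturnToPool();   // A: returned, field not yet cleared
        FakeConn b = pool.borrow();   // thread B borrows the same object, field set again
        a.closeStep2ClearField();     // A resumes and wipes the field B relies on
        b.close();                    // B: field is null, so only the socket is closed

        return pool.active;           // nobody holds the connection, yet it is counted
    }
}
```

After the replay the pool reports one active connection that nobody holds, which is the stray active count seen in the monitoring log.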
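I have not copied the patched Jedis source here. The following is only a sketch of one ordering that closes the window: keep a local reference to the pool, clear the field first, and return the object last, so the closing thread never touches the field once another thread can borrow the object. FakePool and FakeConn are again invented stand-ins, and the replay is single-threaded, so it illustrates the ordering rather than proving thread safety.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SafeCloseSketch {

    static class FakePool {
        final Deque<FakeConn> idle = new ArrayDeque<>();
        int active = 0;

        FakeConn borrow() {
            FakeConn conn = idle.pop();
            active++;
            conn.dataSource = this;
            return conn;
        }

        void giveBack(FakeConn conn) {
            active--;
            idle.push(conn);
        }
    }

    static class FakeConn {
        FakePool dataSource;
        boolean socketClosed = false;

        void close() {
            if (dataSource != null) {
                FakePool pool = dataSource; // local copy survives clearing the field
                dataSource = null;          // clear BEFORE the pool can hand this object out
                pool.giveBack(this);        // last step: the field is not touched afterwards
            } else {
                socketClosed = true;
            }
        }
    }

    /** Two borrowers use the same connection back to back; returns the final active count. */
    public static int activeAfterTwoBorrowers() {
        FakePool pool = new FakePool();
        pool.idle.push(new FakeConn());

        FakeConn a = pool.borrow();
        a.close();                  // A clears the field, then returns the object
        FakeConn b = pool.borrow(); // B gets the same object and sets the field
        b.close();                  // field is still set, so the object goes back to the pool

        return pool.active;
    }
}
```

With this ordering there is no step of A's close() left to run after B has borrowed the object, so the active count drops back to zero.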
Original article: https://blog.csdn.net/qq_35530042/article/details/109293171