欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

记录一次springboot自动任务线上突然不执行问题排查过程

程序员文章站 2022-07-15 08:06:11
...

      其实问题排查过程不是很复杂,只是我们项目要远程,并且服务器是win,连jdk自带的一些工具都没有,arthas在windowns上兼容性也不好,浪费了很多时间。

1、简述

       首先,springboot的自动任务注解@EnableScheduling和@Scheduled底层原理是默认是单线程的,你也可以配置用自定义的线程池。所以,线上有自动任务突然不执行,肯定是有自动任务线程发生阻塞或者死锁,导致卡住了,其他自动任务都无法执行,对外呈现一个程序假死的情况。

       首先简单说下多线程的配置方式,不是本文重点,简单记录下:

import org.springframework.boot.autoconfigure.batch.BatchProperties;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.scheduling.annotation.SchedulingConfigurer;
import org.springframework.scheduling.config.ScheduledTaskRegistrar;
import java.lang.reflect.Method;
import java.util.concurrent.Executors;
@Configuration
public class ScheduleConfig implements SchedulingConfigurer {
  @Override
  public void configureTasks(ScheduledTaskRegistrar taskRegistrar) {
    Method[] methods = BatchProperties.Job.class.getMethods();
    int defaultPoolSize = 3;
    int corePoolSize = 0;
    if (methods != null && methods.length > 0) {
      for (Method method : methods) {
        Scheduled annotation = method.getAnnotation(Scheduled.class);
        if (annotation != null) {
          corePoolSize++;
        }
      }
      if (defaultPoolSize > corePoolSize)
        corePoolSize = defaultPoolSize;
    }
    taskRegistrar.setScheduler(Executors.newScheduledThreadPool(corePoolSize));
  }
}

           由于这个问题重启后就很难复现了,所以我没有立即重启,而是想办法通过jps jmap jstack这些好用的工具查看线程堆栈信息,看是不是发生死锁或者其他问题;可是坑爹的是,我们这个项目只有jre运行环境,所以我把相同版本的jdk下的jps jmap jhat jstack这些exe文件直接拷贝到jre的bin目录下,发现可以执行了。

2、jstack查看线程

首先查看下tomcat的进程号,然后

jstack pid

"scheduling-1" #287 prio=5 os_prio=0 tid=0x00000000259c3800 nid=0x138c8 runnable [0x000000004749d000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at oracle.net.ns.Packet.receive(Packet.java:300)
	at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
	at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
	at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
	at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
	at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
	at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
	at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
	at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
	at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
	at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
	at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:380)
	at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:760)
	at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:401)
	at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:546)
	at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:236)
	at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
	at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:247)
	at com.hikvision.ctm05cqfskdj.module.service.impl.SyncPersonInfoServiceImp.getNewConnection(SyncPersonInfoServiceImp.java:189)
	at com.hikvision.ctm05cqfskdj.module.service.impl.SyncPersonInfoServiceImp.syncPerson(SyncPersonInfoServiceImp.java:86)
	at com.hikvision.ctm05cqfskdj.module.task.AutoSyncPersonToIscTask.autoSyncPersonTask(AutoSyncPersonToIscTask.java:37)
	- locked <0x000000072a8cca68> (a com.hikvision.ctm05cqfskdj.module.task.AutoSyncPersonToIscTask$$EnhancerBySpringCGLIB$$b642cbea)
	at sun.reflect.GeneratedMethodAccessor3553.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:752)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:88)
	at org.springframework.cloud.sleuth.instrument.scheduling.TraceSchedulingAspect.traceBackgroundThread(TraceSchedulingAspect.java:70)
	at sun.reflect.GeneratedMethodAccessor2405.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:644)
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:633)
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:93)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688)
	at com.hikvision.ctm05cqfskdj.module.task.AutoSyncPersonToIscTask$$EnhancerBySpringCGLIB$$ad984c90.autoSyncPersonTask(<generated>)
	at sun.reflect.GeneratedMethodAccessor3553.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
	at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x0000000721613c10> (a java.util.concurrent.ThreadPoolExecutor$Worker)

关于堆栈查看,很简单,关注以下几个点,另外,这里给出一个堆栈分析网站,可视化的,很牛逼:https://fastthread.io/
 

记录一次springboot自动任务线上突然不执行问题排查过程

记录一次springboot自动任务线上突然不执行问题排查过程

      堆栈信息很多,忽略掉其他的,根据线程名称查scheduling-1,也没发现发现死锁,scheduling-1线程阻塞在了java.net.SocketInputStream.socketRead0(Native Method)这个本地方法,并且状态是RUNABLE,初步判断是这个地方阻塞,查看堆栈信息,发现代码行:

at com.hikvision.ctm05cqfskdj.module.service.impl.SyncPersonInfoServiceImp.getNewConnection(SyncPersonInfoServiceImp.java:189)
/**
     * 每次人员同步都去清空连接池中的连接(不管是否有效),拿最新连接去执行自动任务
     * */
    public void getNewConnection(){
        log.info("开始获取最新连接,放入连接池");
        poolClear();
        LinkedList<Connection> pool = JdbcHelper.getInstance().getPool();
        try {
            log.info("解密前username={},password={}",username,password);
            String decryptUserName = ConfigEncryptUtil.decrypt(username);
            String decryptPassword = ConfigEncryptUtil.decrypt(password);
            for(int i = 0; i < JDBC_POOL_SIZE; i++) {
                Connection conn = DriverManager.getConnection(url, decryptUserName, decryptPassword);
                if (null != conn && pool.size() < JDBC_POOL_SIZE){
                    pool.push(conn);
                }
            }
        } catch (Exception e) {
            log.error("获取连接异常.e={},size={}",e.getMessage(),pool.size());
        }
        log.info("获取连接结束。size={}",pool.size());
    }


通过日志发现,"开始获取最新连接,放入连接池"这行日志没有与之对应的“获取连接结束”,说明这个方法阻塞了。再进一步定位,很容易就发现:

Connection conn = DriverManager.getConnection(url, decryptUserName, decryptPassword);

这行代码可能有问题,点进去看:

private static volatile int loginTimeout = 0;

/**
     * Sets the maximum time in seconds that a driver will wait
     * while attempting to connect to a database once the driver has
     * been identified.
     *
     * @param seconds the login time limit in seconds; zero means there is no limit
     * @see #getLoginTimeout
     */
    public static void setLoginTimeout(int seconds) {
        loginTimeout = seconds;
    }

注意这个注释:登录时间限制以秒为单位;零表示没有限制 ,而默认值刚好是0;所以基本定位ok!

3、查看GC情况

gc情况正常,CMS垃圾回收算是比较高效的并发收集器,不过最新的ZGC是未来的趋势,或许以后都不需要gc调优了!

Attaching to process ID 61008, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.181-b13

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 3221225472 (3072.0MB)
   NewSize                  = 536870912 (512.0MB)
   MaxNewSize               = 536870912 (512.0MB)
   OldSize                  = 536870912 (512.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 483196928 (460.8125MB)
   used     = 423850536 (404.21537017822266MB)
   free     = 59346392 (56.597129821777344MB)
   87.71796992881544% used
Eden Space:
   capacity = 429522944 (409.625MB)
   used     = 417381920 (398.0464172363281MB)
   free     = 12141024 (11.578582763671875MB)
   97.17337009126106% used
From Space:
   capacity = 53673984 (51.1875MB)
   used     = 6468616 (6.168952941894531MB)
   free     = 47205368 (45.01854705810547MB)
   12.051678518963675% used
To Space:
   capacity = 53673984 (51.1875MB)
   used     = 0 (0.0MB)
   free     = 53673984 (51.1875MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 1133961216 (1081.4296875MB)
   used     = 372137864 (354.89832305908203MB)
   free     = 761823352 (726.531364440918MB)
   32.817512517112405% used

90956 interned Strings occupying 9060464 bytes.

4、原因:socketRead0阻塞

       现在的Java EE 系统通常都依赖于远程服务,需要通过网络请求获取远程服务。不管是使用什么协议(HTTP,HTTPS,JDBC 等)请求远程服务,Java VM最终都将委托给Socket去实现。首先会通过connect方法与远程服务建立连接,连接建立成功后就可以 write/read 与远程服务实现数据通信。Socket.connect()、Socket.write()、Socket.read() 方法调用都被称为阻塞IO调用。这些调用由于网络延时、服务器响应缓慢等问题往往会导致线程挂起。

5、何时出现阻塞

通过日志发现,阻塞的时机发生在连接oracle的时候如果报错:

ORA-00257: 存檔器錯誤. 連線僅限內部, 直到釋出為止

出现这个错误,才发现了自动任务不执行的情况,表明这个错误会导致获取连接被阻塞;这个错误查阅资料发现是:

记录一次springboot自动任务线上突然不执行问题排查过程

 

5、解决方案

//单位s:设置10分钟
DriverManager.setLoginTimeout(10 * 60);
Connection conn = DriverManager.getConnection(url, decryptUserName, decryptPassword);

6、验证是否生效

只能改了之后,看后面是否还出现自动任务不执行的情况了。一个月后评论给出验证结果!

相关标签: java核心知识