欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

记一次Java Rest Service Hang住的经历 博客分类: Java jvmjava 

程序员文章站 2024-03-13 14:38:03
...
背景:
在前后端分开开发的过程中,前段的页面要调用后台Java Rest 服务。一开始,我们的服务是正常的,可是跑了一段时间之后,发现,后台的service没有接受到来自前端的请求,而且当我们直接去请求Rest的时候,发现服务端根本就不能处理这个请求,这个请求就一直hang在哪里,转圈圈。

分析
因为之前也没有分析定位过这中问题。所以有点不知所措。上网查,都说要dump内存,看服务器cpu,内存消耗情况。具体怎么操作也没说清楚。于是联系了一下前同事,给出了一些可以直接操作的命令。
命令一: jstack(查看线程的信息)。
于是用 jstack pid看了一下线程使用情况:
...
"Finalizer" daemon prio=10 tid=0x00007fdb28072800 nid=0x2ca4 in Object.wait() [0x00007fdb2454f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000746496810> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x0000000746496810> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" daemon prio=10 tid=0x00007fdb28070800 nid=0x2ca3 in Object.wait() [0x00007fdb0e04e000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000746496468> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x0000000746496468> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007fdb2800d000 nid=0x2c9d waiting on condition [0x00007fdb30474000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.statestreetgx.fcm.core.app.FcmCoreApp.main(FcmCoreApp.java:97)

"VM Thread" prio=10 tid=0x00007fdb2806c800 nid=0x2ca2 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fdb28022800 nid=0x2c9e runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fdb28024800 nid=0x2c9f runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fdb28026000 nid=0x2ca0 runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fdb28028000 nid=0x2ca1 runnable

"VM Periodic Task Thread" prio=10 tid=0x00007fdb285b6800 nid=0x2cae waiting on condition

JNI global references: 412

看到这里的第一刹那,还以为是抛异常了,这还了得。于是去看了tomcat进程的堆栈信息。也是一样的,如下:
...
"Service Thread" daemon prio=10 tid=0x00007f97a4098000 nid=0x1dd8 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f97a4095800 nid=0x1dd7 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f97a4092800 nid=0x1dd6 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f97a4090800 nid=0x1dd5 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f97a406f800 nid=0x1dd4 in Object.wait() [0x00007f978f46f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x00000007008f4fe0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" daemon prio=10 tid=0x00007f97a406d800 nid=0x1dd3 in Object.wait() [0x00007f978f570000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x00000007008f4a30> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f97a4009800 nid=0x1dcd runnable [0x00007f97a9fac000]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.catalina.core.StandardServer.await(StandardServer.java:451)
        at org.apache.catalina.startup.Catalina.await(Catalina.java:777)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:723)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:321)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:455)

"VM Thread" prio=10 tid=0x00007f97a4069000 nid=0x1dd2 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f97a401f800 nid=0x1dce runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f97a4021000 nid=0x1dcf runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f97a4023000 nid=0x1dd0 runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f97a4025000 nid=0x1dd1 runnable

"VM Periodic Task Thread" prio=10 tid=0x00007f97a40aa800 nid=0x1dd9 waiting on condition

JNI global references: 411


因为tomcat是好的,所以也就不在怀疑是这里进程里有异常了。

有一种检查该进程是不是有问题,可以这样做。
1.先ps找到你的进程的PID。
2.然后把PID转换成16进制。Linux: $printf "%x\n" PID; 就能专转成16进制了。
3.到刚才的查出来的堆栈信息里面grep 第二步转出来的16进制,如果能找到,并且在此处blocked住,那么就说明java应用是有问题的。没有就应该不用到这里找了,尝试去别的地方找找看原因吧。

命令二:jmap(查看内存信息)
jstack并没有能帮助到我,我还是不知道怎么办,这个时候我去找一个同事聊了一下,同事也对java内存略懂,于是一直说jmap可以查看内存信息,先看看进程的内存使用情况再说。于是就用上了jmap。jmap不查不知道,一查吓一跳啊。
-bash-4.1$ /server/share/java/bin/jmap -heap 11417
Attaching to process ID 11417, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.79-b02

using thread-local object allocation.
Parallel GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 0
   MaxHeapFreeRatio = 100
   MaxHeapSize      = 3124756480 (2980.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 777519104 (741.5MB)
   used     = 228470440 (217.88639068603516MB)
   free     = 549048664 (523.6136093139648MB)
   29.38454358543967% used
From Space:
   capacity = 42991616 (41.0MB)
   used     = 42979216 (40.98817443847656MB)
   free     = 12400 (0.0118255615234375MB)
   99.971157167016% used
To Space:
   capacity = 54001664 (51.5MB)
   used     = 0 (0.0MB)
   free     = 54001664 (51.5MB)
   0.0% used
PS Old Generation
   capacity = 130023424 (124.0MB)
   used     = 63940096 (60.97802734375MB)
   free     = 66083328 (63.02197265625MB)
   49.17582850302419% used
PS Perm Generation
   capacity = 71303168 (68.0MB)
   used     = 71057992 (67.76618194580078MB)
   free     = 245176 (0.23381805419921875MB)
   99.65614992029526% used

32711 interned Strings occupying 3615504 bytes.



这回我终于相信自己是找到问题的根本原因了。那就是持久代PG不足,导致的。当时根本就没有去想为什么。脑子里就知道内存不足肯定会出问题的,都99%啦。

我把内存使用情况截给我同事看,我同事呵呵一笑,说这个不能说明问题,要看FGC。 这个时候,我才想起来,java只有在FGC的时候,才对所有的请求都是没有反应的。于是赶紧去找了一下命令。
命令三:jstat
(gc信息查看)

赶紧jstat了一把。
-bash-4.1$ jstat -gc 10626 1000


当前很显然一直在FGC, 能怪一直没有反应。
但是原因,导致FGC的原因是什么呢,还是没有找到。
不管了先把
-XX:PermSize=50m -XX:MaxPermSize=100m
设大一点试试再说吧。
kill 掉进程,再重启。再观察一段时间吧。

这里还有一个不错的文章:
http://ifeve.com/find-bug-online/

--EOF--
相关标签: jvm java