欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  移动技术

Android ANR问题 -- Input超时实战问题解析上

程序员文章站 2022-05-14 08:26:01
在前面我们简诉了android input超时的原因,我们了解到input系统分发input的事件时如果有5s超时会触发应用anr。在实际开发测试中,我们也会经常遇到input超时导致的anr问题,...

在前面我们简诉了android input超时的原因,我们了解到input系统分发input的事件时如果有5s超时会触发应用anr。在实际开发测试中,我们也会经常遇到input超时导致的anr问题,那么现在我们就以实际问题分析一下input超时的anr问题,描述一下实际开发测试中我们应该如何定位问题的原因。本文基于androido(8.1)系统。

实战一:
input dispatching timed out (waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.),手动模拟复现无窗口,有应用类型的anr问题。直接上log


05-28 11:24:49.798 ?1668 ?3141 i activitymanager: start u0 {act=android.intent.action.main cat=[android.intent.category.launcher] flg=0x10200000 cmp=com.example.anrtestdemo/.mainactivity} from uid 10007 on display 0 //从手机launcher界面点击启动测试demo
//我们可以从11:24:55.536 ?1668 ?2070 i inputdispatcher中推算这里应该发生一次input事件,而实际测试中,我却是在这个时间左右多次点击back键
05-28 11:24:49.849 23293 23293 d abm ? ? : onstart //测试demo mainactivity执行onstart方法
05-28 11:24:55.536 ?1668 ?2070 i inputdispatcher: application is not responding: appwindowtoken{d262c54 token=token{cca1066 activityrecord{2f3c0c1 u0 com.example.anrtestdemo/.mainactivity t18}}}. ?it has been 5018.6ms since event, 5006.2ms since wait started. ?reason: waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.//input anr超时第一时间点
05-28 11:24:55.642 ?1668 ?1713 i am_anr ?: [0,23293,com.example.anrtestdemo,952680262,input dispatching timed out (waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]//android activitymanagerservice anr log打印,本例am_anr log和inputdispatcher之间相差时间较少,有时也会出现很大的时间差
05-28 11:24:59.849 23293 23293 d abm ? ? : onstart return time is: 10000 //模拟耗时的操作
05-28 11:24:59.852 23293 23293 d abm ? ? : onresume //测试demo mainactivity执行onresume方法
05-28 11:24:59.852 23293 23293 i am_on_resume_called: [0,com.example.anrtestdemo.mainactivity,launch_activity]
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: anr in com.example.anrtestdemo (com.example.anrtestdemo/.mainactivity)
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: pid: 23293
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: reason: input dispatching timed out (waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: load: 5.72 / 5.5 / 5.29
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: cpu usage from 98992ms to 0ms ago (2018-05-28 11:23:16.593 to 2018-05-28 11:24:55.585):
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: ? 14% 1668/system_server: 8.8% user + 5.2% kernel / faults: 42792 minor 28 major
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: ? 9.1% 558/surfaceflinger: 3.5% user + 5.6% kernel / faults: 1609 minor 2 major
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: ? 5.7% 878/adbd: 0.5% user + 5.2% kernel / faults: 2680 minor
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: ? 5.4% 2490/com.android.systemui: 3.6% user + 1.8% kernel / faults: 15618 minor 99 major
05-28 11:25:00.420 ?1668 ?1713 e activitymanager: ? 3.7% 8331/com.blackberry.blackberrylauncher: 2.8% user + 0.8% kernel / faults: 37990 minor 9 major

从adb log中,我们可以知道anr类型,并且依据该类型的anr,具体分析trace log

dalvik threads (12):
"main" prio=5 tid=1 runnable
  | group="main" scount=0 dscount=0 obj=0x757f91f8 self=0x4832e3fa00
  | systid=23293 nice=-10 cgrp=default sched=0/0 handle=0x483710aa98
  | state=r schedstat=( 16106974966 17948853 313 ) utm=1450 stm=160 core=5 hz=100
  | stack=0x5c73865000-0x5c73867000 stacksize=8mb
  | held mutexes= "mutator lock"(shared held)
  at com.example.anrtestdemo.mainactivity.onstart(mainactivity.java:22)
  at android.app.instrumentation.callactivityonstart(instrumentation.java:1249)
  at android.app.activity.performstart(activity.java:6872)
  at android.app.activitythread.performlaunchactivity(activitythread.java:2666)
  at android.app.activitythread.handlelaunchactivity(activitythread.java:2764)
  at android.app.activitythread.-wrap12(activitythread.java:-1)
  at android.app.activitythread$h.handlemessage(activitythread.java:1510)
  at android.os.handler.dispatchmessage(handler.java:102)
  at android.os.looper.loop(looper.java:154)
  at android.app.activitythread.main(activitythread.java:6157)
  at java.lang.reflect.method.invoke!(native method)
  at com.android.internal.os.zygoteinit$methodandargscaller.run(zygoteinit.java:926)
  at com.android.internal.os.zygoteinit.main(zygoteinit.java:788)

我们一般查看tracelog的时候,一般都是查看main线程的堆栈,但不并是意味着其他的线程就不据参考性,只是本例中无需查看,实际遇到的其他问题,我们还是要仔细查看tarce文件中显示的各个线程的堆栈,因为我们可以从trace文件中看出,anr发生时的某一时刻系统的状态,并且我们可以依据那一刻系统的状态,并结合adb log大胆的去想像anr发生时手机在做什么操作。本例可以从trace log中看出anr发生时,我的这个测试应用在调用at com.example.anrtestdemo.mainactivity.onstart(mainactivity.java:22)方法时存在耗时。搜索代码,查看是否存在耗时。

 @override
 protected void onstart() {
  super.onstart();
  log.d("abm", "onstart");
  long currenttimemillis = system.currenttimemillis();//记录系统当前时间
  while (true){//此处循环便是模拟的耗时操作,不断获取时间10s之后结束。
long timemillis = system.currenttimemillis();
if ((timemillis - currenttimemillis) >= 10000) {
 log.d("abm", "onstart return time is: " + (timemillis - currenttimemillis));
 return;
}
  }
 }

问题原因找到了,我们的mainactivity复写了mainactivity的onstart方法并且做了耗时操作,导致input事件分发超时。那么问题来了,为什么我这边想要给大家介绍无窗口,有应用类型的anr,为什么测试demo在onstart或者oncreate方法中做了耗时操作就能复现了呢该类型的anr问题?那是因为,android应用启动的时候都是先创建application,创建号application之后,才会执行应用启动的主activity,再执行该activity的oncreate,onstart,onresume方法,然后我们的应用窗口便是在onresume中才去向windowmanager添加注册的。因此在注册添加窗口之前,application或者启动的activity的生命周期oncreate,onstart的任意方法,做了耗时操作,或者他们加载一起的执行时间过长,都是能够导致无窗口,有应用类型的input anr问题发生的。这样一来,我们实际开发应用的时候,就要尽可能的把耗时的操作,异步处理。具体异步实现思路你可以使用new thread + handler,asynctask,handlerthread等等,这里推荐使用handlerthread,因为google封装的接口,使用起来简单。分析完本例演示的问题,突然发现依据adb log trace可以轻轻松松的定位问题的原因,但是实际遇到的问题,却并不如此的简单,要是遇到如下的trace文件,就比较尴尬了。

"main" prio=5 tid=1 native
  | group="main" scount=1 dscount=0 flags=1 obj=0x74374ee8 self=0x149d8c3a00
  | systid=8720 nice=0 cgrp=default sched=0/0 handle=0x15226569a8
  | state=s schedstat=( 0 0 0 ) utm=7673 stm=891 core=0 hz=100
  | stack=0x46803c3000-0x46803c5000 stacksize=8mb
  | held mutexes=
  kernel: (couldn't read /proc/self/task/8720/stack)
  native: #00 pc 000000000006a660  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 000000000001fca4  /system/lib64/libc.so (epoll_pwait+52)
  native: #02 pc 0000000000015d08  /system/lib64/libutils.so (android::looper::pollinner(int)+144)
  native: #03 pc 0000000000015bf0  /system/lib64/libutils.so (android::looper::pollonce(int, int*, int*, void**)+108)
  native: #04 pc 00000000001112e0  /system/lib64/libandroid_runtime.so (???)
  native: #05 pc 00000000001e166c  /system/framework/arm64/boot-framework.oat (java_android_os_messagequeue_nativepollonce__ji+140)
  at android.os.messagequeue.nativepollonce(native method)
  at android.os.messagequeue.next(messagequeue.java:325)
  at android.os.looper.loop(looper.java:142)
  at android.app.activitythread.main(activitythread.java:6558)
  at java.lang.reflect.method.invoke(native method)
  at com.android.internal.os.runtimeinit$methodandargscaller.run(runtimeinit.java:469)
  at com.android.internal.os.zygoteinit.main(zygoteinit.java:826)

你会发现trace文件都看得自己怀疑人生了,根据trace信息,咋一看,都要怀疑系统的looper messagequeue有问题,其实这是完全不可能的,要是系统looper都出现问题了,那整个系统都估计要挂了。但是这个从这个trace log中我们也是可以获取的一些有用信息的,如果我们熟悉系统looper messagequeue的机制的话,我们不难知道当应用主线程的looper messagequeue中没有消息的时候,我们会休眠于messagequeue.nativepollonce方法,实际是执行底层的looper的pollinner进入epoll_wait等待。当有消息来临的时候,此looper便会被wake,执行消息。因此我们可以大胆的猜测,anr问题发生时,应用主线程中无可执行的message,那么什么时候没有消息呢?一般执行完onresume完之后,并且界面没有更新,应用ui主线程的looper会进入messagequeue.nativepollonce。那么此时我们又应该如何继续分析呢?前面我们所说trace文件,我们一般看main thread的堆栈,那么此时是时候看一波应用其他的线程执行情况了。先搜索关键字“binder:8720_”(这个8720是当前anr进程的主进程号,系统一般都是按binder:进程号_,进行拼接binger线程名)至于原因,系统代码,就是这么定的,代码如下:

string8 processstate::makebinderthreadname() {
 int32_t s = android_atomic_add(1, &mthreadpoolseq);
 pid_t pid = getpid();
 string8 name;
 name.appendformat("binder:%d_%x", pid, s);
 return name;
}

搜索binder线程,看应用是否存在binder调用,然后更具trace,结合代码,猜想anr发生时,应用到底在做什么操作。实际遇到此问题中,说实在的我们并不能从trace中发现什么问题,此时需要我们多看一些adb log,去看看anr问题发生之前,系统都在干嘛,做了哪些操作,或者查出系统的运行状态等等的一些有用信息,我们要养成多看log的习惯,这样在分析anr问题的时候,才能更加准确的推演anr问题发生时的整个场景。

实战二:手动复现input事件超时,waitqueue不为空问题

屏幕按键:input dispatching timed out (waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. ?outbound queue length: 0. wait queue length: 1.

实体按键:input dispatching timed out (waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. ?wait queue length: 6. ?wait queue head age: 5507.1ms.
没什么说的,直接上log

05-28 15:16:36.753  1668  1702 i activitymanager: start proc 27398:com.example.anrtestdemo/u0a120 for activity com.example.anrtestdemo/.mainactivity
05-28 15:16:36.873  1668 11856 i am_proc_bound: [0,27398,com.example.anrtestdemo]
05-28 15:16:36.880  1668 11856 i am_restart_activity: [0,223351388,31,com.example.anrtestdemo/.mainactivity]
05-28 15:16:36.953 27398 27398 d abm  : onresume
05-28 15:16:36.955 27398 27398 i am_on_resume_called: [0,com.example.anrtestdemo.mainactivity,launch_activity]
05-28 15:16:37.090  1668  1720 i am_activity_launch_time: [0,223351388,com.example.anrtestdemo/.mainactivity,365,365]
05-28 15:16:37.090  1668  1720 i activitymanager: displayed com.example.anrtestdemo/.mainactivity: +365ms
05-28 15:16:45.279  1668  2070 i inputdispatcher: application is not responding: window{ea02cdb u0 com.example.anrtestdemo/com.example.anrtestdemo.mainactivity}.  it has been 5006.4ms since event, 5005.8ms since wait started.  reason: waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  wait queue length: 6.  wait queue head age: 5507.1ms.
05-28 15:16:45.343  1668  1713 i am_anr  : [0,27398,com.example.anrtestdemo,952680262,input dispatching timed out (waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  wait queue length: 6.  wait queue head age: 5507.1ms.)]
05-28 15:16:48.456  1668  1713 e activitymanager: anr in com.example.anrtestdemo (com.example.anrtestdemo/.mainactivity)
05-28 15:16:48.456  1668  1713 e activitymanager: pid: 27398
05-28 15:16:48.456  1668  1713 e activitymanager: reason: input dispatching timed out (waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  wait queue length: 6.  wait queue head age: 5507.1ms.)
05-28 15:16:48.456  1668  1713 e activitymanager: load: 6.11 / 5.38 / 4.77
05-28 15:16:48.456  1668  1713 e activitymanager: cpu usage from 36255ms to 0ms ago (2018-05-28 15:16:09.036 to 2018-05-28 15:16:45.291):
05-28 15:16:48.456  1668  1713 e activitymanager:10% 1668/system_server: 4.6% user + 6% kernel / faults: 5194 minor 7 major
05-28 15:16:48.456  1668  1713 e activitymanager:6.2% 558/surfaceflinger: 2.7% user + 3.4% kernel / faults: 480 minor
05-28 15:16:48.456  1668  1713 e activitymanager:3.2% 434/logd: 2.3% user + 0.9% kernel / faults: 3 minor
05-28 15:16:48.456  1668  1713 e activitymanager:3.1% 23788/adbd: 0.4% user + 2.6% kernel / faults: 561 minor
05-28 15:16:48.456  1668  1713 e activitymanager:2.7% 2490/com.android.systemui: 1.5% user + 1.1% kernel / faults: 588 minor
05-28 15:16:48.456  1668  1713 e activitymanager:1.2% 26217/kworker/u16:2: 0% user + 1.2% kernel
实战一中,已经详细介绍了如何看log,如何看trace文件了,这边就不再赘述,直接贴取关键log,

trace log

dalvik threads (13):
"main" prio=5 tid=1 runnable
  | group="main" scount=0 dscount=0 obj=0x757f91f8 self=0x4832e3fa00
  | systid=27398 nice=-10 cgrp=default sched=0/0 handle=0x483710aa98
  | state=r schedstat=( 6436596993 5748639 166 ) utm=581 stm=62 core=4 hz=100
  | stack=0x5c73865000-0x5c73867000 stacksize=8mb
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 0000000000479994  /system/lib64/libart.so (_zn3art15dumpnativestackernst3__113basic_ostreamicns0_11char_traitsiceeeeip12backtracemappkcpns_9artmethodepv+220)
  native: #01 pc 0000000000479990  /system/lib64/libart.so (_zn3art15dumpnativestackernst3__113basic_ostreamicns0_11char_traitsiceeeeip12backtracemappkcpns_9artmethodepv+216)
  native: #02 pc 000000000044dfa0  /system/lib64/libart.so (_znk3art6thread9dumpstackernst3__113basic_ostreamicns1_11char_traitsiceeeebp12backtracemap+472)
  native: #03 pc 00000000004658fc  /system/lib64/libart.so (_zn3art14dumpcheckpoint3runepns_6threade+820)
  native: #04 pc 000000000044eeac  /system/lib64/libart.so (_zn3art6thread21runcheckpointfunctionev+192)
  native: #05 pc 0000000000547928  /system/lib64/libart.so (_zn3art14jnimethodstartepns_6threade+168)
  native: #06 pc 00000000000b22f4  /system/framework/arm64/boot.oat (java_java_lang_system_currenttimemillis__+96)
  at java.lang.system.currenttimemillis(native method)
  at com.example.anrtestdemo.mainactivity$myhandler$1.run(mainactivity.java:70)
  at android.os.handler.handlecallback(handler.java:751)
  at android.os.handler.dispatchmessage(handler.java:95)
  at android.os.looper.loop(looper.java:154)
  at android.app.activitythread.main(activitythread.java:6157)
  at java.lang.reflect.method.invoke!(native method)
  at com.android.internal.os.zygoteinit$methodandargscaller.run(zygoteinit.java:926)
  at com.android.internal.os.zygoteinit.main(zygoteinit.java:788)

定位问题代码

 @override
 protected void onresume() {
  super.onresume();
  log.d("abm", "onresume");
  final myhandler handler = new myhandler();
  new thread(new runnable() {
@override
public void run() {
 try {
  thread.sleep(2000);
  handler.obtainmessage(666).sendtotarget();
 } catch (interruptedexception e) {
  e.printstacktrace();
 }
}
  }).start();

 }

 class myhandler extends handler {
  @override
  public void handlemessage(message msg) {
super.handlemessage(msg);
if (msg.what == 666) {
 new handler().post(new runnable() {
  @override
  public void run() {
long currenttimemillis = system.currenttimemillis();//问题代码,这边模拟10s耗时操作
while (true){
 long timemillis = system.currenttimemillis();
 if ((timemillis - currenttimemillis) >= 10000) {
  log.d("abm", "onresume return time is: " + (timemillis - currenttimemillis));
  return;
 }
}
  }
 });
}
  }
 }

这边写的代码时,当应用界面加载好的时候,使用handler向ui主线程的looper messagequeue post一个runnable callback,这个runnable里面便是执行的耗时操作,因为这个耗时操作是在ui主线程中执行的,此时当我们再点击手机实体按键或者屏幕时,如果5s超时,则触发anr发生。当然实际操作中,我们并不会直接这么做如此显而易见的耗时操作,这里只是用类似这种简单粗暴的方式告诉大家,把耗时的操作移到ui线程之外,因为ui线程负责与用户交互界面更新,ui线程的卡吨直接影响用户体验,更会造成anr问题的发生。

通过上面的两个小例子,我们已经将最常见到的input超时导致的anr问题介绍了。由于篇幅原因,计划将input各种类型的anr问题分为两个部分来写,本篇先给大家介绍最为常见的两种类型的input超时anr,下篇再介绍剩下的一些类型比较少见的anr问题。