
(282) Learning Simpleperf

Programmer Article Station, 2022-04-20 20:05:30

Official documentation

https://developer.android.google.cn/ndk/guides/simpleperf

Overview

Simpleperf is a general-purpose command-line CPU profiling tool. It is included in the NDK for Mac, Linux, and Windows.

Usage

raphael:/ # simpleperf
Usage: simpleperf [common options] subcommand [args_for_subcommand]
common options:
    -h/--help     Print this help information.
    --log <severity> Set the minimum severity of logging. Possible severities
                     include verbose, debug, warning, info, error, fatal.
                     Default is info.
    --log-to-android-buffer  Write log to android log buffer instead of stderr.
    --version     Print version of simpleperf.
subcommands:
    api-collect         Collect recording data generated by app api
    api-prepare         Prepare recording via app api
    debug-unwind        Debug/test offline unwinding.
    dump                dump perf record file
    help                print help information for simpleperf
    kmem                collect kernel memory allocation information
    list                list available event types
    record              record sampling info in perf.data
    report              report sampling information in perf.data
    report-sample       report raw sample information in perf.data
    stat                gather performance counter information
    trace-sched         Trace system-wide process runtime events.


 

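Running it

The first attempt failed with an error, because report reads perf.data from the current directory by default and no such file existed there: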

raphael:/ # simpleperf report --sort dso
simpleperf E record_file_reader.cpp:76] failed to open record file 'perf.data': No such file or directory
 

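Later, following https://blog.csdn.net/zsl_oo7/article/details/72799628, I added -o /sdcard/perf.data to the record command so that the record file is written to /sdcard, and passed the same path to report with -i.

The processes that implement Wi-Fi: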

raphael:/ # ps -A | grep wifi
wifi           788     1 2186504  16356 binder_ioctl_write_read 0 S [email protected]
wifi          1201     1   38752   5140 SyS_epoll_wait      0 S wificond
system        1207     1   20992   6816 binder_ioctl_write_read 0 S wifidisplayhalservice
wifi         23135     1 2237784  10460 do_select           0 S wpa_supplicant
raphael:/ # simpleperf record -p 788 --duration 30 -o /sdcard/perf.data 
simpleperf I cmd_record.cpp:635] Samples recorded: 368. Samples lost: 0.
raphael:/ # simpleperf report -i /sdcard/perf.data -n --sort dso
Cmdline: /system/bin/simpleperf record -p 788 --duration 30 -o /sdcard/perf.data
Arch: arm64
Event: cpu-cycles (type 0, config 0)
Samples: 368
Event count: 39321745

Overhead  Sample  Shared Object
36.01%    157     [kernel.kallsyms]
26.45%    89      /apex/com.android.runtime/lib64/bionic/libc.so
25.92%    85      /vendor/lib64/libwifi-hal.so
7.21%     22      /system/lib64/vndk-29/libnl.so
4.08%     14      [vdso]
0.33%     1       /vendor/bin/hw/[email protected]
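
The per-library table above can be cross-checked with a few lines of Python. This is only a sketch: the rows and the event count are copied from the report output above (one path is garbled in the source and is kept verbatim), and the names DSO_REPORT, TOTAL_EVENT_COUNT and parse_dso_report are made up for this illustration. It confirms that the sample counts add up to the 368 recorded samples and estimates how many cpu-cycles events each shared object accounts for.

```python
# Rows copied from the `--sort dso` report above: overhead, samples, shared object.
# The last path is garbled in the source text and is kept as-is.
DSO_REPORT = """\
36.01%    157     [kernel.kallsyms]
26.45%    89      /apex/com.android.runtime/lib64/bionic/libc.so
25.92%    85      /vendor/lib64/libwifi-hal.so
7.21%     22      /system/lib64/vndk-29/libnl.so
4.08%     14      [vdso]
0.33%     1       /vendor/bin/hw/[email protected]
"""

TOTAL_EVENT_COUNT = 39321745  # "Event count" line of the same report


def parse_dso_report(text):
    """Return a list of (overhead_percent, samples, dso) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        overhead, samples, dso = line.split(None, 2)
        rows.append((float(overhead.rstrip("%")), int(samples), dso))
    return rows


rows = parse_dso_report(DSO_REPORT)
total_samples = sum(samples for _, samples, _ in rows)
total_overhead = sum(overhead for overhead, _, _ in rows)

# Estimated event count per shared object = total events * overhead share.
estimated_events = {
    dso: TOTAL_EVENT_COUNT * overhead / 100 for overhead, _, dso in rows
}

for overhead, samples, dso in rows:
    print(f"{dso}: {samples} samples, ~{estimated_events[dso]:,.0f} cpu-cycles")
print(f"total samples: {total_samples}, total overhead: {total_overhead:.2f}%")
```

The estimate for /vendor/lib64/libwifi-hal.so lands very close to the event count of 10193523 that the filtered report for that library prints later in this post; the small difference comes from the percentages being rounded to two decimals.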

Next, look at which function in libwifi-hal.so uses the most CPU:

130|raphael:/ # simpleperf report -i /sdcard/perf.data --dsos /vendor/lib64/libwifi-hal.so --sort symbol
Cmdline: /system/bin/simpleperf record -p 788 --duration 30 -o /sdcard/perf.data
Arch: arm64
Event: cpu-cycles (type 0, config 0)
Samples: 85
Event count: 10193523

Overhead  Symbol
48.79%    rb_write(void*, unsigned char*, unsigned long, int, unsigned long) (.cfi)
11.90%    @plt
10.26%    process_firmware_prints(hal_info_s*, unsigned char*, unsigned short) (.cfi)
10.10%    diag_message_handler(hal_info_s*, nl_msg*) (.cfi)
8.73%     ring_buffer_write(rb_info*, unsigned char*, unsigned long, int, unsigned long) (.cfi)
4.11%     wifi_event_loop.cfi
2.90%     user_sock_message_handler(nl_msg*, void*) (.cfi)
1.23%     no_seq_check(nl_msg*, void*)
1.15%     ring_buffer_write(rb_info*, unsigned char*, unsigned long, int, unsigned long)
0.84%     diag_message_handler(hal_info_s*, nl_msg*)
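
Note that the percentages in this filtered report are relative to libwifi-hal.so only, not to the whole process. A small sketch of the conversion, assuming the 25.92% share of libwifi-hal.so from the earlier `--sort dso` report (the helper name to_process_share is made up for this example):

```python
# Share of libwifi-hal.so in the whole-process report (--sort dso).
DSO_SHARE_PERCENT = 25.92

# Top symbols from the filtered report above, relative to libwifi-hal.so only.
SYMBOL_SHARE_PERCENT = {
    "rb_write (.cfi)": 48.79,
    "process_firmware_prints (.cfi)": 10.26,
    "diag_message_handler (.cfi)": 10.10,
    "ring_buffer_write (.cfi)": 8.73,
}


def to_process_share(symbol_percent, dso_percent):
    """Convert a share within one shared object to a share of the whole process."""
    return symbol_percent * dso_percent / 100


process_share = {
    name: to_process_share(pct, DSO_SHARE_PERCENT)
    for name, pct in SYMBOL_SHARE_PERCENT.items()
}

for name, pct in process_share.items():
    print(f"{name}: {pct:.2f}% of all sampled cpu-cycles")
```

For rb_write this gives about 12.65%, which matches the value shown for that symbol in the unfiltered report at the end of this post.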

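As I understand it, record captures samples of the current CPU usage, and report parses that data into a text report. The first report above is sorted by shared object (dso); the official documentation also shows sorting by pid.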
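
The record-then-report workflow can be scripted from a host machine. The following is a minimal sketch, not an official wrapper: it assumes adb is on the PATH and that the adb shell has enough privileges to profile the target process (the session above runs as root). The function names record_cmd, report_cmd and run_on_device are invented for this example; the simpleperf flags are the ones used in this post.

```python
import shlex
import subprocess


def record_cmd(pid, duration_s, out_path="/sdcard/perf.data"):
    """Build the `simpleperf record` command used in this post."""
    return ["simpleperf", "record", "-p", str(pid),
            "--duration", str(duration_s), "-o", out_path]


def report_cmd(in_path="/sdcard/perf.data", sort_keys=("dso",), dsos=()):
    """Build a `simpleperf report` command, optionally filtered by shared object."""
    cmd = ["simpleperf", "report", "-i", in_path, "-n",
           "--sort", ",".join(sort_keys)]
    if dsos:
        cmd += ["--dsos", ",".join(dsos)]
    return cmd


def run_on_device(cmd):
    """Run a command via `adb shell` and return its stdout.

    Assumption: adb is installed and exactly one device is connected.
    """
    result = subprocess.run(["adb", "shell", shlex.join(cmd)],
                            check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works without a device.
    print(shlex.join(record_cmd(788, 30)))
    print(shlex.join(report_cmd()))
    print(shlex.join(report_cmd(sort_keys=("symbol",),
                                dsos=("/vendor/lib64/libwifi-hal.so",))))
```

Writing the record file to an explicit path with -o and reading it back with -i avoids the "failed to open record file 'perf.data'" error shown earlier.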

The help text for record:

130|raphael:/ # simpleperf help record
Usage: simpleperf record [options] [--] [command [command-args]]
       Gather sampling information of running [command]. And -a/-p/-t option
       can be used to change target of sampling information.
       The default options are: -e cpu-cycles -f 4000 -o perf.data.
Select monitored threads:
-a     System-wide collection.
--app package_name    Profile the process of an Android application.
                      On non-rooted devices, the app must be debuggable,
                      because we use run-as to switch to the app's context.
-p pid1,pid2,...       Record events on existing processes. Mutually exclusive
                       with -a.
-t tid1,tid2,... Record events on existing threads. Mutually exclusive with -a.

Select monitored event types:
-e event1[:modifier1],event2[:modifier2],...
             Select a list of events to record. An event can be:
               1) an event name listed in `simpleperf list`;
               2) a raw PMU event in rN format. N is a hex number.
                  For example, r1b selects event number 0x1b.
             Modifiers can be added to define how the event should be
             monitored. Possible modifiers are:
                u - monitor user space events only
                k - monitor kernel space events only
--group event1[:modifier],event2[:modifier2],...
             Similar to -e option. But events specified in the same --group
             option are monitored as a group, and scheduled in and out at the
             same time.
--trace-offcpu   Generate samples when threads are scheduled off cpu.
                 Similar to "-c 1 -e sched:sched_switch".

Select monitoring options:
-f freq      Set event sample frequency. It means recording at most [freq]
             samples every second. For non-tracepoint events, the default
             option is -f 4000. A -f/-c option affects all event types
             following it until meeting another -f/-c option. For example,
             for "-f 1000 cpu-cycles -c 1 -e sched:sched_switch", cpu-cycles
             has sample freq 1000, sched:sched_switch event has sample period 1.
-c count     Set event sample period. It means recording one sample when
             [count] events happen. For tracepoint events, the default option
             is -c 1.
--call-graph fp | dwarf[,<dump_stack_size>]
             Enable call graph recording. Use frame pointer or dwarf debug
             frame as the method to parse call graph in stack.
             Default is dwarf,65528.
-g           Same as '--call-graph dwarf'.
--clockid clock_id      Generate timestamps of samples using selected clock.
                        Possible values are: realtime, monotonic,
                        monotonic_raw, boottime, perf. If supported, default
                        is monotonic, otherwise is perf.
--cpu cpu_item1,cpu_item2,...
             Collect samples only on the selected cpus. cpu_item can be cpu
             number like 1, or cpu range like 0-3.
--duration time_in_sec  Monitor for time_in_sec seconds instead of running
                        [command]. Here time_in_sec may be any positive
                        floating point number.
-j branch_filter1,branch_filter2,...
             Enable taken branch stack sampling. Each sample captures a series
             of consecutive taken branches.
             The following filters are defined:
                any: any type of branch
                any_call: any function call or system call
                any_ret: any function return or system call return
                ind_call: any indirect branch
                u: only when the branch target is at the user level
                k: only when the branch target is in the kernel
             This option requires at least one branch type among any, any_call,
             any_ret, ind_call.
-b           Enable taken branch stack sampling. Same as '-j any'.
-m mmap_pages   Set the size of the buffer used to receiving sample data from
                the kernel. It should be a power of 2. If not set, the max
                possible value <= 1024 will be used.
--no-inherit  Don't record created child threads/processes.
--cpu-percent <percent>  Set the max percent of cpu time used for recording.
                         percent is in range [1-100], default is 25.

Dwarf unwinding options:
--post-unwind=(yes|no) If `--call-graph dwarf` option is used, then the user's
                       stack will be recorded in perf.data and unwound while
                       recording by default. Use --post-unwind=yes to switch
                       to unwind after recording.
--no-unwind   If `--call-graph dwarf` option is used, then the user's stack
              will be unwound by default. Use this option to disable the
              unwinding of the user's stack.
--no-callchain-joiner  If `--call-graph dwarf` option is used, then by default
                       callchain joiner is used to break the 64k stack limit
                       and build more complete call graphs. However, the built
                       call graphs may not be correct in all cases.
--callchain-joiner-min-matching-nodes count
               When callchain joiner is used, set the matched nodes needed to join
               callchains. The count should be >= 1. By default it is 1.

Recording file options:
--no-dump-kernel-symbols  Don't dump kernel symbols in perf.data. By default
                          kernel symbols will be dumped when needed.
--no-dump-symbols       Don't dump symbols in perf.data. By default symbols are
                        dumped in perf.data, to support reporting in another
                        environment.
-o record_file_name    Set record file name, default is perf.data.
--size-limit SIZE[K|M|G]      Stop recording after SIZE bytes of records.
                              Default is unlimited.
--symfs <dir>    Look for files with symbols relative to this directory.
                 This option is used to provide files with symbol table and
                 debug information, which are used for unwinding and dumping symbols.

Other options:
--exit-with-parent            Stop recording when the process starting
                              simpleperf dies.
--start_profiling_fd fd_no    After starting profiling, write "STARTED" to
                              <fd_no>, then close <fd_no>.
--stdio-controls-profiling    Use stdin/stdout to pause/resume profiling.
--in-app                      We are already running in the app's context.
--tracepoint-events file_name   Read tracepoint events from [file_name] instead of tracefs.
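
The help text says the default is -e cpu-cycles -f 4000, that is, at most 4000 samples per second. As a rough sanity check, the numbers from the 30-second recording earlier in this post can be compared against that limit. The arithmetic below is only a sketch; the interpretation that the process was on the CPU for a small fraction of the time is my assumption, not something simpleperf reports directly.

```python
SAMPLES = 368             # "Samples recorded" from the record command
EVENT_COUNT = 39_321_745  # "Event count" from the report header
DURATION_S = 30           # --duration 30
DEFAULT_FREQ_HZ = 4000    # default -f value from the help text

# Average number of cpu-cycles events represented by one sample.
avg_events_per_sample = EVENT_COUNT / SAMPLES

# Observed sampling rate versus the upper bound implied by -f 4000.
observed_rate = SAMPLES / DURATION_S
max_samples = DEFAULT_FREQ_HZ * DURATION_S

print(f"average cpu-cycles per sample: {avg_events_per_sample:,.0f}")
print(f"observed rate: {observed_rate:.1f} samples/s (limit {DEFAULT_FREQ_HZ})")
print(f"samples recorded: {SAMPLES} of at most {max_samples}")
```

The observed rate is far below the limit. Since cpu-cycles only advances while the monitored threads are running, this is consistent with a mostly idle process.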

 

Drawing charts

1|raphael:/ # simpleperf help report
Usage: simpleperf report [options]
The default options are: -i perf.data --sort comm,pid,tid,dso,symbol.
-b    Use the branch-to addresses in sampled take branches instead of the
      instruction addresses. Only valid for perf.data recorded with -b/-j
      option.
--children    Print the overhead accumulated by appearing in the callchain.
--comms comm1,comm2,...   Report only for selected comms.
--dsos dso1,dso2,...      Report only for selected dsos.
--full-callgraph  Print full call graph. Used with -g option. By default,
                  brief call graph is printed.
-g [callee|caller]    Print call graph. If callee mode is used, the graph
                      shows how functions are called from others. Otherwise,
                      the graph shows how functions call others.
                      Default is caller mode.
-i <file>  Specify path of record file, default is perf.data.
--kallsyms <file>     Set the file to read kernel symbols.
--max-stack <frames>  Set max stack frames shown when printing call graph.
-n         Print the sample count for each item.
--no-demangle         Don't demangle symbol names.
--no-show-ip          Don't show vaddr in file for unknown symbols.
-o report_file_name   Set report file name, default is stdout.
--percent-limit <percent>  Set min percentage shown when printing call graph.
--pids pid1,pid2,...  Report only for selected pids.
--raw-period          Report period count instead of period percentage.
--sort key1,key2,...  Select keys used to sort and print the report. The
                      appearance order of keys decides the order of keys used
                      to sort and print the report.
                      Possible keys include:
                        pid             -- process id
                        tid             -- thread id
                        comm            -- thread name (can be changed during
                                           the lifetime of a thread)
                        dso             -- shared library
                        symbol          -- function name in the shared library
                        vaddr_in_file   -- virtual address in the shared
                                           library
                      Keys can only be used with -b option:
                        dso_from        -- shared library branched from
                        dso_to          -- shared library branched to
                        symbol_from     -- name of function branched from
                        symbol_to       -- name of function branched to
                      The default sort keys are:
                        comm,pid,tid,dso,symbol
--symbols symbol1;symbol2;...    Report only for selected symbols.
--symfs <dir>         Look for files with symbols relative to this directory.
--tids tid1,tid2,...  Report only for selected tids.
--vmlinux <file>      Parse kernel symbols from <file>.

raphael:/ # simpleperf report -g -i /sdcard/perf.data
Cmdline: /system/bin/simpleperf record -p 788 --duration 30 -o /sdcard/perf.data
Arch: arm64
Event: cpu-cycles (type 0, config 0)
Samples: 368
Event count: 39321745

Children  Self    Command          Pid  Tid    Shared Object                                     Symbol
12.65%    12.65%  [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      rb_write(void*, unsigned char*, unsigned long, int, unsigned long) (.cfi)
6.90%     6.90%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    pthread_mutex_unlock
5.81%     5.81%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    pthread_mutex_lock
4.08%     4.08%   [email protected]  788  23141  [vdso]                                            __kernel_gettimeofday
4.01%     4.01%   [email protected]  788  23141  [kernel.kallsyms]                                 el0_sys
3.74%     3.74%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    __memcpy
3.09%     3.09%   [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      @plt
2.82%     2.82%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    je_free
2.66%     2.66%   [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      process_firmware_prints(hal_info_s*, unsigned char*, unsigned short) (.cfi)
2.62%     2.62%   [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      diag_message_handler(hal_info_s*, nl_msg*) (.cfi)
2.26%     2.26%   [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      ring_buffer_write(rb_info*, unsigned char*, unsigned long, int, unsigned long) (.cfi)
2.18%     2.18%   [email protected]  788  23141  [kernel.kallsyms]                                 el0_svc_naked
1.95%     1.95%   [email protected]  788  23141  [kernel.kallsyms]                                 cntvct_read_handler
1.92%     1.92%   [email protected]  788  23141  [kernel.kallsyms]                                 do_notify_resume
1.89%     1.89%   [email protected]  788  23141  [kernel.kallsyms]                                 do_sys_poll
1.78%     1.78%   [email protected]  788  23141  [kernel.kallsyms]                                 sock_poll
1.75%     1.75%   [email protected]  788  23141  [kernel.kallsyms]                                 do_sysinstr
1.71%     1.71%   [email protected]  788  23141  /system/lib64/vndk-29/libnl.so                    @plt
1.44%     1.44%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    je_malloc
1.40%     1.40%   [email protected]  788  23141  [kernel.kallsyms]                                 __fget_light
1.40%     1.40%   [email protected]  788  23141  [kernel.kallsyms]                                 datagram_poll
1.34%     1.34%   [email protected]  788  23141  [kernel.kallsyms]                                 _raw_spin_unlock_irqrestore
1.29%     1.29%   [email protected]  788  23141  [kernel.kallsyms]                                 __arch_copy_to_user
1.12%     1.12%   [email protected]vic  788  23141  /system/lib64/vndk-29/libnl.so                    nlmsg_next
1.06%     1.06%   [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      wifi_event_loop.cfi
0.99%     0.99%   [email protected]  788  23141  /system/lib64/vndk-29/libnl.so                    nl_recvmsgs_report
0.93%     0.93%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    je_calloc
0.88%     0.88%   [email protected]  788  23141  [kernel.kallsyms]                                 __check_object_size
0.81%     0.81%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    memset
0.80%     0.80%   [email protected]  788  23141  [kernel.kallsyms]                                 skb_copy_datagram_iter
0.76%     0.76%   [email protected]  788  23141  /system/lib64/vndk-29/libnl.so                    genlmsg_attrdata
0.75%     0.75%   [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      user_sock_message_handler(nl_msg*, void*) (.cfi)
0.74%     0.74%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    @plt
0.65%     0.65%   [email protected]  788  23141  [kernel.kallsyms]                                 __softirqentry_text_start
0.62%     0.62%   [email protected]  788  23141  /system/lib64/vndk-29/libnl.so                    nl_recv
0.61%     0.61%   [email protected]  788  23141  /system/lib64/vndk-29/libnl.so                    nl_socket_get_cb
0.60%     0.60%   [email protected]  788  23141  [kernel.kallsyms]                                 sys_ppoll
0.59%     0.59%   [email protected]  788  23141  [kernel.kallsyms]                                 fpsimd_load_state
0.59%     0.59%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    gettimeofday
0.56%     0.56%   [email protected]  788  23141  [kernel.kallsyms]                                 __wake_up_common_lock
0.54%     0.54%   [email protected]  788  23141  [kernel.kallsyms]                                 schedule_hrtimeout_range_clock
0.51%     0.51%   [email protected]  788  23141  /system/lib64/vndk-29/libnl.so                    nlmsg_data
0.49%     0.49%   [email protected]  788  23141  [kernel.kallsyms]                                 security_socket_recvmsg
0.48%     0.48%   [email protected]  788  23141  [kernel.kallsyms]                                 sys_recvmsg
0.46%     0.46%   [email protected]  788  23141  [kernel.kallsyms]                                 pte_map_lock
0.46%     0.46%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    __ppoll
0.46%     0.46%   [email protected]  788  23141  [kernel.kallsyms]                                 move_addr_to_user
0.45%     0.45%   [email protected]  788  23141  [kernel.kallsyms]                                 memblock_is_map_memory
0.45%     0.45%   [email protected]  788  23141  [kernel.kallsyms]                                 avc_lookup
0.43%     0.43%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    free
0.42%     0.42%   [email protected]  788  23141  [kernel.kallsyms]                                 fput
0.41%     0.41%   [email protected]  788  23141  [kernel.kallsyms]                                 poll_freewait
0.39%     0.39%   [email protected]  788  23141  [kernel.kallsyms]                                 htc_rx_completion_handler
0.37%     0.37%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    extent_deregister_impl
0.37%     0.37%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    pthread_getspecific
0.37%     0.37%   [email protected]  788  23141  [kernel.kallsyms]                                 unmap_page_range
0.34%     0.34%   [email protected]  788  23141  [kernel.kallsyms]                                 skb_recv_datagram
0.34%     0.34%   [email protected]  788  23141  /system/lib64/vndk-29/libnl.so                    nlmsg_set_src
0.33%     0.33%   [email protected]  788  23141  [kernel.kallsyms]                                 tasklet_hi_action
0.33%     0.33%   [email protected]  788  23141  [kernel.kallsyms]                                 unix_dgram_poll
0.33%     0.33%   [email protected]  788  23141  /vendor/bin/hw/[email protected]  android::hardware::wifi::V1_3::implementation::Ringbuffer::append(std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > const&)
0.33%     0.33%   [email protected]  788  23141  [kernel.kallsyms]                                 do_mem_abort
0.32%     0.32%   [email protected]  788  23141  [kernel.kallsyms]                                 get_page_from_freelist
0.32%     0.32%   [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      no_seq_check(nl_msg*, void*)
0.31%     0.31%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    calloc
0.31%     0.31%   [email protected]  788  23141  [kernel.kallsyms]                                 __skb_try_recv_datagram
0.31%     0.31%   [email protected]  788  23141  [kernel.kallsyms]                                 __rcu_read_unlock
0.31%     0.31%   [email protected]  788  23141  [kernel.kallsyms]                                 skb_free_datagram
0.30%     0.30%   [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      ring_buffer_write(rb_info*, unsigned char*, unsigned long, int, unsigned long)
0.30%     0.30%   [email protected]  788  23141  /system/lib64/vndk-29/libnl.so                    nl_cb_put
0.29%     0.29%   [email protected]  788  23141  [kernel.kallsyms]                                 selinux_socket_recvmsg
0.28%     0.28%   [email protected]  788  23141  [kernel.kallsyms]                                 ce_per_engine_service
0.28%     0.28%   [email protected]  788  23141  [kernel.kallsyms]                                 put_pid
0.28%     0.28%   [email protected]  788  23141  [kernel.kallsyms]                                 free_hot_cold_page
0.28%     0.28%   [email protected]  788  23141  [kernel.kallsyms]                                 lru_add_drain_cpu
0.28%     0.28%   [email protected]  788  23141  [kernel.kallsyms]                                 vmacache_find
0.27%     0.27%   [email protected]  788  23141  [kernel.kallsyms]                                 __check_heap_object
0.27%     0.27%   [email protected]  788  23141  [kernel.kallsyms]                                 queue_work_on
0.26%     0.26%   [email protected]  788  23141  [kernel.kallsyms]                                 ___sys_recvmsg
0.26%     0.26%   [email protected]  788  23141  [kernel.kallsyms]                                 tcs_notify_tx_done
0.25%     0.25%   [email protected]  788  23141  [kernel.kallsyms]                                 netlink_recvmsg
0.25%     0.25%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    poll
0.24%     0.24%   [email protected]  788  23141  /system/lib64/vndk-29/libnl.so                    genlmsg_attrlen
0.24%     0.24%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    je_extent_avail_first
0.23%     0.23%   [email protected]  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    malloc
0.22%     0.22%   [email protected]  788  23141  /vendor/lib64/libwifi-hal.so                      diag_message_handler(hal_info_s*, nl_msg*)
0.21%     0.21%   [email protected]  788  23141  [kernel.kallsyms]                                 copy_msghdr_from_user
0.20%     0.20%   [email protected]  788  23141  [kernel.kallsyms]                                 avtab_search_node
0.05%     0.05%   [email protected]  788  23141  [kernel.kallsyms]                                 rw_copy_check_uvector
0.02%     0.02%   [email protected]  788  23141  [kernel.kallsyms]                                 _raw_spin_unlock_irq
0.01%     0.01%   [email protected]  788  23141  [kernel.kallsyms]                                 finish_task_switch
0.01%     0.01%   [email protected]  788  23141  [kernel.kallsyms]                                 __schedule
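
In this output the Children and Self columns are identical on every row, presumably because the recording was made without -g or --call-graph, so there are no call chains to accumulate. The flat rows can still be folded back into per-library totals. The sketch below does that for the first five rows; the Command column is replaced by the placeholder comm because it is garbled in the source text, and the parser assumes the command name contains no whitespace. The name self_overhead_by_dso is made up for this example.

```python
from collections import defaultdict

# First five rows of the report above, with the Command column replaced by
# the placeholder "comm" (that column is garbled in the source).
REPORT_ROWS = """\
12.65%    12.65%  comm  788  23141  /vendor/lib64/libwifi-hal.so                      rb_write(void*, unsigned char*, unsigned long, int, unsigned long) (.cfi)
6.90%     6.90%   comm  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    pthread_mutex_unlock
5.81%     5.81%   comm  788  23141  /apex/com.android.runtime/lib64/bionic/libc.so    pthread_mutex_lock
4.08%     4.08%   comm  788  23141  [vdso]                                            __kernel_gettimeofday
4.01%     4.01%   comm  788  23141  [kernel.kallsyms]                                 el0_sys
"""


def self_overhead_by_dso(text):
    """Sum the Self column per shared object."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Columns: Children, Self, Command, Pid, Tid, Shared Object, Symbol.
        # The symbol may contain spaces, so split at most six times.
        _children, self_pct, _comm, _pid, _tid, dso, _symbol = line.split(None, 6)
        totals[dso] += float(self_pct.rstrip("%"))
    return dict(totals)


totals = self_overhead_by_dso(REPORT_ROWS)
for dso, pct in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{pct:6.2f}%  {dso}")
```

Run over the full output, the per-library sums should land close to the figures of the `--sort dso` report, up to rounding.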

 
