Using opcontrol to capture L2 cache line-in events
Cache parameters of the test machine:
32K: L1D data cache
256K: L2 cache
15360K: L3 cache
64: instruction cache line size (bytes)
64: L1D data cache line size (bytes)
512: L2 cache sets (number_of_sets)
12288: L3 cache sets (number_of_sets)
# set up capture of the L2 cache line-in event
$ sudo opcontrol --setup --event=l2_lines_in:100000
# clear the sample buffers
$ sudo opcontrol --reset
# start capturing
$ sudo opcontrol --start
# run the program
$ java FalseSharing
# once the program finishes, dump the captured data
$ sudo opcontrol --dump
# stop capturing (shuts down the profiling daemon)
$ sudo opcontrol -h
# report the results
$ opreport -l `which java`
Sample output:
CPU: Intel Sandy Bridge microarchitecture, speed 2300.24 MHz (estimated)
Counted l2_lines_in events (L2 cache lines in) with a unit mask of 0x07 (all L2 cache lines filling L2) count 100000
samples % image name symbol name
14914 100.000 anon (tgid:9752 range:0x7fddb8424000-0x7fddb8694000) anon (tgid:9752 range:0x7fddb8424000-0x7fddb8694000)
In the output, a higher samples value means the l2_lines_in event was triggered more often.
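The FalseSharing source itself is not listed in this section. Below is a minimal sketch of the kind of benchmark being measured, under the usual assumption that each thread hammers its own volatile long and the small holder objects end up packed onto the same 64-byte cache line (class and field names here are mine, not the article's):

```java
public final class FalseSharingSketch implements Runnable {
    static final int NUM_THREADS = 4;
    static final long ITERATIONS = 10_000_000L;

    // One volatile long per thread. Allocated back to back, several of
    // these small objects can land on the same 64-byte cache line, so
    // each thread's write invalidates the line in the other cores' L1D.
    static final class VolatileLong {
        volatile long value;
        // Uncommenting the padding pushes each object past one cache
        // line and removes the contention (the FalseSharing2 variant):
        // long p1, p2, p3, p4, p5, p6, p7;
    }

    static final VolatileLong[] longs = new VolatileLong[NUM_THREADS];
    static {
        for (int i = 0; i < NUM_THREADS; i++) longs[i] = new VolatileLong();
    }

    private final int index;
    FalseSharingSketch(int index) { this.index = index; }

    @Override public void run() {
        for (long i = 0; i < ITERATIONS; i++) longs[index].value = i;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++) {
            threads[i] = new Thread(new FalseSharingSketch(i));
        }
        long start = System.nanoTime();
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        System.out.println("duration = " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Toggling the padding while profiling with the opcontrol commands above is what produces the contended vs. uncontended numbers in the tables below.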
Results on my machine:
CPU: Intel Sandy Bridge microarchitecture, speed 2300.24 MHz (estimated)
All programs ran on 4 logical CPU cores.
| Name | Notes | Time (s) | l1d count | l2_lines_in count | l2_trans count |
| --- | --- | --- | --- | --- | --- |
| FalseSharing | L1D contention | 259 | 1473 | 37439 | 156777 |
| FalseSharing2 | no L1D contention | 40 | 501 | 31778 | 41114 |
| AffinityFalseSharingDifferentSocket | L1D contention; 2 logical cores on each socket | 217 | 1527 | 43370 | 101298 |
| AffinityFalseSharingSameCore | L1D contention; 4 logical cores on 2 cores of one socket | 26 | 400 | 13673 | 22382 |
| AffinityFalseSharingSameSocket | L1D contention; 4 logical cores on 4 cores of one socket | 44 | 912 | 35217 | 42213 |
Second round of tests, restricted to 2 logical cores:
| Name | Notes | Time (s) | l1d count | l2_lines_in count | l2_trans count | Expected result |
| --- | --- | --- | --- | --- | --- | --- |
| FalseSharing | L1D contention | 77 | 472 | 18941 | 25000 | close to expected |
| FalseSharing2 | no L1D contention | 51 | 271 | 6316 | 14925 | close to expected |
| AffinityFalseSharingDifferentSocket | L1D contention; 2 logical cores on 2 sockets | 54 | 429 | 16803 | 28851 | |
| AffinityFalseSharing2DifferentSocket | no L1D contention; 2 logical cores on 2 sockets | 30 | — | — | — | |
| AffinityFalseSharingSameCore | L1D contention; 2 logical cores on 2 cores of one socket | 21 | 284 | 10528 | 14900 | to be confirmed |
| AffinityFalseSharing2SameCore | no L1D contention; 2 logical cores on 2 cores of one socket | 31 | 725 | 15930 | 30264 | to be confirmed |
| AffinityFalseSharingSameSocket | L1D contention; 2 logical cores on 2 cores of one socket | 35 | 661 | 16019 | 27445 | |
| AffinityFalseSharing2SameSocket | no L1D contention; 2 logical cores on 2 cores of one socket | 20 | 322 | 11877 | 16513 | |
Appendix 1: descriptions of the events used above (from ophelp):

l1d: (counter: all)
L1D cache events (min count: 2000000)
Unit masks (default 0x1)
----------
0x01: replacement L1D Data line replacements.
0x02: allocated_in_m L1D M-state Data Cache Lines Allocated
0x04: eviction L1D M-state Data Cache Lines Evicted due to replacement (only)
0x08: all_m_replacement All Modified lines evicted out of L1D
l2_l1d_wb_rqsts: (counter: all)
writebacks from L1D to the L2 cache (min count: 200000)
Unit masks (default 0x4)
----------
0x04: hit_e writebacks from L1D to L2 cache lines in E state
0x08: hit_m writebacks from L1D to L2 cache lines in M state
l1d_pend_miss: (counter: 2)
Cycles with L1D load Misses outstanding. (min count: 2000000)
Unit masks (default 0x1)
----------
0x01: pending Cycles with L1D load Misses outstanding.
0x01: occurences This event counts the number of L1D misses outstanding occurences.
(extra: edge cmask=1)
l1d_blocks: (counter: all)
L1D cache blocking events (min count: 100000)
Unit masks (default 0x1)
----------
0x01: ld_bank_conflict Any dispatched loads cancelled due to DCU bank conflict
0x05: bank_conflict_cycles Cycles with l1d blocks due to bank conflicts (extra: cmask=1)
l2_trans: (counter: all)
L2 cache accesses (min count: 200000)
Unit masks (default 0x80)
----------
0x80: all_requests Transactions accessing L2 pipe
0x01: demand_data_rd Demand Data Read requests that access L2 cache, includes L1D
prefetches.
0x02: rfo RFO requests that access L2 cache
0x04: code_rd L2 cache accesses when fetching instructions including L1D code prefetches
0x08: all_pf L2 or LLC HW prefetches that access L2 cache
0x10: l1d_wb L1D writebacks that access L2 cache
0x20: l2_fill L2 fill requests that access L2 cache
0x40: l2_wb L2 writebacks that access L2 cache
l2_lines_in: (counter: all)
L2 cache lines in (min count: 100000)
Unit masks (default 0x7)
----------
0x07: all L2 cache lines filling L2
0x01: i L2 cache lines in I state filling L2
0x02: s L2 cache lines in S state filling L2
0x04: e L2 cache lines in E state filling L2
l2_lines_out: (counter: all)
L2 cache lines out (min count: 100000)
Unit masks (default 0x1)
----------
0x01: demand_clean Clean line evicted by a demand
0x02: demand_dirty Dirty line evicted by a demand
0x04: pf_clean Clean line evicted by an L2 Prefetch
0x08: pf_dirty Dirty line evicted by an L2 Prefetch
0x0a: dirty_all Any Dirty line evicted
Appendix 2:
# look at the cache sizes
$ ls /sys/devices/system/cpu/cpu0/cache/
index0 index1 index2 index3
There are 4 directories:
index0: L1 data cache
index1: L1 instruction cache
index2: L2 cache
index3: L3 cache (matches the "cache size" field in /proc/cpuinfo)
The files in each directory describe that cache. Taking cpu0/index0 on this machine as an example:
| File | Value | Notes |
| --- | --- | --- |
| type | Data | data cache; index1 shows Instruction |
| level | 1 | L1 |
| size | 32K | cache size is 32K |
| coherency_line_size | 64 | line size × ways × sets = 64*4*128 = 32K |
| physical_line_partition | 1 | |
| ways_of_associativity | 4 | |
| number_of_sets | 128 | |
| shared_cpu_map | 00000101 | this cache is shared by cpu0 and cpu8 |
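These three geometry values are consistent with the reported size: total size = line size × associativity × number of sets. A quick check using the values from the table above:

```java
public class CacheGeometry {
    public static void main(String[] args) {
        int coherencyLineSize = 64;   // bytes per cache line
        int waysOfAssociativity = 4;  // lines per set
        int numberOfSets = 128;
        int totalBytes = coherencyLineSize * waysOfAssociativity * numberOfSets;
        System.out.println(totalBytes / 1024 + "K"); // prints 32K
    }
}
```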
A note on the shared_cpu_map format: although it looks like a binary string, it is actually hexadecimal. Each bit represents one CPU, so one hex digit covers 4 CPUs. Take the last 4 digits of 00000101 and expand them to binary:

| CPU id | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0x0101 in binary | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |

Bits 0 and 8 are set, so 0101 stands for cpu0 and cpu8: cpu0's L1 data cache is shared with cpu8.
Let's verify:
$ cat /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_map
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000101
Now look at the shared_cpu_map of index3:
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_map
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000f0f
| CPU id | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0x0f0f in binary | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |

cpu0-3 and cpu8-11 share the L3 cache.
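The bitmap decoding above is mechanical, so it is easy to automate. A small helper (class and method names are mine) that turns a shared_cpu_map string into the list of CPU ids whose bit is set:

```java
import java.util.ArrayList;
import java.util.List;

public class SharedCpuMap {
    // Decode a shared_cpu_map string (hex digits, possibly in
    // comma-separated groups) into the list of CPU ids that are set.
    static List<Integer> cpus(String map) {
        String hex = map.replace(",", "");
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < hex.length(); i++) {
            // the i-th hex digit from the right covers CPUs 4*i .. 4*i+3
            int nibble = Character.digit(hex.charAt(hex.length() - 1 - i), 16);
            for (int b = 0; b < 4; b++) {
                if ((nibble & (1 << b)) != 0) ids.add(i * 4 + b);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(cpus("00000101")); // [0, 8]
        System.out.println(cpus("00000f0f")); // [0, 1, 2, 3, 8, 9, 10, 11]
    }
}
```

Running it on the two values above reproduces the tables: 00000101 → cpu0 and cpu8, 00000f0f → cpu0-3 and cpu8-11.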