Using opcontrol to capture L2 cache line-in events
Cache parameters of the test machine:
32K: L1D data cache
256K: L2 cache
15360K: L3 cache
64: instruction cache line size (bytes)
64: L1D data cache line size (bytes)
512: L2 cache sets (number_of_sets)
12288: L3 cache sets (number_of_sets)
# set up capture of the L2 cache line-in event
$ sudo opcontrol --setup --event=l2_lines_in:100000
# clear the sample buffers
$ sudo opcontrol --reset
# start capturing
$ sudo opcontrol --start
# run the program
$ java FalseSharing
# once the program finishes, dump the captured data
$ sudo opcontrol --dump
# stop capturing (shuts down the profiling daemon)
$ sudo opcontrol -h
# report the results
$ opreport -l `which java`
Sample output:
CPU: Intel Sandy Bridge microarchitecture, speed 2300.24 MHz (estimated)
Counted l2_lines_in events (L2 cache lines in) with a unit mask of 0x07 (all L2 cache lines filling L2) count 100000
samples % image name symbol name
14914 100.000 anon (tgid:9752 range:0x7fddb8424000-0x7fddb8694000) anon (tgid:9752 range:0x7fddb8424000-0x7fddb8694000)
In the output, a higher samples value means the l2_lines_in event was triggered more often.
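The FalseSharing source itself is not listed in this section. Below is a minimal sketch of the kind of benchmark being measured, under the usual assumption that each thread hammers its own volatile long and the small holder objects end up packed onto the same 64-byte cache line (class and field names here are mine, not the article's):

```java
public final class FalseSharingSketch implements Runnable {
    static final int NUM_THREADS = 4;
    static final long ITERATIONS = 10_000_000L;

    // One volatile long per thread. Allocated back to back, several of
    // these small objects can land on the same 64-byte cache line, so
    // each thread's write invalidates the line in the other cores' L1D.
    static final class VolatileLong {
        volatile long value;
        // Uncommenting the padding pushes each object past one cache
        // line and removes the contention (the FalseSharing2 variant):
        // long p1, p2, p3, p4, p5, p6, p7;
    }

    static final VolatileLong[] longs = new VolatileLong[NUM_THREADS];
    static {
        for (int i = 0; i < NUM_THREADS; i++) longs[i] = new VolatileLong();
    }

    private final int index;
    FalseSharingSketch(int index) { this.index = index; }

    @Override public void run() {
        for (long i = 0; i < ITERATIONS; i++) longs[index].value = i;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++) {
            threads[i] = new Thread(new FalseSharingSketch(i));
        }
        long start = System.nanoTime();
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        System.out.println("duration = " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Toggling the padding while profiling with the opcontrol commands above is what produces the contended vs. uncontended numbers in the tables below.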
Results on my machine:
CPU: Intel Sandy Bridge microarchitecture, speed 2300.24 MHz (estimated)
All programs ran on 4 logical CPU cores.
| Name | Notes | Time (s) | l1d count | l2_lines_in count | l2_trans count |
| --- | --- | --- | --- | --- | --- |
| FalseSharing | L1D contention | 259 | 1473 | 37439 | 156777 |
| FalseSharing2 | no L1D contention | 40 | 501 | 31778 | 41114 |
| AffinityFalseSharingDifferentSocket | L1D contention; 2 logical cores on each socket | 217 | 1527 | 43370 | 101298 |
| AffinityFalseSharingSameCore | L1D contention; 4 logical cores on 2 cores of one socket | 26 | 400 | 13673 | 22382 |
| AffinityFalseSharingSameSocket | L1D contention; 4 logical cores on 4 cores of one socket | 44 | 912 | 35217 | 42213 |
Second round of tests, restricted to 2 logical cores:
| Name | Notes | Time (s) | l1d count | l2_lines_in count | l2_trans count | Expected result |
| --- | --- | --- | --- | --- | --- | --- |
| FalseSharing | L1D contention | 77 | 472 | 18941 | 25000 | close to expected |
| FalseSharing2 | no L1D contention | 51 | 271 | 6316 | 14925 | close to expected |
| AffinityFalseSharingDifferentSocket | L1D contention; 2 logical cores on 2 sockets | 54 | 429 | 16803 | 28851 | |
| AffinityFalseSharing2DifferentSocket | no L1D contention; 2 logical cores on 2 sockets | 30 | — | — | — | |
| AffinityFalseSharingSameCore | L1D contention; 2 logical cores on 2 cores of one socket | 21 | 284 | 10528 | 14900 | to be confirmed |
| AffinityFalseSharing2SameCore | no L1D contention; 2 logical cores on 2 cores of one socket | 31 | 725 | 15930 | 30264 | to be confirmed |
| AffinityFalseSharingSameSocket | L1D contention; 2 logical cores on 2 cores of one socket | 35 | 661 | 16019 | 27445 | |
| AffinityFalseSharing2SameSocket | no L1D contention; 2 logical cores on 2 cores of one socket | 20 | 322 | 11877 | 16513 | |
Appendix 1: descriptions of the events used above (from ophelp):

l1d: (counter: all)
L1D cache events (min count: 2000000)
Unit masks (default 0x1)
----------
0x01: replacement L1D Data line replacements.
0x02: allocated_in_m L1D M-state Data Cache Lines Allocated
0x04: eviction L1D M-state Data Cache Lines Evicted due to replacement (only)
0x08: all_m_replacement All Modified lines evicted out of L1D
l2_l1d_wb_rqsts: (counter: all)
writebacks from L1D to the L2 cache (min count: 200000)
Unit masks (default 0x4)
----------
0x04: hit_e writebacks from L1D to L2 cache lines in E state
0x08: hit_m writebacks from L1D to L2 cache lines in M state
l1d_pend_miss: (counter: 2)
Cycles with L1D load Misses outstanding. (min count: 2000000)
Unit masks (default 0x1)
----------
0x01: pending Cycles with L1D load Misses outstanding.
0x01: occurences This event counts the number of L1D misses outstanding occurences.
(extra: edge cmask=1)
l1d_blocks: (counter: all)
L1D cache blocking events (min count: 100000)
Unit masks (default 0x1)
----------
0x01: ld_bank_conflict Any dispatched loads cancelled due to DCU bank conflict
0x05: bank_conflict_cycles Cycles with l1d blocks due to bank conflicts (extra: cmask=1)
l2_trans: (counter: all)
L2 cache accesses (min count: 200000)
Unit masks (default 0x80)
----------
0x80: all_requests Transactions accessing L2 pipe
0x01: demand_data_rd Demand Data Read requests that access L2 cache, includes L1D
prefetches.
0x02: rfo RFO requests that access L2 cache
0x04: code_rd L2 cache accesses when fetching instructions including L1D code prefetches
0x08: all_pf L2 or LLC HW prefetches that access L2 cache
0x10: l1d_wb L1D writebacks that access L2 cache
0x20: l2_fill L2 fill requests that access L2 cache
0x40: l2_wb L2 writebacks that access L2 cache
l2_lines_in: (counter: all)
L2 cache lines in (min count: 100000)
Unit masks (default 0x7)
----------
0x07: all L2 cache lines filling L2
0x01: i L2 cache lines in I state filling L2
0x02: s L2 cache lines in S state filling L2
0x04: e L2 cache lines in E state filling L2
l2_lines_out: (counter: all)
L2 cache lines out (min count: 100000)
Unit masks (default 0x1)
----------
0x01: demand_clean Clean line evicted by a demand
0x02: demand_dirty Dirty line evicted by a demand
0x04: pf_clean Clean line evicted by an L2 Prefetch
0x08: pf_dirty Dirty line evicted by an L2 Prefetch
0x0a: dirty_all Any Dirty line evicted
Appendix 2:
# look at the cache sizes
$ ls /sys/devices/system/cpu/cpu0/cache/
index0 index1 index2 index3
There are 4 directories:
index0: L1 data cache
index1: L1 instruction cache
index2: L2 cache
index3: L3 cache (matches the "cache size" field in /proc/cpuinfo)
The files in each directory describe that cache. Taking cpu0/index0 on this machine as an example:
| File | Value | Notes |
| --- | --- | --- |
| type | Data | data cache; index1 shows Instruction |
| level | 1 | L1 |
| size | 32K | cache size is 32K |
| coherency_line_size | 64 | line size × ways × sets = 64*4*128 = 32K |
| physical_line_partition | 1 | |
| ways_of_associativity | 4 | |
| number_of_sets | 128 | |
| shared_cpu_map | 00000101 | this cache is shared by cpu0 and cpu8 |
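These three geometry values are consistent with the reported size: total size = line size × associativity × number of sets. A quick check using the values from the table above:

```java
public class CacheGeometry {
    public static void main(String[] args) {
        int coherencyLineSize = 64;   // bytes per cache line
        int waysOfAssociativity = 4;  // lines per set
        int numberOfSets = 128;
        int totalBytes = coherencyLineSize * waysOfAssociativity * numberOfSets;
        System.out.println(totalBytes / 1024 + "K"); // prints 32K
    }
}
```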
A note on the shared_cpu_map format: although it looks like a binary string, it is actually hexadecimal. Each bit represents one CPU, so one hex digit covers 4 CPUs. Take the last 4 digits of 00000101 and expand them to binary:

| CPU id | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0x0101 in binary | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |

Bits 0 and 8 are set, so 0101 stands for cpu0 and cpu8: cpu0's L1 data cache is shared with cpu8.
Let's verify:
$ cat /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_map
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000101
Now look at the shared_cpu_map of index3:
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_map
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000f0f
| CPU id | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0x0f0f in binary | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |

cpu0-3 and cpu8-11 share the L3 cache.
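The bitmap decoding above is mechanical, so it is easy to automate. A small helper (class and method names are mine) that turns a shared_cpu_map string into the list of CPU ids whose bit is set:

```java
import java.util.ArrayList;
import java.util.List;

public class SharedCpuMap {
    // Decode a shared_cpu_map string (hex digits, possibly in
    // comma-separated groups) into the list of CPU ids that are set.
    static List<Integer> cpus(String map) {
        String hex = map.replace(",", "");
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < hex.length(); i++) {
            // the i-th hex digit from the right covers CPUs 4*i .. 4*i+3
            int nibble = Character.digit(hex.charAt(hex.length() - 1 - i), 16);
            for (int b = 0; b < 4; b++) {
                if ((nibble & (1 << b)) != 0) ids.add(i * 4 + b);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(cpus("00000101")); // [0, 8]
        System.out.println(cpus("00000f0f")); // [0, 1, 2, 3, 8, 9, 10, 11]
    }
}
```

Running it on the two values above reproduces the tables: 00000101 → cpu0 and cpu8, 00000f0f → cpu0-3 and cpu8-11.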