一次EVENT 10200寻找Buffer Get之旅,最终以狗血案例而告终
很多案件的最终结局其实都是很狗血的,这不我也深深的经历了一遭。整个过程感觉就像小时候大家伙玩传奇一样,从迷宫一层一层的下,下到最底层之后发现,妹的,居然不是大Boss,居然是个小怪。浪费了我大量的精力。首先介绍一下背景,11g一条语句比10g一条语句
很多案件的最终结局其实都是很狗血的,这不我也深深的经历了一遭。整个过程感觉就像小时候大家伙玩>一样,从迷宫一层一层的下,下到最底层之后发现,妹的,居然不是大Boss,居然是个小怪。浪费了我大量的精力。首先介绍一下背景,11g一条语句比10g一条语句的cost下降了很多,但是buffer get确增加了很多。相关表的数据量其实没多少差异。所以当时我就在思考,这多出来的buffer get是那些块,涉及的对象是那些。
================================》10g执行计划 Execution Plan ---------------------------------------------------------- Plan hash value: 2721392943 ------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 225 (1)| 00:00:03 | | | | 1 | SORT AGGREGATE | | 1 | | | | | | | 2 | VIEW | | 21 | | 225 (1)| 00:00:03 | | | | 3 | HASH GROUP BY | | 21 | 1701 | 225 (1)| 00:00:03 | | | | 4 | VIEW | | 21 | 1701 | 224 (1)| 00:00:03 | | | | 5 | UNION-ALL | | | | | | | | | 6 | NESTED LOOPS OUTER | | 20 | 860 | 191 (1)| 00:00:03 | | | | 7 | TABLE ACCESS BY GLOBAL INDEX ROWID| UR_USER_INFO | 6 | 186 | 8 (0)| 00:00:01 | ROWID | ROWID | |* 8 | INDEX RANGE SCAN | IDX_USERINFO_IX | 6 | | 3 (0)| 00:00:01 | | | | 9 | PARTITION RANGE ALL | | 3 | 36 | 30 (0)| 00:00:01 | 1 | 18 | | 10 | TABLE ACCESS BY LOCAL INDEX ROWID| OR_ORDERLINE_INFO | 3 | 36 | 30 (0)| 00:00:01 | 1 | 18 | |* 11 | INDEX RANGE SCAN | IDX_OR_ORDERLINE_INFO_IN | 3 | | 29 (0)| 00:00:01 | 1 | 18 | | 12 | NESTED LOOPS OUTER | | 1 | 43 | 33 (0)| 00:00:01 | | | | 13 | PARTITION RANGE ALL | | 1 | 34 | 32 (0)| 00:00:01 | 1 | 18 | |* 14 | TABLE ACCESS BY LOCAL INDEX ROWID| OR_ORDERLINE_INFO | 1 | 34 | 32 (0)| 00:00:01 | 1 | 18 | |* 15 | INDEX RANGE SCAN | IDX_OR_ORDERLINE_INFO_CI | 6 | | 30 (0)| 00:00:01 | 1 | 18 | | 16 | PARTITION RANGE ITERATOR | | 1 | 9 | 1 (0)| 00:00:01 | KEY | KEY | |* 17 | INDEX UNIQUE SCAN | PK_UR_USER_INFO | 1 | 9 | 1 (0)| 00:00:01 | KEY | KEY | ------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 8 - access("USR"."CUST_ID"=TO_NUMBER(:CUST_ID)) 11 - access("USR"."ID_NO"="OL"."ID_NO"(+)) 14 - filter("OL"."DATA_STATUS"='00') 15 - access("OL"."CUST_ID"=TO_NUMBER(:CUST_ID)) 17 - access("USR"."ID_NO"(+)="OL"."ID_NO") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 582 consistent gets 0 physical reads 0 redo size 657 bytes sent via SQL*Net to client 492 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed ================================》11g执行计划 Execution Plan ---------------------------------------------------------- Plan hash value: 2721392943 ------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 55 (2)| 00:00:01 | | | | 1 | SORT AGGREGATE | | 1 | | | | | | | 2 | VIEW | | 3 | | 55 (2)| 00:00:01 | | | | 3 | HASH GROUP BY | | 3 | 231 | 55 (2)| 00:00:01 | | | | 4 | VIEW | | 3 | 231 | 54 (0)| 00:00:01 | | | | 5 | UNION-ALL | | | | | | | | | 6 | NESTED LOOPS OUTER | | 2 | 86 | 36 (0)| 00:00:01 | | | | 7 | TABLE ACCESS BY GLOBAL INDEX ROWID| UR_USER_INFO | 2 | 62 | 4 (0)| 00:00:01 | ROWID | ROWID | |* 8 | INDEX RANGE SCAN | IDX_USERINFO_IX | 2 | | 2 (0)| 00:00:01 | | | | 9 | PARTITION RANGE ALL | | 1 | 12 | 16 (0)| 00:00:01 | 1 | 18 | | 10 | TABLE ACCESS BY LOCAL INDEX ROWID| OR_ORDERLINE_INFO | 1 | 12 | 16 (0)| 00:00:01 | 1 | 18 | |* 11 | INDEX RANGE SCAN | IDX_OR_ORDERLINE_INFO_IN | 2 | | 14 (0)| 00:00:01 | 1 | 18 | | 12 | NESTED LOOPS OUTER | | 1 | 43 | 18 (0)| 00:00:01 | | | | 13 | PARTITION RANGE ALL | | 1 | 34 | 17 (0)| 00:00:01 | 1 | 18 | |* 14 | TABLE ACCESS BY LOCAL INDEX ROWID| OR_ORDERLINE_INFO | 1 | 34 | 17 (0)| 00:00:01 | 1 | 18 | |* 15 | INDEX RANGE SCAN | IDX_OR_ORDERLINE_INFO_CI | 3 | | 15 (0)| 00:00:01 | 1 | 18 | | 16 | PARTITION RANGE ITERATOR | | 1 | 9 | 1 (0)| 00:00:01 | KEY | KEY | |* 17 | INDEX UNIQUE SCAN | PK_UR_USER_INFO | 1 | 9 | 1 (0)| 00:00:01 | KEY | KEY | ------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 8 - access("USR"."CUST_ID"=TO_NUMBER(:CUST_ID)) 11 - access("USR"."ID_NO"="OL"."ID_NO"(+)) 14 - filter("OL"."DATA_STATUS"='00') 15 - access("OL"."CUST_ID"=TO_NUMBER(:CUST_ID)) 17 - access("USR"."ID_NO"(+)="OL"."ID_NO") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 837 consistent gets 0 physical reads 0 redo size 676 bytes sent via SQL*Net to client 520 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
事先得说明一下,11g的执行计划里面涉及的表和索引我都重新收集了统计信息。从这个执行计划中可以看出,我们的consistent gets从582增加到了837,而执行计划的cost其实是下降了,从225下降到55。所以我们从多出的这200多个consistent gets着手。如果我们要追踪这些块,这里需要使用10200 event。
1.做10200 event,从trace里面找信息
*** 2014-05-19 15:55:43.558 *** ACTION NAME:() 2014-05-19 15:55:43.556 *** MODULE NAME:(sqlplus@w25k03da (TNS V1-V3)) 2014-05-19 15:55:43.556 *** SERVICE NAME:(SYS$USERS) 2014-05-19 15:55:43.556 *** SESSION ID:(5146.581) 2014-05-19 15:55:43.556 Consistent read started for block 34 : 11c999f4 env: (scn: 0x0af1.fa66895f xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: xid: 0x0000.000.00000000 scn: 0x0000.00000000 8sch: scn: 0x0000.00000000) CR exa ret 2 on: 3800103c0 scn: 0xffff.ffffffff xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 scn: 0xffff.ffffffff sfl: 0 Consistent read finished for block 34 : 11c999f4 Consistent read started for block 34 : 7ece819f env: (scn: 0x0af1.fa66895f xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: xid: 0x0000.000.00000000 scn: 0x0000.00000000 8sch: scn: 0x0000.00000000) CR exa ret 2 on: 3800103c0 scn: 0xffff.ffffffff xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 scn: 0xffff.ffffffff sfl: 0 Consistent read finished for block 34 : 7ece819f *** 2014-05-19 15:35:19.883 ktrgtc2(): started for block objd: 0x00020c7c env [0x9fffffffbf36cf94]: (scn: 0x0aef.60c6b2af xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: 0x0000.000.00000000 st-scn: 0x0000.00000000 hi-scn: 0x0000.00000000 ma-scn: 0x0000.00000000 flg: 0x00000660) ktrexc(): returning 2 on: 6000000000177a70 cr-scn: 0xffff.ffffffff xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 cl-scn: 0xffff.ffffffff sfl: 0 ktrgtc2(): completed for block objd: 0x00020c7c ktrgtc2(): started for block objd: 0x00020c7c env [0x9fffffffbf36cf94]: (scn: 0x0aef.60c6b2af xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: 0x0000.000.00000000 st-scn: 0x0000.00000000 hi-scn: 0x0000.00000000 ma-scn: 0x0000.00000000 flg: 0x00000660) ktrexc(): returning 2 on: 6000000000177a70 cr-scn: 0xffff.ffffffff xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 cl-scn: 0xffff.ffffffff sfl: 0 ktrgtc2(): completed for block objd: 0x00020c7c ktrget2(): started for block objd: 0x00020c7c env [0x9fffffffbf36cf94]: (scn: 0x0aef.60c6b2af xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: 0x0000.000.00000000 st-scn: 0x0000.00000000 hi-scn: 0x0000.00000000 ma-scn: 0x0000.00000000 flg: 0x00000660) showing 0xc000000a65ec1b18 694591c4 (2) dscn ffffffff:ffff bcrp ffff:ffffffff, bestcrp (nil) ktrexf(): returning 9 on: 6000000000177a70 cr-scn: 0xffff.ffffffff xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 cl-scn: 0xffff.ffffffff sfl: 0 new dscn ffff:ffffffff ret=9 ktrgcm(): completed for block objd: 0x00020c7c ktrget3(): completed for block objd: 0x00020c7c
上面可以看到,在10g中,我们10200的一致读的信息是:Consistent read started for block 34 : 11c999f4,而我们11g的一致读的信息是ktrgtc2(): started for block objd: 0x00020c7c.可以看到11g比10g的trace更加的智能,它直接告诉了你object_id号,这可以帮助我们快速定位到相关的对象。而10g呢?它给出的只是一个DBA的地址,我们要通过相关函数先去转换成文件号和块号,然后通过文件号和块号去dba_extents去查具体对象名。
2.根据trace的地址找到真实的object
================================》10g Trace Consistent read started for block 34 : 11c999f4 Consistent read started for block 34 : 7ece819f Consistent read started for block 34 : 4ac285db Consistent read started for block 34 : 8347e921 Consistent read started for block 12 : 0c06184e Consistent read started for block 36 : 25c92364 Consistent read started for block 36 : 2bc7db6d Consistent read started for block 36 : 2a01835b Consistent read started for block 36 : 25c9237c Consistent read started for block 36 : 25c9237f Consistent read started for block 36 : 25c92384 Consistent read started for block 36 : 25c9898c Consistent read started for block 36 : 25c9899c Consistent read started for block 36 : 25c9899e SELECT DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE (TO_NUMBER ('11C999F4', 'XXXXXXXX'))AS FILE_NO,DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK (TO_NUMBER ('11C999F4', 'XXXXXXXX')) AS BLOCK_NO FROM DUAL; SELECT SEGMENT_NAME FROM DBA_EXTENTS A WHERE FILE_ID=71 AND 629236 BETWEEN BLOCK_ID AND BLOCK_ID+BLOCKS-1; ================================》11g Trace ktrgtc2(): started for block objd: 0x00020c7c ktrgtc2(): started for block objd: 0x00020c7c ktrget2(): started for block objd: 0x00020c7c ktrget2(): started for block objd: 0x0001b36a ktrgtc2(): started for block objd: 0x0002149d ktrget2(): started for block objd: 0x0002149d ktrget2(): started for block objd: 0x0002149e ktrget2(): started for block objd: 0x0002149e ktrget2(): started for block objd: 0x0002149f ktrget2(): started for block objd: 0x000214a0 ktrget2(): started for block objd: 0x000214a1 ktrget2(): started for block objd: 0x000214a1 select to_number('20c7c','XXXXX') from dual; select object_name from dba_objects where object_id='134268';
通过上述查询及统计,我们定位出,10g里面查询索引IDX_OR_ORDERLINE_INFO_IN,读取块514次,而在11g里面查询索引IDX_OR_ORDERLINE_INFO_IN读取783次,所以这个索引应该有问题,我们来看看索引的情况。
================================》10g SQL> select INDEX_NAME,PARTITION_NAME,BLEVEL,LEAF_BLOCKS,NUM_ROWS from DBA_IND_PARTITIONS where INDEX_NAME='IDX_OR_ORDERLINE_INFO_IN'; INDEX_NAME PARTITION_NAME BLEVEL LEAF_BLOCKS NUM_ROWS ------------------------------ ------------------------------ ---------- ----------- ---------- IDX_OR_ORDERLINE_INFO_IN PART_01 2 1106 52833 IDX_OR_ORDERLINE_INFO_IN PART_02 1 3 381 IDX_OR_ORDERLINE_INFO_IN PART_03 0 1 4 IDX_OR_ORDERLINE_INFO_IN PART_04 0 1 10 IDX_OR_ORDERLINE_INFO_IN PART_05 1 3 206 IDX_OR_ORDERLINE_INFO_IN PART_06 1 165 3766 IDX_OR_ORDERLINE_INFO_IN PART_07 0 1 4 IDX_OR_ORDERLINE_INFO_IN PART_08 0 1 9 IDX_OR_ORDERLINE_INFO_IN PART_09 0 1 14 IDX_OR_ORDERLINE_INFO_IN PART_10 1 496 10939 IDX_OR_ORDERLINE_INFO_IN PART_11 1 448 10377 IDX_OR_ORDERLINE_INFO_IN PART_12 0 1 54 IDX_OR_ORDERLINE_INFO_IN PART_13 0 1 13 IDX_OR_ORDERLINE_INFO_IN PART_14 0 1 4 IDX_OR_ORDERLINE_INFO_IN PART_15 1 16 4431 IDX_OR_ORDERLINE_INFO_IN PART_16 1 88 3380 IDX_OR_ORDERLINE_INFO_IN PART_17 1 292 5994 IDX_OR_ORDERLINE_INFO_IN PART_18 0 0 0 ================================》11g SQL> select INDEX_NAME,PARTITION_NAME,BLEVEL,LEAF_BLOCKS,NUM_ROWS from DBA_IND_PARTITIONS where INDEX_NAME='IDX_OR_ORDERLINE_INFO_IN'; INDEX_NAME PARTITION_NAME BLEVEL LEAF_BLOCKS NUM_ROWS GLO ------------------------------ ------------------------------ ---------- ----------- ---------- --- IDX_OR_ORDERLINE_INFO_IN PART_01 1 152 54315 YES IDX_OR_ORDERLINE_INFO_IN PART_02 1 2 426 YES IDX_OR_ORDERLINE_INFO_IN PART_03 0 1 23 YES IDX_OR_ORDERLINE_INFO_IN PART_04 0 1 78 YES IDX_OR_ORDERLINE_INFO_IN PART_05 1 2 621 YES IDX_OR_ORDERLINE_INFO_IN PART_06 1 15 5248 YES IDX_OR_ORDERLINE_INFO_IN PART_07 0 1 66 YES IDX_OR_ORDERLINE_INFO_IN PART_08 0 1 244 YES IDX_OR_ORDERLINE_INFO_IN PART_09 0 1 18 YES IDX_OR_ORDERLINE_INFO_IN PART_10 1 38 13537 YES IDX_OR_ORDERLINE_INFO_IN PART_11 1 42 15065 YES IDX_OR_ORDERLINE_INFO_IN PART_12 0 1 107 YES IDX_OR_ORDERLINE_INFO_IN PART_13 0 1 17 YES IDX_OR_ORDERLINE_INFO_IN PART_14 0 1 33 YES IDX_OR_ORDERLINE_INFO_IN PART_15 1 13 4431 YES IDX_OR_ORDERLINE_INFO_IN PART_16 1 12 4081 YES IDX_OR_ORDERLINE_INFO_IN PART_17 1 28 9846 YES IDX_OR_ORDERLINE_INFO_IN PART_18 0 0 0 YES
因为11g是新创建的索引,我们可以看到其实它的leaf_blocks其实比较少,而相同的10g是很多的。并且PART_01这个分区在10g上索引的层级是2级。那么按照正常思维来看,10g需要扫描更多的索引块才能获取到相关数据才对。但是我们看到的情况正好相反。那么问题究竟出在哪儿?
此时一个同事提醒我,把语句真实的运行了一把,运行出来后,顿时恍然大悟,这就是一个狗血的案例。简直是一个血的教训啊。因为环境的原因,10g上和11g上有些数据差异,但是数据量的差异其实是很小的(这点从统计信息上可以看到)。而11g统计信息也是重新收集了。问题就出在数据的差异导致11g查询出来的结果集是29条,而10g查询出来的是19条,查询的结果集多了,当然buffer get就多了。这就是最大的原因。
SQL> select a.ID_NO as ID_NO, min(a.DATA_STATUS) as DATA_STATUS, min(a.IS_OPEN) as IS_OPEN, from (select usr.id_no AS ID_NO, decode(ol.data_status, null, '99', '01') AS DATA_STATUS, nvl(ol.id_no, 0) as IS_OPEN, usr.phone_no as SERVICE_NO from UR_USER_INFO usr, OR_ORDERLINE_INFO ol where usr.cust_id = :CUST_ID and usr.ID_NO = ol.ID_NO(+) union all select ol.id_no AS ID_NO, ol.data_status AS DATA_STATUS, nvl(usr.id_no, 1) as IS_OPEN, ol.service_no as SERVICE_NO from UR_USER_INFO usr, OR_ORDERLINE_INFO ol where ol.cust_id = :CUST_ID and usr.ID_NO(+) = ol.ID_NO and ol.DATA_STATUS = '00' and rownum select a.ID_NO as ID_NO, min(a.DATA_STATUS) as DATA_STATUS, min(a.IS_OPEN) as IS_OPEN, from (select usr.id_no AS ID_NO, decode(ol.data_status, null, '99', '01') AS DATA_STATUS, nvl(ol.id_no, 0) as IS_OPEN, usr.phone_no as SERVICE_NO from UR_USER_INFO usr, OR_ORDERLINE_INFO ol where usr.cust_id = :CUST_ID and usr.ID_NO = ol.ID_NO(+) union all select ol.id_no AS ID_NO, ol.data_status AS DATA_STATUS, nvl(usr.id_no, 1) as IS_OPEN, ol.service_no as SERVICE_NO from UR_USER_INFO usr, OR_ORDERLINE_INFO ol where ol.cust_id = :CUST_ID and usr.ID_NO(+) = ol.ID_NO and ol.DATA_STATUS = '00') a group by a.id_no; ID_NO DA IS_OPEN ---------- -- ---------- 1.6310E+13 99 0 1.6310E+13 99 0 1.6310E+13 99 0 1.6001E+13 99 0 1.6310E+13 99 0 1.6310E+13 99 0 1.6310E+13 99 0 1.6310E+13 99 0 1.6310E+13 99 0 1.6310E+13 99 0 1.6310E+13 99 0 1.6001E+13 99 0 1.6310E+13 99 0 1.6001E+13 99 0 1.6001E+13 99 0 1.6001E+13 99 0 1.6310E+13 99 0 1.6310E+13 99 0 1.6310E+13 99 0 19 rows selected.
总结:不要太过依赖于执行计划出现的那些信息,有时候需要把语句真实运行一把。
原文地址:一次EVENT 10200寻找Buffer Get之旅,最终以狗血案例而告终, 感谢原作者分享。
上一篇: 深入议论PHP5对象复制技术