欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  数据库

复合索引创建案例分享-索引的access和filter(一)

程序员文章站 2022-05-12 09:51:21
...

好久没写blog了,确实刚来南京才一个多月,新的工作也需要慢慢适应,学习脚步因此也确实放慢了很多,虽然不见得以后一直做技术,但是如果能做一天就当认真对待,言归正传,有个sql语句因为走全表扫描的执行计划需要优化,具体如下: SELECT OID, SUBSID FROM

好久没写blog了,确实刚来南京才一个多月,新的工作也需要慢慢适应,学习脚步因此也确实放慢了很多,虽然不见得以后一直做技术,但是如果能做一天就当认真对待,言归正传,有个sql语句因为走全表扫描的执行计划需要优化,具体如下:

SELECT OID, SUBSID
FROM SUBS_SERVICE A
WHERE SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND ENDDATE AND ENDDATE >= SYSDATE - 395
AND REGION = 23
AND STATUS 8
AND STATUS 9
AND NOT EXISTS (SELECT 1
FROM SUBS_SERVICE B
WHERE B.REGION = A.REGION
AND B.SUBSID = A.SUBSID
AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
AND ROWNUM
Plan hash value: 591110695

--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 263K(100)| | | |
|* 1 | COUNT STOPKEY | | | | | | | |
|* 2 | FILTER | | | | | | | |
|* 3 | FILTER | | | | | | | |
| 4 | PARTITION RANGE SINGLE | | 90 | 3060 | 263K (2)| 00:52:44 | 4 | 4 |
|* 5 | TABLE ACCESS FULL | SUBS_SERVICE | 90 | 3060 | 263K (2)| 00:52:44 | 4 | 4 |
| 6 | PARTITION RANGE SINGLE | | 1 | 22 | 6 (0)| 00:00:01 | KEY | KEY |
|* 7 | TABLE ACCESS BY LOCAL INDEX ROWID| SUBS_SERVICE | 1 | 22 | 6 (0)| 00:00:01 | KEY | KEY |
|* 8 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 1 | | 4 (0)| 00:00:01 | KEY | KEY |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM 2 - filter( IS NULL)
3 - filter(SYSDATE@!-395 5 - filter((INTERNAL_FUNCTION("SERVICEID") AND "STATUS"8 AND "STATUS"9 AND "ENDDATE" "ENDDATE">=SYSDATE@!-395 AND "REGION"=23))
7 - filter(("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365))
8 - access("B"."SUBSID"=:B1)
filter(("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3))

这里很明显有个bad的执行计划table access full SUBS_SERVICE的全表扫描,通过谓词条件基本可以得知正式因为这个bad的执行计划导致,而优化这个sql其实很简单就是建立合适的索引。

那么接下来如何建立索引了,我们看执行计划ID:5对应的谓词条件是enddate和serviceid的两个列的条件,而serviceid的不同值较少,enddate的不同值相对较多。

Table Number Empty Chain Average Global Sample Date
Name of Rows Blocks Blocks Count Row Len Stats Size MM-DD-YYYY
------------------------------ -------------- --------------- ------------ -------- ------- ------ -------------- ----------
SUBS_SERVICE 322,621,100 56,409,85 0 0 109 YES 16,131,055 09-28-2014

Column Distinct Number Number Sample Date
Name Values Density Buckets Nulls Size MM-DD-YYYY
------------------------------ ------------ ----------- ------- ------------ -------------- ----------
SUBSID 18,668,054 .00000005 1 0 16,131,055 09-28-2014
REGION 4 .25000000 1 0 16,131,055 09-28-2014
SERVICEID 402 .00248756 1 0 16,131,055 09-28-2014
ENDDATE 1,628,520 .00000061 1 258,160,160 3,223,047 09-28-2014
STATUS 7 .14285714 1 0 16,131,055 09-28-2014
...

这种情况下,我们一般可能都是选择的enddate作为前导列,serviceid作为后导列的组合索引,也是为了能够有效的利用enddate作为前导列去驱动别的sql语句来利用这个索引。

挖掘shared pool和sql历史的执行信息:

SQL> select sql_id
2 from gv$sql_plan
3 where options = 'FULL'
4 and object_owner = 'TBCS'
5 and object_name like 'SUBS_SERVICE'
6 and instr(filter_predicates, 'ENDDATE') > 0
7 and instr(filter_predicates, 'SERVICEID')
no rows selected

SQL> select sql_id
2 from dba_hist_sql_plan
3 where object_owner = 'TBCS'
4 and object_name like 'SUBS_SERVICE'
5 and instr(filter_predicates, 'ENDDATE') > 0
6 and instr(filter_predicates, 'SERVICEID')

挖掘shared pool中发觉没有单独出现enddate的sql语句,那么是没有别的sql语句能够利用enddate为前缀的索引来索引扫描的。

虽然这里enddate的前缀索引可能没有办法作用于别的sql语句,但是对于这个sql而言,建立索引是必然的,首先建立enddate的前缀的复合索引ind_enddate_serviceid

SQL> create index tbcs.ind_enddate_serviceid on tbcs.subs_service(enddate,serviceid)

Index created.

执行上述sql语句,查看每个步骤产生的逻辑读和资源消耗
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID f1cmqc5sag5s0, child number 0
-------------------------------------
SELECT /*+gather_plan_statistics*/OID, SUBSID
FROM tbcs.SUBS_SERVICE A
WHERE SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND ENDDATE AND ENDDATE >= SYSDATE - 395
AND REGION = 23
AND STATUS 8
AND STATUS 9
AND NOT EXISTS (SELECT 1
FROM tbcs.SUBS_SERVICE B
WHERE B.REGION = A.REGION
AND B.SUBSID = A.SUBSID
AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
AND ROWNUM
Plan hash value: 2714263820

------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 200 |00:00:16.94 | 2619 | 1797 |
|* 1 | COUNT STOPKEY | | 1 | | 200 |00:00:16.94 | 2619 | 1797 |
|* 2 | FILTER | | 1 | | 200 |00:00:16.94 | 2619 | 1797 |
|* 3 | FILTER | | 1 | | 261 |00:00:44.31 | 1237 | 1218 |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUBS_SERVICE | 1 | 90 | 261 |00:00:44.31 | 1237 | 1218 |
|* 5 | INDEX RANGE SCAN | IND_ENDDATE_SERVICEID | 1 | 20 | 2078 |00:00:02.73 | 873 | 859 |
| 6 | PARTITION RANGE SINGLE | | 257 | 1 | 58 |00:00:04.86 | 1382 | 579 |
|* 7 | TABLE ACCESS BY LOCAL INDEX ROWID | SUBS_SERVICE | 257 | 1 | 58 |00:00:04.86 | 1382 | 579 |
|* 8 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 257 | 1 | 325 |00:00:04.22 | 1057 | 510 |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM 2 - filter( IS NULL)
3 - filter(SYSDATE@!-395 4 - filter(("STATUS"8 AND "STATUS"9 AND "REGION"=23))
5 - access("ENDDATE">=SYSDATE@!-395 AND "ENDDATE" filter(("SERVICEID"=:SERVICEID1 OR "SERVICEID"=:SERVICEID2 OR "SERVICEID"=:SERVICEID3))
7 - filter(("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365))
8 - access("B"."SUBSID"=:B1)
filter(("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3))

sql语句已经走了ind_enddate_serviceid的索引范围扫描,执行计划id:5索引范围扫描消耗的逻辑读为873,而后回表达到了1237的逻辑读

而我们看ID:5 oracle通过(enddate,serviceid)组合索引IND_ENDDATE_SERVICEID只能用access过滤满足enddate的条件,需要filter再次对serviceid的条件进行过滤

这种情况下index range scan只需要扫描满足enddate的谓词条件,会扫描更多的叶块节点,产生更多的逻辑读。

那么既然ind_enddate_serviceid的索引其实在index range scan部分索引的范围扫描只针对了enddate条件的,那么是否我们可以直接建立enddate的单列索引了。

SQL> create index tbcs.ind_enddate on tbcs.subs_service(enddate);

Index created.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID c12gvu0qs792j, child number 0
-------------------------------------
SELECT /*+gather_plan_statistics index(A,ind_enddate)*/OID, SUBSID
FROM tbcs.SUBS_SERVICE A
WHERE SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND ENDDATE AND ENDDATE >= SYSDATE - 395
AND REGION = 23
AND STATUS 8
AND STATUS 9
AND NOT EXISTS (SELECT 1
FROM tbcs.SUBS_SERVICE B
WHERE B.REGION = A.REGION
AND B.SUBSID = A.SUBSID
AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
AND ROWNUM
Plan hash value: 439697064

------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 200 |00:10:41.48 | 37372 | 32157 |
|* 1 | COUNT STOPKEY | | 1 | | 200 |00:10:41.48 | 37372 | 32157 |
|* 2 | FILTER | | 1 | | 200 |00:10:41.48 | 37372 | 32157 |
|* 3 | FILTER | | 1 | | 261 |00:06:07.29 | 35990 | 32095 |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUBS_SERVICE | 1 | 90 | 261 |00:06:07.29 | 35990 | 32095 |
|* 5 | INDEX RANGE SCAN | IND_ENDDATE | 1 | 2810 | 199K|00:00:03.08 | 660 | 646 |
| 6 | PARTITION RANGE SINGLE | | 257 | 1 | 58 |00:00:00.11 | 1382 | 62 |
|* 7 | TABLE ACCESS BY LOCAL INDEX ROWID | SUBS_SERVICE | 257 | 1 | 58 |00:00:00.11 | 1382 | 62 |
|* 8 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 257 | 1 | 325 |00:00:00.09 | 1057 | 50 |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM 2 - filter( IS NULL)
3 - filter(SYSDATE@!-395 4 - filter((INTERNAL_FUNCTION("SERVICEID") AND "STATUS"8 AND "STATUS"9 AND "REGION"=23))
5 - access("ENDDATE">=SYSDATE@!-395 AND "ENDDATE" 7 - filter(("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365))
8 - access("B"."SUBSID"=:B1)
filter(("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3))

由于ind_enddate是单列索引,每个叶块存储的键值会多些,那么index range scan部分消耗应该会更小,其实果然也是如我们推断的,但是我们发觉在通过ind_enddate回表时逻辑读增加到了35990

这个是由于虽然扫描的索引叶块更少了,但是扫描完后不能做任何进一步的过滤,导致需要回表的rowid非常多,导致回表成本增加,而在回表后再进行serviceid谓词条件的过滤。

那么除了上述的两种索引的创建方式,是否还有一种更优秀的:

如果我们创建serviceid作为前导列,enddate作为后导列的索引ind_serviceid_enddate

SQL> create index tbcs.ind_serviceid_enddateon tbcs.subs_service(serviceid,enddate);

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 48bfmd32ag3nk, child number 0
-------------------------------------
SELECT /*+gather_plan_statistics*/OID, SUBSID
FROM tbcs.SUBS_SERVICE A
WHERE SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND ENDDATE AND ENDDATE >= SYSDATE - 395
AND REGION = 23
AND STATUS 8
AND STATUS 9
AND NOT EXISTS (SELECT 1
FROM tbcs.SUBS_SERVICE B
WHERE B.REGION = A.REGION
AND B.SUBSID = A.SUBSID
AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
AND ROWNUM
Plan hash value: 771671576

----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 200 |00:00:00.01 | 1745 |
|* 1 | COUNT STOPKEY | | 1 | | 200 |00:00:00.01 | 1745 |
|* 2 | FILTER | | 1 | | 200 |00:00:00.01 | 1745 |
|* 3 | FILTER | | 1 | | 256 |00:00:00.01 | 373 |
| 4 | INLIST ITERATOR | | 1 | | 256 |00:00:00.01 | 373 |
|* 5 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUBS_SERVICE | 1 | 90 | 256 |00:00:00.01 | 373 |
|* 6 | INDEX RANGE SCAN | IND_SERVICEID_ENDDATE | 1 | 20 | 3186 |00:00:00.01 | 33 |
| 7 | PARTITION RANGE SINGLE | | 256 | 1 | 56 |00:00:00.01 | 1372 |
|* 8 | TABLE ACCESS BY LOCAL INDEX ROWID | SUBS_SERVICE | 256 | 1 | 56 |00:00:00.01 | 1372 |
|* 9 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 256 | 1 | 328 |00:00:00.01 | 1044 |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM 2 - filter( IS NULL)
3 - filter(SYSDATE@!-395 5 - filter(("STATUS"8 AND "STATUS"9 AND "REGION"=23))
6 - access((("SERVICEID"=:SERVICEID1 OR "SERVICEID"=:SERVICEID2 OR "SERVICEID"=:SERVICEID3)) AND
"ENDDATE">=SYSDATE@!-395 AND "ENDDATE" 8 - filter(("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365))
9 - access("B"."SUBSID"=:B1)
filter(("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3))

以serviceid前导列索引前提下,执行计划ID:6 oracle通过(serviceid,enddate)组合索引IND_SERVICEID_ENDDATE可以全部用access的方式完成index range scan,

此时index range scan只会扫描同时满足serviceid和enddate的谓词条件的索引叶块,相对索引ind_enddate_serviceid而言,serviceid作为前导列的索引在index range scan扫描更少索引叶块,相应的逻辑读也会更低。

而如果我们细心观察发现索引ind_enddate_serviceid和ind_serviceid_enddate主要区别是在是否在index range scan阶段能够直接access方式读取数据,而在回表阶段其实消耗的逻辑读大体相同。

ind_enddate_serviceid回表的逻辑读=1237-873=364

ind_serviceid_enddate回表的逻辑读=373-33=340

区别主要在于index range scan部分的区别,两个复合索引扫描的叶块是完全不同的,索引能够全部走access的索引必然成本要低很多,而先走access然后走filter的索引,虽然回表成本不变,但是index range scan部分会扫描很多不满足的条件的leaf block,导致index range scan部分逻辑读增加。

这里我们再看一个sql语句,如果是两个范围的索引如何去创建索引:

SQL> SELECT OID, SUBSID
2 FROM tbcs.SUBS_SERVICE A
3 WHERE SERVICEID between :serviceid1 and :serviceid3
4 AND ENDDATE 5 AND ENDDATE >= SYSDATE - 395
6 AND REGION = 23
7 AND STATUS 8
8 AND STATUS 9
9 AND NOT EXISTS (SELECT 1
10 FROM tbcs.SUBS_SERVICE B
11 WHERE B.REGION = A.REGION
12 AND B.SUBSID = A.SUBSID
13 AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
14 AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
15 AND ROWNUM
200 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 928699648

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 30 | 1020 | 4813 (1)| 00:00:58 | | |
|* 1 | COUNT STOPKEY | | | | | | | |
|* 2 | FILTER | | | | | | | |
|* 3 | FILTER | | | | | | | |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUBS_SERVICE | 30 | 1020 | 4807 (1)| 00:00:58 | 4 | 4 |
|* 5 | INDEX RANGE SCAN | IND_SERVICEID_ENDDATE | 12 | | 4776 (1)| 00:00:58 | | |
| 6 | PARTITION RANGE SINGLE | | 1 | 22 | 6 (0)| 00:00:01 | KEY | KEY |
|* 7 | TABLE ACCESS BY LOCAL INDEX ROWID | SUBS_SERVICE | 1 | 22 | 6 (0)| 00:00:01 | KEY | KEY |
|* 8 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 1 | | 4 (0)| 00:00:01 | KEY | KEY |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM 2 - filter( NOT EXISTS (SELECT 0 FROM "TBCS"."SUBS_SERVICE" "B" WHERE "B"."SUBSID"=:B1 AND "B"."REGION"=:B2 AND
NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365 AND ("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR
"B"."SERVICEID"=:SERVICEID3)))
3 - filter(SYSDATE@!-395=:SERVICEID1)
4 - filter("STATUS"8 AND "STATUS"9 AND "REGION"=23)
5 - access("SERVICEID">=:SERVICEID1 AND "ENDDATE">=SYSDATE@!-395 AND "SERVICEID" "ENDDATE" filter("ENDDATE"=SYSDATE@!-395)
7 - filter("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365)
8 - access("B"."SUBSID"=:B1)
filter("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1732 consistent gets
2 physical reads
0 redo size
7107 bytes sent via SQL*Net to client
663 bytes received via SQL*Net from client
15 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
200 rows processed

这里看见如果是两个范围的谓词,在执行计划ID:5中对应的谓词信息中显示同时有access和filter,而比较奇怪的是access阶段其实已经有了enddate的过滤条件,但是filter中又包含了同样的access的过滤条件,那么这个index range scan究竟是怎么完成扫描的。

这里oracle是先找到了”SERVICEID”>=:SERVICEID1 AND “ENDDATE”>=SYSDATE@!-395索引入口,然后通过索引的双向指针左右滑动来查找数据,但是由于range scan的过程中,没办法保证每个leaf block都是满足enddate的条件的,事实也是确实如此,指针滑动过程中肯定有可能出现不满足enddate的条件的数据,比如这里出现了(serviceid2,enddate-10000)的键值,而且serviceid2是大于serviceid1的,这个键值也会出现在(serviceid1,enddate-365)的右边,所以在access完成后还需要filter满足enddate的leaf block。

创建复合索引时:如果单纯为了调整某类sql语句,不考虑别的sql是否能够最大程度的使用该索引,一般将等值条件的列作为索引的前导列,这样cbo能够尽可能的在index range scan部分只扫描满足条件的leaf block。

可以参考下兔子的一篇大作: http://blog.chinaunix.net/uid-7655508-id-3639188.html