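A Brief Look at Five Index Access Methods Under the Oracle CBO
This article covers the following index access methods:
1. Index unique scan
2. Index range scan
3. Index full scan
4. Index skip scan
5. Index fast full scan
Index unique scan
An index unique scan returns at most one row for a given key value. The optimizer usually chooses it when the query predicate is an equality condition on the columns of a unique or primary key index. The number of blocks visited is always the height of the index plus one (the extra block is the table block), except in a few special cases such as LOBs that are stored out of line.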
sql> set autotrace traceonly explain
sql> select * from hr.employees where employee_id = 100;
execution plan
----------------------------------------------------------
plan hash value: 1833546154
---------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
---------------------------------------------------------------------------------------------
| 0 | select statement | | 1 | 69 | 1 (0)| 00:00:01 |
| 1 | table access by index rowid| employees | 1 | 69 | 1 (0)| 00:00:01 |
|* 2 | index unique scan | emp_emp_id_pk | 1 | | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
2 - access("employee_id"=100)
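The "index height plus one" rule can be illustrated with a toy model. This is only a sketch, not Oracle's implementation: the `FANOUT` of 4 entries per block and the helpers `build_levels` and `unique_lookup` are assumptions made up for illustration, chosen so that even a 107-key index has several levels.

```python
from bisect import bisect_right

FANOUT = 4  # hypothetical entries per block; real index blocks hold far more


def build_levels(keys, fanout=FANOUT):
    """Build a toy B-tree from a non-empty list of sorted unique keys.

    Returns a list of levels, root level first. Each level is a list of
    blocks. A leaf block holds keys; a branch block holds the lowest key
    of each of its child blocks.
    """
    level = [keys[i:i + fanout] for i in range(0, len(keys), fanout)]
    levels = [level]
    while len(level) > 1:
        lowest = [block[0] for block in level]
        level = [lowest[i:i + fanout] for i in range(0, len(lowest), fanout)]
        levels.append(level)
    levels.reverse()
    return levels


def unique_lookup(levels, key, fanout=FANOUT):
    """Return (found, blocks_visited) for an equality lookup on a unique key."""
    block_no = 0
    visited = 0
    for level in levels[:-1]:
        block = level[block_no]
        visited += 1
        # Follow the last child whose lowest key is <= the search key.
        child = max(bisect_right(block, key) - 1, 0)
        block_no = block_no * fanout + child
    leaf = levels[-1][block_no]
    visited += 1
    found = key in leaf
    if found:
        visited += 1  # one more block: the table block that holds the row
    return found, visited


levels = build_levels(list(range(100, 207)))  # 107 keys, like employee_id 100..206
print(len(levels), unique_lookup(levels, 100))
```

In this model a successful lookup always visits one block per index level plus one table block, regardless of which key is requested.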
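Index range scan
An index range scan is chosen when the predicate contains a condition that returns a range of rows. The index may be unique or non-unique, and the condition may use operators such as <, >, like, between, and =. With like, however, a pattern that starts with the % wildcard will most likely not use a range scan, because the condition gives the scan no starting key and is too broad. Here is an example: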
sql> select * from hr.employees where department_id = 30;
6 rows selected.
execution plan
----------------------------------------------------------
plan hash value: 2056577954
-------------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
-------------------------------------------------------------------------------------------------
| 0 | select statement | | 6 | 414 | 2 (0)| 00:00:01 |
| 1 | table access by index rowid| employees | 6 | 414 | 2 (0)| 00:00:01 |
|* 2 | index range scan | emp_department_ix | 6 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
2 - access("department_id"=30)
statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
7 consistent gets
1 physical reads
0 redo size
1716 bytes sent via sql*net to client
523 bytes received via sql*net from client
2 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
6 rows processed
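Whether a range scan is chosen depends on an accurate estimate of how many rows the condition returns; the larger the range, the more likely the optimizer is to run a full table scan instead: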
sql> select department_id,count(*) from hr.employees group by department_id order by count(*);
department_id count(*)
------------- ----------
10 1
40 1
1
70 1
20 2
110 2
90 3
60 5
30 6
100 6
80 34
50 45
12 rows selected.
-- use department 50, the value with the most rows, as the condition
sql> set autotrace traceonly explain
sql> select * from hr.employees where department_id = 50;
45 rows selected.
execution plan
----------------------------------------------------------
plan hash value: 1445457117
-------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
-------------------------------------------------------------------------------
| 0 | select statement | | 45 | 3105 | 3 (0)| 00:00:01 |
|* 1 | table access full| employees | 45 | 3105 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - filter("department_id"=50)
statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
10 consistent gets
0 physical reads
0 redo size
4733 bytes sent via sql*net to client
545 bytes received via sql*net from client
4 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
45 rows processed
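As shown, when the condition matches a large share of the rows, the optimizer falls back to a full table scan.
One optimization for index range scans is to use an ascending index to return rows in descending order. This typically happens when the query has an order by clause on the indexed column, and it avoids a sort operation, as follows: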
sql> set autotrace traceonly explain
sql> select * from hr.employees
2 where department_id in (90, 100)
3 order by department_id desc;
execution plan
----------------------------------------------------------
plan hash value: 3707994525
---------------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
---------------------------------------------------------------------------------------------------
| 0 | select statement | | 9 | 621 | 2 (0)| 00:00:01 |
| 1 | inlist iterator | | | | | |
| 2 | table access by index rowid | employees | 9 | 621 | 2 (0)| 00:00:01 |
|* 3 | index range scan descending| emp_department_ix | 9 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
3 - access("department_id"=90 or "department_id"=100)
In the example above, the index entries are read in reverse order, which avoids the sort operation.
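Why reading an ascending index backwards removes the sort can be shown with a small model. This is a sketch under assumptions: the index is represented as a sorted Python list of `(department_id, rowid)` pairs, the rowids are invented strings, and `range_scan` is a hypothetical helper, not an Oracle API.

```python
from bisect import bisect_left, bisect_right


def range_scan(index, low, high, descending=False):
    """Return the index entries with low <= key <= high.

    `index` is a list of (key, rowid) pairs kept in ascending key order.
    A descending scan walks the same slice from its end to its start,
    so no separate sort step is needed.
    """
    keys = [key for key, _ in index]
    start = bisect_left(keys, low)    # first entry with key >= low
    stop = bisect_right(keys, high)   # one past the last entry with key <= high
    entries = index[start:stop]
    return entries[::-1] if descending else entries


# Invented sample entries: (department_id, rowid)
index = sorted([(90, "r1"), (100, "r2"), (30, "r3"), (90, "r4"), (100, "r5")])

print(range_scan(index, 90, 100, descending=True))
```

The descending result is identical to what sorting the ascending result in reverse would produce, but it is obtained purely from the order in which the entries are read.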
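Index full scan
An index full scan reads every leaf block of the index, takes the rowid from each entry, and fetches the corresponding table row. If it visits every leaf block anyway, where is its advantage over a full table scan? An index block stores far fewer columns per entry, usually just the index key and the rowid, so an index block typically holds more entries than a table block of the same size holds rows. As a result, when every column in the select list is part of the index, the table need not be accessed at all, and the index full scan is the more efficient choice.
Index full scans occur in many situations. A few typical ones:
1. The query has no predicate, but the selected columns can be read directly from an index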
sql> select email from hr.employees;
execution plan
----------------------------------------------------------
plan hash value: 2196514524
---------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
---------------------------------------------------------------------------------
| 0 | select statement | | 107 | 856 | 1 (0)| 00:00:01 |
| 1 | index full scan | emp_email_uk | 107 | 856 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------
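2. The predicate contains a condition on a non-leading column of the index (this also depends on the cardinality of the leading column: if the leading column has few distinct values, an index skip scan may be chosen instead)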
sql> select first_name, last_name from hr.employees
2 where first_name like 'a%' ;
execution plan
----------------------------------------------------------
plan hash value: 2228653197
--------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------
| 0 | select statement | | 3 | 45 | 1 (0)| 00:00:01 |
|* 1 | index full scan | emp_name_ix | 3 | 45 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access("first_name" like 'a%')
filter("first_name" like 'a%')
sql> set long 2000000
sql> select dbms_metadata.get_ddl('index','emp_name_ix','hr') from dual;
dbms_metadata.get_ddl('index','emp_name_ix','hr')
--------------------------------------------------------------------------------
create index "hr"."emp_name_ix" on "hr"."employees" ("last_name", "first_name"
)
pctfree 10 initrans 2 maxtrans 255 nologging compute statistics
storage(initial 65536 next 1048576 minextents 1 maxextents 2147483645
pctincrease 0 freelists 1 freelist groups 1 buffer_pool default flash_cache de
fault cell_flash_cache default)
tablespace "example"
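-- emp_name_ix is built on ("last_name", "first_name"), and the predicate uses the non-leading column first_name
3. The rows are retrieved through an already sorted index, which avoids a separate sort operation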
sql> select * from hr.employees order by employee_id ;
execution plan
----------------------------------------------------------
plan hash value: 2186312383
---------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
---------------------------------------------------------------------------------------------
| 0 | select statement | | 107 | 7383 | 3 (0)| 00:00:01 |
| 1 | table access by index rowid| employees | 107 | 7383 | 3 (0)| 00:00:01 |
| 2 | index full scan | emp_emp_id_pk | 107 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
-- an ascending index can likewise return rows in descending order
sql> select employee_id from hr.employees order by employee_id desc ;
execution plan
----------------------------------------------------------
plan hash value: 753568220
--------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------------------
| 0 | select statement | | 107 | 428 | 1 (0)| 00:00:01 |
| 1 | index full scan descending| emp_emp_id_pk | 107 | 428 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
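As the examples above show, an index full scan, like a range scan, can return rows in descending order from an ascending index. That is not its only optimization. When a query asks for the maximum or minimum of an indexed column, the index full scan (min/max) variant has a very significant advantage: the optimizer does not read all the leaf blocks, only the root block and the first or last leaf block (plus any branch blocks on the path between them), which clearly improves performance.
-- index full scan (min/max) to get the minimum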
sql> select min(department_id) from hr.employees ;
execution plan
----------------------------------------------------------
plan hash value: 613773769
------------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
------------------------------------------------------------------------------------------------
| 0 | select statement | | 1 | 3 | 1 (0)| 00:00:01 |
| 1 | sort aggregate | | 1 | 3 | | |
| 2 | index full scan (min/max)| emp_department_ix | 1 | 3 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
-- when the query asks for both min and max, the optimizer does not choose the more efficient index full scan (min/max) on its own
sql> select min(department_id), max(department_id) from hr.employees ;
execution plan
----------------------------------------------------------
plan hash value: 1756381138
--------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------
| 0 | select statement | | 1 | 3 | 3 (0)| 00:00:01 |
| 1 | sort aggregate | | 1 | 3 | | |
| 2 | table access full| employees | 107 | 321 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
-- an alternative rewrite
sql> select
2 (select min(department_id) from hr.employees) min_id,
3 (select max(department_id) from hr.employees) max_id
4 from dual;
execution plan
----------------------------------------------------------
plan hash value: 2189307159
------------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
------------------------------------------------------------------------------------------------
| 0 | select statement | | 1 | | 2 (0)| 00:00:01 |
| 1 | sort aggregate | | 1 | 3 | | |
| 2 | index full scan (min/max)| emp_department_ix | 1 | 3 | 1 (0)| 00:00:01 |
| 3 | sort aggregate | | 1 | 3 | | |
| 4 | index full scan (min/max)| emp_department_ix | 1 | 3 | 1 (0)| 00:00:01 |
| 5 | fast dual | | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
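The benefit of the min/max variant can be sketched by counting how many index entries each approach touches. This is a toy model, not Oracle code: `CountingIndex` and the three helper functions are hypothetical names, and the department IDs are sample values.

```python
class CountingIndex:
    """A sorted list of keys that counts how many entries are read."""

    def __init__(self, keys):
        self.entries = sorted(keys)
        self.reads = 0

    def read(self, pos):
        self.reads += 1
        return self.entries[pos]


def min_via_index(idx):
    # The smallest key is the first entry of an ascending index.
    return idx.read(0)


def max_via_index(idx):
    # The largest key is the last entry of an ascending index.
    return idx.read(-1)


def min_max_via_scan(idx):
    # Reading every entry, as a full scan would, to compute both values.
    values = [idx.read(i) for i in range(len(idx.entries))]
    return min(values), max(values)


idx = CountingIndex([50, 10, 80, 30, 100, 90, 60])
print(min_via_index(idx), max_via_index(idx), idx.reads)
```

Two separate probes, one at each end of the index, touch two entries in total, which mirrors why splitting the query into two scalar subqueries lets each one use its own min/max scan.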
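Index skip scan
This access method is something of a special case, because in early versions the optimizer would refuse to use an index when the predicate referenced only a non-leading column. A skip scan needs a specific situation: when the predicate contains a condition on a non-leading column of the index and the leading column has few distinct values, an index skip scan is very likely to be chosen. Like the index full scan and the range scan, it can read the index in ascending or descending order. The difference is that a skip scan logically splits the composite index into several smaller sub-indexes, one per distinct value of the leading column. The fewer distinct values the leading column has, the fewer sub-indexes there are, and the more likely the skip scan is to be more efficient than a full table scan.
-- create a test table, using dba_objects as the source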
sql> create table test as select * from dba_objects;
table created.
-- create a composite index, with owner, a column with few distinct values, as the leading column
sql> create index i_test on test(owner,object_id,object_type) ;
index created.
-- analyze the table to gather statistics
sql> exec dbms_stats.gather_table_stats('sys','test');
pl/sql procedure successfully completed.
-- first compare the row count with the number of distinct values in the leading column
sql> select count(*),count(distinct owner) from test;
count(*) count(distinctowner)
---------- --------------------
72482 29
-- query with a condition on a non-leading column to trigger the skip scan
sql> select * from test where object_id = 46;
execution plan
----------------------------------------------------------
plan hash value: 1001786056
--------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------------
| 0 | select statement | | 1 | 97 | 31 (0)| 00:00:01 |
| 1 | table access by index rowid| test | 1 | 97 | 31 (0)| 00:00:01 |
|* 2 | index skip scan | i_test | 1 | | 30 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
2 - access("object_id"=46)
filter("object_id"=46)
statistics
----------------------------------------------------------
101 recursive calls
0 db block gets
38 consistent gets
0 physical reads
0 redo size
1610 bytes sent via sql*net to client
523 bytes received via sql*net from client
2 sql*net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
1 rows processed
-- now see how the same statement performs with a full table scan
sql> select /*+ full(test) */ * from test where object_id = 46;
execution plan
----------------------------------------------------------
plan hash value: 1357081020
--------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------
| 0 | select statement | | 1 | 97 | 282 (1)| 00:00:04 |
|* 1 | table access full| test | 1 | 97 | 282 (1)| 00:00:04 |
--------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - filter("object_id"=46)
statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1037 consistent gets
0 physical reads
0 redo size
1607 bytes sent via sql*net to client
523 bytes received via sql*net from client
2 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
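In the queries above, the leading column of the index has 29 distinct values, so the index skip scan logically split the index into 29 sub-indexes and needed only 38 logical reads (consistent gets), compared with 1037 logical reads for the full table scan. The performance gain is very clear.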
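The "one logical sub-index per leading value" idea can be sketched as follows. This is an illustrative model under assumptions: the composite index is a sorted list of `(owner, object_id)` tuples with sample values, and `skip_scan` is a hypothetical helper that uses binary search to stand in for index probes.

```python
from bisect import bisect_left, bisect_right


def skip_scan(index, object_id):
    """Find entries with the given object_id in an (owner, object_id) index.

    `index` is sorted by owner, then object_id. For each distinct owner the
    function probes that owner's slice for the target, then jumps straight
    to the first entry of the next owner. Returns (matches, probes).
    """
    matches = []
    probes = 0
    pos = 0
    while pos < len(index):
        owner = index[pos][0]
        target = (owner, object_id)
        # Probe the logical sub-index for this owner.
        i = bisect_left(index, target, lo=pos)
        probes += 1
        while i < len(index) and index[i] == target:
            matches.append(index[i])
            i += 1
        # Skip every remaining entry of this owner.
        pos = bisect_right(index, (owner, float("inf")), lo=pos)
    return matches, probes


owners = ["HR", "SYS", "SYSTEM"]  # sample leading-column values
index = sorted((owners[n % 3], n) for n in range(30))

print(skip_scan(index, 7))
```

The number of probes equals the number of distinct leading values, not the number of index entries, which is why a low-cardinality leading column makes the skip scan attractive.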
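Index fast full scan
This access method reads data the same way a full table scan does, through unordered multiblock reads, so it cannot be used to avoid the cost of a sort. An index fast full scan usually occurs when all the queried columns are in the index and at least one of the indexed columns has a not null constraint. The same conditions can also lead to an index full scan. It mostly serves as a replacement for a full table scan, since, after all, the data can be retrieved without visiting the table's data blocks.
-- again using the test table created above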
sql> desc test
name null? type
----------------------------------------- -------- ----------------------------
owner varchar2(30)
object_name varchar2(128)
subobject_name varchar2(30)
object_id not null number
data_object_id number
object_type varchar2(19)
created date
last_ddl_time date
timestamp varchar2(19)
status varchar2(7)
temporary varchar2(1)
generated varchar2(1)
secondary varchar2(1)
namespace number
edition_name varchar2(30)
-- create an index on the object_id column
sql> create index pri_inx on test (object_id);
index created.
-- the query is executed as a full table scan
sql> select object_id from test;
72482 rows selected.
execution plan
----------------------------------------------------------
plan hash value: 1357081020
--------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------
| 0 | select statement | | 72482 | 353k| 282 (1)| 00:00:04 |
| 1 | table access full| test | 72482 | 353k| 282 (1)| 00:00:04 |
--------------------------------------------------------------------------
statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5799 consistent gets
0 physical reads
0 redo size
1323739 bytes sent via sql*net to client
53675 bytes received via sql*net from client
4834 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
72482 rows processed
-- change object_id to not null, so the index is guaranteed to contain an entry for every row
sql> alter table test modify (object_id not null);
table altered.
-- running the query on object_id again now shows an index fast full scan
sql> select object_id from test;
72482 rows selected.
execution plan
----------------------------------------------------------
plan hash value: 3806735285
--------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------
| 0 | select statement | | 72482 | 353k| 45 (0)| 00:00:01 |
| 1 | index fast full scan| pri_inx | 72482 | 353k| 45 (0)| 00:00:01 |
--------------------------------------------------------------------------------
statistics
----------------------------------------------------------
167 recursive calls
0 db block gets
5020 consistent gets
161 physical reads
0 redo size
1323739 bytes sent via sql*net to client
53675 bytes received via sql*net from client
4834 sql*net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
72482 rows processed
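The reason a fast full scan cannot replace a sort, while a plain index full scan can, comes down to the order in which leaf blocks are read. The sketch below models this under assumptions: `leaf_blocks` is an invented layout in which physical block order differs from key order, and both scan functions are hypothetical helpers.

```python
# physical block number -> (keys in the block, physical number of the next
# leaf block in key order, or None for the last leaf)
leaf_blocks = {
    0: ([40, 50], 2),
    1: ([10, 20], 3),
    2: ([60, 70], None),
    3: ([30], 0),
}
FIRST_LEAF = 1  # physical number of the leaf holding the smallest keys


def index_full_scan(blocks, first):
    """Follow the leaf chain one block at a time: keys come back sorted."""
    keys = []
    block_no = first
    while block_no is not None:
        block_keys, block_no = blocks[block_no]
        keys.extend(block_keys)
    return keys


def index_fast_full_scan(blocks):
    """Read blocks in physical order, as a multiblock read would."""
    keys = []
    for block_no in sorted(blocks):
        keys.extend(blocks[block_no][0])
    return keys


print(index_full_scan(leaf_blocks, FIRST_LEAF))
print(index_fast_full_scan(leaf_blocks))
```

Both scans return the same set of keys, but only the chain-following scan returns them in key order; the physical-order scan would still need a sort to satisfy an order by.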
PS: the index fast full scan case is really hard to reproduce; the example above took me quite a while to get working.