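A Detailed Look at Several Basic Query Transformations in the Oracle CBO
When an execution plan is developed, transformation and selection are different tasks. After a query passes its syntax and permission checks, the first step is what is generally called "query transformation", in which a series of query-block transformations is applied. Only then comes "optimization", in which the optimizer costs the candidate plans and chooses the final execution plan.
As we know, query blocks are delimited by the select keyword, and the way a query is written determines how its query blocks relate to one another: each query block is usually nested inside another or joined to it in some way. For example: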
select * from employees where department_id in (select department_id from departments)
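contains two nested query blocks. The purpose of query transformation is to determine whether changing the way the query is written would yield a better execution plan.
This transformation step is completely transparent to the user running the query. Bear in mind that the transformer may completely restructure your SQL statement as long as the result set does not change, so we need to adjust our expectations about how a query we wrote will actually be executed. The transformation is usually a good thing, since its goal is a better, more efficient execution plan.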
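Because the transformation is invisible in the statement text, one way to observe it is the optimizer trace (event 10053). The following is a minimal sketch, assuming an 11g or later database, a session that is allowed to set events, and access to the trace directory. The comment text trace_demo is an arbitrary marker that is not part of the original article; it only makes the statement text new, so the statement is hard parsed while the event is set.

```sql
-- Turn on the optimizer trace for the current session.
alter session set events '10053 trace name context forever, level 1';

-- Hard parse the statement of interest; the trace is written at parse time.
select /* trace_demo */ *
  from employees
 where department_id in (select department_id from departments);

-- Turn the trace off again.
alter session set events '10053 trace name context off';

-- Locate the trace file of the current session (v$diag_info exists in 11g and later).
select value from v$diag_info where name = 'Default Trace File';
```

The trace file usually includes the statement text as it looks after transformation, which can be compared with what was originally written.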
Let us now look at several basic transformations:
1. View merging
2. Subquery unnesting
3. Predicate pushing
4. Query rewrite with materialized views
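I. View Merging
This one is fairly easy to understand. The inline view is expanded and either analyzed as a separate query block or merged with the rest of the query into a single overall execution plan; after a merge, the transformed statement essentially no longer contains the view.
View merging typically happens when the predicates of the outer query block include:
1. columns that can use an index in another query block
2. columns that can be used for partition pruning in another query block
3. conditions that limit the number of rows returned by a joined view
Under this transformation a view does not always get its own subplan. It is analyzed in advance and is usually merged with the rest of the query to improve performance, as in the following example.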
sql> set autotrace traceonly explain
-- the view is merged
sql> select * from employees a,
2 (select department_id from employees) b_view
3 where a.department_id = b_view.department_id(+)
4 and a.salary > 3000;
execution plan
----------------------------------------------------------
plan hash value: 1634680537
----------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
----------------------------------------------------------------------------------------
| 0 | select statement | | 3161 | 222k| 3 (0)| 00:00:01 |
| 1 | nested loops outer| | 3161 | 222k| 3 (0)| 00:00:01 |
|* 2 | table access full| employees | 103 | 7107 | 3 (0)| 00:00:01 |
|* 3 | index range scan | emp_department_ix | 31 | 93 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
2 - filter("a"."salary">3000)
3 - access("a"."department_id"="department_id"(+))
-- use no_merge to prevent the view from being merged
sql> select * from employees a,
2 (select /*+ no_merge */department_id from employees) b_view
3 where a.department_id = b_view.department_id(+)
4 and a.salary > 3000;
execution plan
----------------------------------------------------------
plan hash value: 1526679670
-----------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
-----------------------------------------------------------------------------------
| 0 | select statement | | 3161 | 253k| 7 (15)| 00:00:01 |
|* 1 | hash join right outer| | 3161 | 253k| 7 (15)| 00:00:01 |
| 2 | view | | 107 | 1391 | 3 (0)| 00:00:01 |
| 3 | table access full | employees | 107 | 321 | 3 (0)| 00:00:01 |
|* 4 | table access full | employees | 103 | 7107 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access("a"."department_id"="b_view"."department_id"(+))
4 - filter("a"."salary">3000)
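The first plan above contains no view operation because the inline view was merged into the outer query block. As a rough illustration, the merged statement is conceptually equivalent to the hand-written join below. This is a sketch only: the optimizer works on its internal representation rather than on rewritten text, and the alias b is introduced here for illustration.

```sql
-- Conceptual form of the first query after view merging:
-- the inline view b_view is replaced by a direct reference to employees.
select a.*, b.department_id
  from employees a, employees b
 where a.department_id = b.department_id(+)
   and a.salary > 3000;
```

Written this way, it is easier to see why the optimizer can drive the outer join through the index on department_id instead of building the view result first.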
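In some situations view merging is disabled or restricted. This happens when a query block uses an analytic function, an aggregate function, a set operation (such as union, intersect, or minus), an order by clause, or rownum. Even so, the /*+ merge(v) */ hint can still be used to force view merging, provided the result set is guaranteed to stay the same. For example: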
sql> set autotrace on
-- the aggregate function avg prevents the view from being merged
sql> select e1.last_name, e1.salary, v.avg_salary
2 from hr.employees e1,
3 (select department_id, avg(salary) avg_salary
4 from hr.employees e2
5 group by department_id) v
6 where e1.department_id = v.department_id and e1.salary > v.avg_salary;
execution plan
----------------------------------------------------------
plan hash value: 2695105989
----------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
----------------------------------------------------------------------------------
| 0 | select statement | | 17 | 697 | 8 (25)| 00:00:01 |
|* 1 | hash join | | 17 | 697 | 8 (25)| 00:00:01 |
| 2 | view | | 11 | 286 | 4 (25)| 00:00:01 |
| 3 | hash group by | | 11 | 77 | 4 (25)| 00:00:01 |
| 4 | table access full| employees | 107 | 749 | 3 (0)| 00:00:01 |
| 5 | table access full | employees | 107 | 1605 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access("e1"."department_id"="v"."department_id")
filter("e1"."salary">"v"."avg_salary")
-- use /*+ merge(v) */ to force view merging
sql> select /*+ merge(v) */ e1.last_name, e1.salary, v.avg_salary
2 from hr.employees e1,
3 (select department_id, avg(salary) avg_salary
4 from hr.employees e2
5 group by department_id) v
6 where e1.department_id = v.department_id and e1.salary > v.avg_salary;
execution plan
----------------------------------------------------------
plan hash value: 3553954154
----------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
----------------------------------------------------------------------------------
| 0 | select statement | | 165 | 5610 | 8 (25)| 00:00:01 |
|* 1 | filter | | | | | |
| 2 | hash group by | | 165 | 5610 | 8 (25)| 00:00:01 |
|* 3 | hash join | | 3296 | 109k| 7 (15)| 00:00:01 |
| 4 | table access full| employees | 107 | 2889 | 3 (0)| 00:00:01 |
| 5 | table access full| employees | 107 | 749 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
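In the plan produced with the merge hint, the join comes first and the group by and filter are applied afterwards. A hand-written statement with roughly that shape is sketched below. It is an approximation of what merging a group by view means, not the optimizer's literal output; grouping by e1.rowid is an assumption used here to keep one group per employee row.

```sql
-- Conceptual form after merging the group by view:
-- join first, then aggregate per outer row, then filter on the aggregate.
select e1.last_name, e1.salary, avg(e2.salary) avg_salary
  from hr.employees e1, hr.employees e2
 where e1.department_id = e2.department_id
 group by e2.department_id, e1.rowid, e1.last_name, e1.salary
having e1.salary > avg(e2.salary);
```

The join now produces many more intermediate rows before aggregation (3296 estimated in the plan above), which is why merging such a view is not automatically the cheaper choice.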
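II. Subquery Unnesting
The most typical case is turning a subquery into a table join. The main difference from view merging is that the subquery sits in the where clause, and it is the transformer that checks whether it can be unnested.
Here is an example of a subquery being converted into a join: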
sql> select employee_id, last_name, salary, department_id
2 from hr.employees
3 where department_id in
4 (select department_id
5 from hr.departments where location_id > 1700);
execution plan
----------------------------------------------------------
plan hash value: 432925905
---------------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
---------------------------------------------------------------------------------------------------
| 0 | select statement | | 34 | 884 | 4 (0)| 00:00:01 |
| 1 | nested loops | | | | | |
| 2 | nested loops | | 34 | 884 | 4 (0)| 00:00:01 |
| 3 | table access by index rowid| departments | 4 | 28 | 2 (0)| 00:00:01 |
|* 4 | index range scan | dept_location_ix | 4 | | 1 (0)| 00:00:01 |
|* 5 | index range scan | emp_department_ix | 10 | | 0 (0)| 00:00:01 |
| 6 | table access by index rowid | employees | 10 | 190 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
4 - access("location_id">1700)
5 - access("department_id"="department_id")
-- use /*+ no_unnest */ to force a separate plan for the subquery
sql> select employee_id, last_name, salary, department_id
2 from hr.employees
3 where department_id in
4 (select /*+ no_unnest */department_id
5 from hr.departments where location_id > 1700);
execution plan
----------------------------------------------------------
plan hash value: 4233807898
--------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------------------
| 0 | select statement | | 10 | 190 | 14 (0)| 00:00:01 |
|* 1 | filter | | | | | |
| 2 | table access full | employees | 107 | 2033 | 3 (0)| 00:00:01 |
|* 3 | table access by index rowid| departments | 1 | 7 | 1 (0)| 00:00:01 |
|* 4 | index unique scan | dept_id_pk | 1 | | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - filter( exists (select /*+ no_unnest */ 0 from "hr"."departments"
"departments" where "department_id"=:b1 and "location_id">1700))
3 - filter("location_id">1700)
4 - access("department_id"=:b1)
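Without subquery unnesting, the query only uses a filter operation to match the two tables, and the subquery shown under operation 1 in the predicate information is left essentially as written (as an exists subquery). This means the subquery may have to be executed once for each of the 107 rows returned from employees. Oracle's subquery caching can reduce the number of executions, so the two plans cannot be ranked definitively from the plan output alone, but the estimated cost (14 here versus 4 for the nested loops plan) shows that the filter operation is at a clear disadvantage in this case.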
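For comparison, the unnested form of the in subquery can be written by hand as a plain join. The sketch below assumes that department_id is unique in departments (the dept_id_pk index in the plan above suggests it is the primary key); without that uniqueness a plain join could return duplicate employee rows and a semi-join would be needed instead. The aliases e and d are introduced for illustration.

```sql
-- Hand-written join equivalent of the unnested in subquery,
-- valid because each employee matches at most one department row.
select e.employee_id, e.last_name, e.salary, e.department_id
  from hr.employees e, hr.departments d
 where e.department_id = d.department_id
   and d.location_id > 1700;
```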
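When the subquery is correlated, unnesting generally converts it into an unnested view that is then joined with the tables of the main query. For example: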
sql> select outer.employee_id, outer.last_name, outer.salary, outer.department_id
2 from hr.employees outer
3 where outer.salary >
4 (select avg(inner.salary)
5 from hr.employees inner
6 where inner.department_id = outer.department_id);
execution plan
----------------------------------------------------------
plan hash value: 2167610409
----------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
----------------------------------------------------------------------------------
| 0 | select statement | | 17 | 765 | 8 (25)| 00:00:01 |
|* 1 | hash join | | 17 | 765 | 8 (25)| 00:00:01 |
| 2 | view | vw_sq_1 | 11 | 286 | 4 (25)| 00:00:01 |
| 3 | hash group by | | 11 | 77 | 4 (25)| 00:00:01 |
| 4 | table access full| employees | 107 | 749 | 3 (0)| 00:00:01 |
| 5 | table access full | employees | 107 | 2033 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access("item_1"="outer"."department_id")
filter("outer"."salary">"avg(inner.salary)")
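In the query above the subquery is converted into a view (vw_sq_1) that is then hash joined with the main query. The transformed query actually looks something like this: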
sql> select outer.employee_id, outer.last_name, outer.salary, outer.department_id
2 from hr.employees outer,
3 (select department_id,avg(salary) avg_sal from hr.employees group by department_id) inner
4 where inner.department_id = outer.department_id and outer.salary > inner.avg_sal;
The execution plans of these two statements are in fact the same.
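One way to check this on your own system is to explain both statements and display the two plans side by side. This is a minimal sketch, assuming the default plan_table is available; the statement_id values 'correlated' and 'rewritten' are arbitrary labels chosen here.

```sql
-- Explain the original correlated form.
explain plan set statement_id = 'correlated' for
select outer.employee_id, outer.last_name, outer.salary, outer.department_id
  from hr.employees outer
 where outer.salary >
       (select avg(inner.salary)
          from hr.employees inner
         where inner.department_id = outer.department_id);

-- Explain the hand-written join against a group by view.
explain plan set statement_id = 'rewritten' for
select outer.employee_id, outer.last_name, outer.salary, outer.department_id
  from hr.employees outer,
       (select department_id, avg(salary) avg_sal
          from hr.employees
         group by department_id) inner
 where inner.department_id = outer.department_id
   and outer.salary > inner.avg_sal;

-- Display both plans and compare their operations.
select * from table(dbms_xplan.display('plan_table', 'correlated'));
select * from table(dbms_xplan.display('plan_table', 'rewritten'));
```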
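III. Predicate Pushing
Predicate pushing moves predicates from the outer query block into a non-mergeable query block. The predicate is then applied earlier, unneeded rows are filtered out sooner, and efficiency improves. It can also make it possible to use certain indexes.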
-- predicate pushing example
sql> set autotrace traceonly explain
sql> select e1.last_name, e1.salary, v.avg_salary
2 from hr.employees e1,
3 (select department_id, avg(salary) avg_salary
4 from hr.employees e2
5 group by department_id) v
6 where e1.department_id = v.department_id
7 and e1.salary > v.avg_salary
8 and e1.department_id = 60;
execution plan
----------------------------------------------------------
plan hash value: 3521487559
-----------------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
-----------------------------------------------------------------------------------------------------
| 0 | select statement | | 1 | 41 | 3 (0)| 00:00:01 |
| 1 | nested loops | | | | | |
| 2 | nested loops | | 1 | 41 | 3 (0)| 00:00:01 |
| 3 | view | | 1 | 26 | 2 (0)| 00:00:01 |
| 4 | hash group by | | 1 | 7 | 2 (0)| 00:00:01 |
| 5 | table access by index rowid| employees | 5 | 35 | 2 (0)| 00:00:01 |
|* 6 | index range scan | emp_department_ix | 5 | | 1 (0)| 00:00:01 |
|* 7 | index range scan | emp_department_ix | 5 | | 0 (0)| 00:00:01 |
|* 8 | table access by index rowid | employees | 1 | 15 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
6 - access("department_id"=60)
7 - access("e1"."department_id"=60)
8 - filter("e1"."salary">"v"."avg_salary")
-- without predicate pushing
sql> select e1.last_name, e1.salary, v.avg_salary
2 from hr.employees e1,
3 (select department_id, avg(salary) avg_salary
4 from hr.employees e2
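5 where rownum > 1 -- referencing rownum blocks both view merging and predicate pushing, much like using the no_merge and no_push_pred hints together; note that rownum > 1 never returns a row, so this variant is only for comparing plans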
6 group by department_id) v
7 where e1.department_id = v.department_id
8 and e1.salary > v.avg_salary
9 and e1.department_id = 60;
execution plan
----------------------------------------------------------
plan hash value: 3834222907
--------------------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------------------------
| 0 | select statement | | 3 | 123 | 7 (29)| 00:00:01 |
|* 1 | hash join | | 3 | 123 | 7 (29)| 00:00:01 |
| 2 | table access by index rowid| employees | 5 | 75 | 2 (0)| 00:00:01 |
|* 3 | index range scan | emp_department_ix | 5 | | 1 (0)| 00:00:01 |
|* 4 | view | | 11 | 286 | 4 (25)| 00:00:01 |
| 5 | hash group by | | 11 | 77 | 4 (25)| 00:00:01 |
| 6 | count | | | | | |
|* 7 | filter | | | | | |
| 8 | table access full | employees | 107 | 749 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
predicate information (identified by operation id):
---------------------------------------------------
1 - access("e1"."department_id"="v"."department_id")
filter("e1"."salary">"v"."avg_salary")
3 - access("e1"."department_id"=60)
4 - filter("v"."department_id"=60)
7 - filter(rownum>1)
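Comparing the two queries, in the first one the predicate department_id = 60 is pushed into view v, so the inner view only has to compute the average salary of department 60. In the second one the average salary of every department has to be computed, and the department_id = 60 condition is applied only afterwards, as a filter on the view result before the join. The query does more work because the predicate is applied late.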
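The effect of the pushed predicate in the first plan can be imitated by writing the condition inside the view by hand. The statement below is an illustrative sketch of that idea rather than the optimizer's literal rewritten text.

```sql
-- Hand-written equivalent of pushing department_id = 60 into view v:
-- the view aggregates only the rows of department 60.
select e1.last_name, e1.salary, v.avg_salary
  from hr.employees e1,
       (select department_id, avg(salary) avg_salary
          from hr.employees e2
         where department_id = 60
         group by department_id) v
 where e1.department_id = v.department_id
   and e1.salary > v.avg_salary
   and e1.department_id = 60;
```

With the condition inside the view, the index on department_id can be used for the aggregation as well, which matches the index range scan under the view operation in the first plan.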
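IV. Query Rewrite with Materialized Views
When query rewrite is enabled for a materialized view, the CBO compares the cost of answering a query from the base tables with the cost of answering it from the materialized view. If the optimizer decides the result can be obtained more efficiently from the materialized view, it automatically rewrites the query to use it; otherwise it generates a plan against the base tables.
Again, an example: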
sql> set autotrace traceonly explain
sql> select department_id,count(employee_id) from employees group by department_id;
execution plan
----------------------------------------------------------
plan hash value: 1192169904
--------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
--------------------------------------------------------------------------------
| 0 | select statement | | 11 | 33 | 4 (25)| 00:00:01 |
| 1 | hash group by | | 11 | 33 | 4 (25)| 00:00:01 |
| 2 | table access full| employees | 107 | 321 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
-- create the materialized view log
sql> create materialized view log on employees with sequence,
2 rowid (employee_id,department_id) including new values;
materialized view log created.
-- create the materialized view with query rewrite enabled
sql> create materialized view mv_t
2 build immediate refresh fast on commit
3 enable query rewrite as
4 select department_id,count(employee_id) from employees group by department_id;
materialized view created.
sql> select department_id,count(employee_id) from employees group by department_id;
execution plan
----------------------------------------------------------
plan hash value: 1712400360
-------------------------------------------------------------------------------------
| id | operation | name | rows | bytes | cost (%cpu)| time |
-------------------------------------------------------------------------------------
| 0 | select statement | | 12 | 312 | 3 (0)| 00:00:01 |
| 1 | mat_view rewrite access full| mv_t | 12 | 312 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
note
-----
- dynamic sampling used for this statement (level=2)
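In the second query, although the statement references the employees table, the optimizer automatically chose the materialized view access path: it determined that the materialized view already holds the result set the query needs, so reading it directly is more efficient.
Note that the query rewrite here happened automatically. The /*+ rewrite(mv_t) */ hint can also be used to force a rewrite.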
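The hints can be tried against the same statement as follows. This is a small sketch that assumes the mv_t materialized view created above exists and that query rewrite is allowed for the session; the alter session line is shown only to make that assumption explicit.

```sql
-- Make sure query rewrite is allowed in this session.
alter session set query_rewrite_enabled = true;

-- Ask the optimizer to rewrite the query against mv_t.
select /*+ rewrite(mv_t) */ department_id, count(employee_id)
  from employees
 group by department_id;

-- Ask the optimizer not to rewrite, so the base table is read instead.
select /*+ no_rewrite */ department_id, count(employee_id)
  from employees
 group by department_id;
```

Comparing the autotrace output of the two statements shows whether the mat_view rewrite access operation appears or the plan falls back to the hash group by over employees.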
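Summary:
Although the optimizer restructures our queries transparently, these are normally the choices the CBO judges to be more efficient, which is what we want. This also suggests a way to learn: keep the optimizer's behavior in mind whenever you write SQL.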