欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  数据库

Oracle 中的动态采样(dynamic sampling)相关示例

程序员文章站 2022-06-03 23:06:27
...

1.1首先回顾下动态采样(dynamic sampling)的相关知识点 这个特性,使数据库随机的扫描表中少量的block,用来增强数据库的统计信息。 1.1.1 目的 动态采样增加了那些丢失的或者不足的优化器统计信息。使用动态采样可以让优化器更好的选择谓词。动态采样能够

1.1首先回顾下动态采样(dynamic sampling)的相关知识点

这个特性,使数据库随机的扫描表中少量的block,用来增强数据库的统计信息。

1.1.1 目的

动态采样增加了那些丢失的或者不足的优化器统计信息。使用动态采样可以让优化器更好的选择谓词。动态采样能够补充类似表中block个数,相关的索引block个数,表的集势(ronded个数),相关的连接列的统计信息(提供extendedstatistics的功能)。

1.1.2 动态采样概念

动态采样默认为启动状态,可以设置OPTIMIZER_DYNAMIC_SAMPLING=0来禁用掉这一特性。

OPTIMIZER_DYNAMIC_SAMPLING也是和动态采样最重要的参数,它控制着动态采样级别。

1.1.3 OPTIMIZER_DYNAMIC_SAMPLING动态参数说明

OPTIMIZER_DYNAMIC_SAMPLING

Property

Description

Parameter type

Integer

Default value

If OPTIMIZER_FEATURES_ENABLE is set to 10.0.0 or higher, then 2

If OPTIMIZER_FEATURES_ENABLE is set to 9.2.0, then 1

If OPTIMIZER_FEATURES_ENABLE is set to 9.0.1 or lower, then 0

Modifiable

ALTER SESSION, ALTER SYSTEM

Range of values

0 to 10

Dynamic Sampling Levels

Oracle 中的动态采样(dynamic sampling)相关示例

10g以上oracle database 的OPTIMIZER_DYNAMIC_SAMPLING参数默认值为2。

1.2 应用场景

下面列举3个动态采样的典型应用场景

1.2.1 缺失统计信息

当查询中的一个或者多个表没有统计信息,那么优化器就会收集关于表的基本信息用来执行优化操作。

1.创建表

dexter@DAVID> create table tuning8_tab1nologging as

2 select level as id , 'name'|| level as name

3 from dual

4 connect by level

Table created.

2.创建索引

_dexter@DAVID> create indexidx_tuning8_tab1_id on tuning8_tab1(id) ;

Index created.

3.查询测试

先看一下参数的设置

_dexter@DAVID> show parameter samp

NAME TYPE VALUE

----------------------------------------------- ------------------------------

optimizer_dynamic_sampling integer 2

_dexter@DAVID> select * from tuning8_tab1where id=2 ;

ID NAME

---------- --------------------------------------------

2 name2

Execution Plan

----------------------------------------------------------

Plan hash value: 3712969662

---------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

---------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 37 | 2 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID|TUNING8_TAB1 | 1 | 37 | 2 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | IDX_TUNING8_TAB1_ID | 1 | | 1 (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

2 -access("ID"=2)

Note

-----

-dynamic sampling used for this statement (level=2)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

4 consistent gets

3 physical reads

0 redo size

596 bytes sent via SQL*Net toclient

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

动态采样已经起作用了。

1.2.2 缺失extendedstatistics

oracle提供相关性统计信息,这里简单演示了一种情况,说明动态采样可以使oracle数据库更加智能的选择查询计划,进而提升性能。(其实使用dbms_stats设置特定的参数一样可以达到这种目的)

1. 基于all_objects视图创建表

create table tect9_tab2 nologging

as select decode( mod(rownum,2), 0, 'D', 'R' )col1,

decode( mod(rownum,2), 0, 'R', 'D' ) col2, t.*

from all_objects t

2. 创建复合索引

create index idx_comp_tect9_tab2_col1_col2 ontect9_tab2(col1,col2) ;

3. 收集统计信息

execdbms_stats.gather_table_stats(user,'tect9_tab2',method_opt=>'for all indexedcolumns size 254' );

dexter@STARTREK> execdbms_stats.gather_table_stats(user,'tect9_tab2',method_opt=>'for all indexedcolumns size 254' );

PL/SQL procedure successfully completed.

4. 查询测试

数据总量为72566

dexter@STARTREK> select count(*) fromtect9_tab2 ;

COUNT(*)

----------

72566

已经收集了统计信息,所以常规的查询都可以得到正确的执行计划。

dexter@STARTREK> select * from tect9_tab2where col1='D' ;

Execution Plan

----------------------------------------------------------

Plan hash value: 3262335044

--------------------------------------------------------------------------------

| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------

| 0 |SELECT STATEMENT | | 35484 | 3499K| 302 (1)| 00:00:01 |

|* 1| TABLE ACCESS FULL| TECT9_TAB2 | 35484| 3499K| 302 (1)| 00:00:01 |

--------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

1 -filter("COL1"='D')

这里也是正确的

dexter@STARTREK> select * from tect9_tab2where col1='D' and col2='R' ;

Execution Plan

----------------------------------------------------------

Plan hash value: 3262335044

--------------------------------------------------------------------------------

| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------

| 0 |SELECT STATEMENT | | 17348 | 1711K| 301 (1)| 00:00:01 |

|* 1| TABLE ACCESS FULL| TECT9_TAB2 | 17348| 1711K| 301 (1)| 00:00:01 |

--------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

1 -filter("COL2"='R' AND "COL1"='D')

这里按照常理来说是正确的,但是在本环境中,oracle其实有更智能的一面:

因为col1和col2 之间有关联,col1=’D’的时候col2=’D’的记录是没有的,但是这里预计的条数为18133条,预计错误,oracle是否有更智能的选择呢?

dexter@STARTREK> select * from tect9_tab2where col1='D' and col2='D' ;

Execution Plan

----------------------------------------------------------

Plan hash value: 3262335044

--------------------------------------------------------------------------------

| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------

| 0 |SELECT STATEMENT | | 18133 | 1788K| 302 (1)| 00:00:01 |

|* 1| TABLE ACCESS FULL| TECT9_TAB2 | 18133| 1788K| 302 (1)| 00:00:01 |

--------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

1 -filter("COL1"='D' AND "COL2"='D')

5. using Dynamic Samping

可以看到,使用动态采样,虽然预计得到的条数为但是已经比较接近真实环境了,这里选择索引扫描作为access path。

select /*+ dynamic_sampling(t 2) */ * fromtect9_tab2 t where col1='D' and col2='D' ;

dexter@STARTREK> select /*+dynamic_sampling(t 2) */ * from tect9_tab2 t where col1='D' and col2='D' ;

Execution Plan

----------------------------------------------------------

Plan hash value: 1707542874

-------------------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-------------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 5 | 505 | 2 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID| TECT9_TAB2 | 5 | 505 | 2 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN |IDX_COMP_TECT9_TAB2_COL1_COL2 | 5 | | 1 (0)| 00:00:01 |

-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

2 -access("COL1"='D' AND "COL2"='D')

Note

-----

-dynamic sampling used for this statement (level=2)

1.2.3 对于一个经常大规模被修改的表使用动态采样

如果一个表中的数据,经常执行大批量的加载或者更新,那么在每一次加载或者更新操作后都需要手动的执行收集统计信息的操作。

因为如果不手动的收集,因为统计信息过旧,优化器可能会选择效率较差的access path,影响效率。

其实还有另外一种做法可以避免这样的事情发生,那就是使用动态采样特性。

当一个表没有统计信息的时候,oracle数据库就会使用动态采样来协助优化器。所以对于上述的环境中,就可以删除、并且锁定表的统计信息。这样每次执行查询操作的时候,动态采样的特性都会被使用。

下面给出示例:

1.创建表

_dexter@DAVID> create table tuning8_tab3nologging

2 as

3 select level as id , 'name'||level as name

4 from dual

5 connect by level

Table created.

2.创建索引

_dexter@DAVID> create indexidx_tuning8_tab3_id on tuning8_tab3(id) ;

Index created.

3.收集统计信息

_dexter@DAVID> @gather_tab

Enter value for tbname: tuning8_tab3

PL/SQL procedure successfully completed.

4.查询测试

_dexter@DAVID>

_dexter@DAVID>

_dexter@DAVID> set autotrace on

_dexter@DAVID> select * from tuning8_tab3where id=1 ;

ID NAME

------------------------------------------------------

1 name1

Execution Plan

----------------------------------------------------------

Plan hash value: 1647674320

---------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

---------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |

| 1 | TABLE ACCESS BY INDEX ROWID|TUNING8_TAB3 | 1 | 13 | 2 (0)| 00:00:01 |

|* 2 | INDEX RANGE SCAN | IDX_TUNING8_TAB3_ID | 1 | | 1 (0)| 00:00:01 |

---------------------------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

2 -access("ID"=1)

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

4 consistent gets

0 physical reads

0 redo size

596 bytes sent via SQL*Net toclient

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

非常正常,优化器选择了最优的accesspath 。

5.加载数据

_dexter@DAVID> insert into tuning8_tab3select 10 , 'namess' from dual connect by level

50000 rows created.

_dexter@DAVID> commit ;

Commit complete.

6.查询测试

可以看到,优化器选择了索引扫描作为accesspath。

_dexter@DAVID> select count(name) fromtuning8_tab3 where id=10 ;

COUNT(NAME)

-----------

50001

Execution Plan

----------------------------------------------------------

Plan hash value: 2464990924

----------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 13 | | |

| 2 | TABLE ACCESS BY INDEX ROWID|TUNING8_TAB3 | 1 | 13 | 2 (0)| 00:00:01 |

|* 3 | INDEX RANGE SCAN | IDX_TUNING8_TAB3_ID | 1 | | 1 (0)| 00:00:01 |

----------------------------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

3 -access("ID"=10)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

414 consistent gets

0 physical reads

11588 redo size

531 bytes sent via SQL*Net toclient

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

但是其实全表扫描才是最优的

_dexter@DAVID> select /*+full(tuning8_tab3) */ count(name) from tuning8_tab3 where id=10 ;

COUNT(NAME)

-----------

50001

Execution Plan

----------------------------------------------------------

Plan hash value: 926892911

-----------------------------------------------------------------------------------

| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 13 | 10 (0)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 13 | | |

|* 2 | TABLE ACCESS FULL| TUNING8_TAB3 | 1 | 13 | 10 (0)| 00:00:01 |

-----------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

2 -filter("ID"=10)

Statistics

----------------------------------------------------------

1 recursive calls

1 db block gets

204 consistent gets

0 physical reads

176 redo size

531 bytes sent via SQL*Net toclient

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

7.解决方法

7.1每次加载后收集统计信息

问题已经出现了,当然在每次大规模加载数据后,可以收集一次表的统计信息。

_dexter@DAVID> @gather_tab

Enter value for tbname: tuning8_tab3

PL/SQL procedure successfully completed.

_dexter@DAVID> select count(name) fromtuning8_tab3 where id=10 ;

COUNT(NAME)

-----------

50001

Execution Plan

----------------------------------------------------------

Plan hash value: 926892911

-----------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 10 | 69 (0)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 10 | | |

|* 2 | TABLE ACCESS FULL| TUNING8_TAB3 | 50079| 489K| 69 (0)| 00:00:01 |

-----------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

2 -filter("ID"=10)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

268 consistent gets

0 physical reads

0 redo size

531 bytes sent via SQL*Net toclient

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

7.2.使用动态采样

当然还有另外一种方法,那就是今天要说的,使用动态采样的特性。

BEGIN

DBMS_STATS.DELETE_TABLE_STATS(user,'tuning8_tab3');

DBMS_STATS.LOCK_TABLE_STATS(user,'tuning8_tab3');

END;

/

删除统计信息,并且锁定表的统计信息

_dexter@DAVID> BEGIN

2 DBMS_STATS.DELETE_TABLE_STATS(user,'tuning8_tab3');

DBMS_STATS.LOCK_TABLE_STATS(user,'tuning8_tab3');

4 END;

5 /

PL/SQL procedure successfully completed.

这样优化器通常情况下都会产生较正确的查询计划。

_dexter@DAVID> select count(name) fromtuning8_tab3 where id=10 ;

COUNT(NAME)

-----------

50001

Execution Plan

----------------------------------------------------------

Plan hash value: 926892911

-----------------------------------------------------------------------------------

| Id |Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 37 | 70 (2)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 37 | | |

|* 2 | TABLE ACCESS FULL| TUNING8_TAB3 | 57162| 2065K| 70 (2)| 00:00:01 |

-----------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

2 -filter("ID"=10)

Note

-----

-dynamic sampling used for this statement (level=2)

Statistics

----------------------------------------------------------

52 recursive calls

0 db block gets

278 consistent gets

0 physical reads

0 redo size

531 bytes sent via SQL*Net toclient

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

8.再次加载数据测试

就算再次加载数据。

_dexter@DAVID> insert into tuning8_tab3select 9 , 'namess' from dual connect by level

100000 rows created.

_dexter@DAVID> commit ;

Commit complete.

依然会产生最优的查询计划,保证了效率。

_dexter@DAVID> select count(name) fromtuning8_tab3 where id=9 ;

COUNT(NAME)

-----------

100001

Execution Plan

----------------------------------------------------------

Plan hash value: 926892911

-----------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 37 | 105 (2)| 00:00:02 |

| 1 | SORT AGGREGATE | | 1 | 37 | | |

|* 2 | TABLE ACCESS FULL|TUNING8_TAB3 | 135K| 4879K| 105 (2)| 00:00:02 |

-----------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

2 -filter("ID"=9)

Note

-----

-dynamic sampling used for this statement (level=2)

Statistics

----------------------------------------------------------

8 recursive calls

1 db block gets

498 consistent gets

0 physical reads

880 redo size

531 bytes sent via SQL*Net toclient

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

可以看到,全表扫描的效率明显低于优化器选择的索引扫描的效率。

_dexter@DAVID> select /*+index(tuning8_tab3 idx_tuning8_tab3_id)*/ count(name) from tuning8_tab3 where id=9 ;

COUNT(NAME)

-----------

100001

Execution Plan

----------------------------------------------------------

Plan hash value: 2464990924

----------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 37 | 826 (0)| 00:00:10 |

| 1 | SORT AGGREGATE | | 1 | 37 | | |

| 2 | TABLE ACCESS BY INDEX ROWID|TUNING8_TAB3 | 135K| 4879K| 826 (0)| 00:00:10 |

|* 3 | INDEX RANGE SCAN | IDX_TUNING8_TAB3_ID | 135K| | 363 (0)| 00:00:05 |

----------------------------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

3 -access("ID"=9)

Note

-----

-dynamic sampling used for this statement (level=2)

Statistics

----------------------------------------------------------

7 recursive calls

0 db block gets

972 consistent gets

0 physical reads

25252 redo size

531 bytes sent via SQL*Net to client

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

当然对于一些对查询计划要求比较高的地方,不建议使用这种方法,特别是olap系统,因为动态采样只是一个折中的收集统计信息的方法,并不像使用dbms_stats包那样准确,也有可能生成错误的执行计划,就像下面的1.3示例一样。

1.3 表分析与动态采样

表分析(使用dbms_Stats)可以提供较高品质的统计信息,而动态采样则是一个折中方案,只能提供粗粒度的统计信息,所以对于一些对统计信息要求较高的查询中,因为统计信息缺失而使用动态采样是无法保证优化器选择最优的执行计划。

如下例所示:

1.初始化10w数据tuning8_tab2

_dexter@DAVID> create table tuning8_tab2nologging as

2 select level as id , 'name'|| level asname

3 from dual

4 connect by level

Table created.

2.创建索引

_dexter@DAVID> create indexidx_tuning8_tab2_id on tuning8_tab2 (id) ;

Index created.

3.更新数据,总数10w,id=10的数据3w条

_dexter@DAVID> update tuning8_tab2 setid=10 where id > 40000 and id

29999 rows updated.

_dexter@DAVID> commit ;

Commit complete.

4.查询测试

默认没有统计信息,使用动态采样,此时选择的是全表扫描,consistantgets=356

但是其实最优的access path 应该是索引扫描。

_dexter@DAVID> select count(*) fromtuning8_tab2 where id=10 ;

COUNT(*)

----------

30000

Execution Plan

----------------------------------------------------------

Plan hash value: 2380771210

-----------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

-----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 13 | 81 (2)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 13 | | |

|* 2 | TABLE ACCESS FULL| TUNING8_TAB2 | 33120 | 420K| 81 (2)| 00:00:01 |

-----------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

2 -filter("ID"=10)

Note

-----

- dynamic samplingused for this statement (level=2)

Statistics

----------------------------------------------------------

7 recursive calls

0 db block gets

356 consistent gets

0 physical reads

0 redo size

526 bytes sent via SQL*Net toclient

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

5.收集统计信息

_dexter@DAVID> @gather_tab

Enter value for tbname: tuning8_tab2

PL/SQL procedure successfully completed.

再次查询,此时使用的是索引快速扫描,consistent gets297

_dexter@DAVID> select count(*) fromtuning8_tab2 where id=10 ;

COUNT(*)

----------

30000

Execution Plan

----------------------------------------------------------

Plan hash value: 2673526969

---------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

---------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 5 | 73 (2)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 5 | | |

|* 2 | INDEX FAST FULL SCAN|IDX_TUNING8_TAB2_ID | 29921 | 146K| 73 (2)| 00:00:01 |

---------------------------------------------------------------------------------------------

Predicate Information (identified by operationid):

---------------------------------------------------

2 -filter("ID"=10)

Statistics

----------------------------------------------------------

0 recursive calls

0 db block gets

297 consistent gets

0 physical reads

0 redo size

526 bytes sent via SQL*Net toclient

523 bytes received via SQL*Netfrom client

2 SQL*Net roundtrips to/fromclient

0 sorts (memory)

0 sorts (disk)

1 rows processed

上面只是一个比较粗劣的实验,查询也比较简单。对于像olap这种对查询计划要求比较高,查询计划非常复杂的数据库,尽量保持统计信息的实时性是非常有必要的。