欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  数据库

Oracle 执行计划(3)-两表连接基数

程序员文章站 2022-05-13 12:08:37
...

Oracle 执行计划(3)- 两表连接基数 1 公式: 基数 = 连接选择率 * 过滤条件 1 基数 + 过滤条件 2 的基数 连接选择率 =((num_rows( 表 1)-num_nulls( 表 1 连接字段 ))/num_rows( 表 1))* ((num_rows( 表 2)-num_nulls( 表 2 连接字段 ))/num_rows( 表 2))

Oracle 执行计划(3)-两表连接基数

1 公式:

基数= 连接选择率*过滤条件1基数+过滤条件2的基数

连接选择率=((num_rows(1)-num_nulls(1连接字段))/num_rows(1))*

((num_rows(2)-num_nulls(2连接字段))/num_rows(2)) /

Greater(num_distinct(1连接字段),num_distinct(2连接字段))

create table t1 as
select  trunc(dbms_random.value(0,25)) filter1,
trunc(dbms_random.value(0,30)) join1,
lpad(rownum,10) v1,
rpad('x',100) padding1
from all_objects
where rownum


已选择2259行。
已用时间:  00: 00: 00.03
执行计划
----------------------------------------------------------
Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  2000 | 56000 |    76   (3)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  2000 | 56000 |    76   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |   200 |  2800 |    38   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T1   |   400 |  5600 |    38   (3)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."JOIN1"="T2"."JOIN2")
   2 - filter("T2"."FILTER2"=2)
   3 - filter("T1"."FILTER"=1)
统计信息
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        504  consistent gets
          0  physical reads
          0  redo size
      60032  bytes sent via SQL*Net to client
       2035  bytes received via SQL*Net from client
        152  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2259  rows processed

select * from user_tab_col_statistics where table_name='T1'

TABLE_NAME

COLUMN_NAME

NUM_DISTINCT

DENSITY

NUM_NULLS

T1

FILTER

25

0.04

0

T1

JOIN1

30

0.0333333333333333

0

T1

V1

10000

0.0001

0

T1

PADDING

1

1

0

TABLE_NAME

COLUMN_NAME

NUM_DISTINCT

DENSITY

NUM_NULLS

SAMPLE_SIZE

T2

FILTER2

50

0.02

0

10000

T2

JOIN2

40

0.025

0

10000

T2

V2

10000

0.0001

0

10000

T2

PADDING2

1

1

0

10000

连接选择率=(10000-0)/10000)*(1000-0)/10000)/greater(30,40)=1/40

连接基数=1/40*(400*200)=2000

执行计划当中的 T2 ROWS=200,T1.ROWS=400 HASH JOIN.ROWS=2000
|*  1 |  HASH JOIN         |      |  2000 | 56000 |    76   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |   200 |  2800 |    38   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T1   |   400 |  5600 |    38   (3)| 00:00:01 |

2 包含空值情况下

Update t1 set join1=null where mod(to_number(v1),20)=0;

Update t2 set join2=null where mod(to_number(v2),30)=0;

SQL> analyze table t2 compute statistics;

SQL> analyze table t1 compute statistics;

select * from user_tab_col_statistics where table_name='T1'

TABLE_NAME

COLUMN_NAME

NUM_DISTINCT

DENSITY

NUM_NULLS

SAMPLE_SIZE

T1

FILTER

25

0.04

0

10000

T1

JOIN1

30

0.0333333333333333

500

10000

T1

V1

10000

0.0001

0

10000

T1

PADDING

1

1

0

10000

TABLE_NAME

COLUMN_NAME

NUM_DISTINCT

DENSITY

NUM_NULLS

SAMPLE_SIZE

T2

FILTER2

50

0.02

0

10000

T2

JOIN2

40

0.025

333

10000

T2

V2

10000

0.0001

0

10000

T2

PADDING2

1

1

0

10000

套公式 选择率=((10000-500)/10000)*((10000-333)/10000)/greater(30,40)

=9500/10000*9667/10000/40

=0.95*0.9667/40

=0.022959125

基数=200*400*0.022959125 =1836.73

执行计划:

已选择2042行。

已用时间: 00: 00: 00.03

执行计划

----------------------------------------------------------

Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1837 | 51436 |    76   (3)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  1837 | 51436 |    76   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |   200 |  2800 |    38   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T1   |   400 |  5600 |    38   (3)| 00:00:01 |
---------------------------------------------------------------------------

3 过滤基数

基数=基本选择率*(num_rows-nulls)

Update t1 set filter=null where mod(to_number(v1),50)=0;

Update t2 set filter2=null where mod(to_number(v2),100)=0;

200 rows updated

100 rows updated

T1.filter cardinatitly=1/25*(10000-200)=392

T2.FILTER2 CARDINATILTY=1/50(10000-100)=198

连接基数=392*198*0.022959125=1781.995

已选择2000行。

已用时间: 00: 00: 00.06

执行计划

----------------------------------------------------------

Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1782 | 49896 |    76   (3)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  1782 | 49896 |    76   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |   198 |  2772 |    38   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T1   |   392 |  5488 |    38   (3)| 00:00:01 |

4 多连接条件

select t3.v2,t4.v2
from t3,t4
where t3.join1=t4.join2
and t3.join2=t4.join2

连接公式:=(条件1选择率)*(条件2选择率)

不求证了!

5 范围连接选择率

1 Where t1.join1

2 Where t2.join1 between t1.join1-1 and t1.join1+1

1 选择率=5% 固定选择率

2 转化成绑定变量格式, 固定选择率相乘.5%*5%

6 不等连接选择率

Where t1.join1!=t2.join2

选择率 = 1-( t1.join1=t2.join2 选择率)

=1-1/40=39/40

7 AND OR 多连接条件

1 where t1.join1=t2.join1 and t1.join2=t2.join2

2 where t1.join1=t2.join1 OR t1.join2=t2.join2

可以参考单表基数的多谓词选择率

1 join1选择率*join2选择率

2 join1选择率+join2选择率- join1选择率*join2选择率

8 三表连接基数选择率

create table t3 as
select
trunc(dbms_random.value(0,50)) filter2,
trunc(dbms_random.value(
0,30)) join1,
trunc(dbms_random.value(
0,50)) join2,
lpad(rownum,
10) v2,
rpad(
'x',100) padding2
from all_objects
where rownum10000;

然后重新生存 T1 T2 分析后执行语句

select t1.v1,t2.v2,t3.v2
from t1,t2,t3
where t1.join1=t2.join2
and t2.join2=t3.join1
and t1.filter1=
1
and t2.filter2=
1

1 先做T1T2的选择率和基数

前面已经获得2000

2 T2T3做连接

套用公式T2T3

选择率=(10000-0)/10000)*(10000-0)/10000)/greater(40,30)=1/40

基数=1/40*2000*10000=50,0000

注意 2000是第一个连接的基数,10000是T3无过滤条件的基数.

已用时间: 00: 00: 09.42

执行计划

----------------------------------------------------------

Plan hash value: 1184213596
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   500K|    19M|   123   (9)| 00:00:02 |
|*  1 |  HASH JOIN          |      |   500K|    19M|   123   (9)| 00:00:02 |
|*  2 |   HASH JOIN         |      |  2000 | 56000 |    76   (3)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T2   |   200 |  2800 |    38   (3)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T1   |   400 |  5600 |    38   (3)| 00:00:01 |
|   5 |   TABLE ACCESS FULL | T3   | 10000 |   117K|    39   (3)| 00:00:01 |

----------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("T2"."JOIN2"="T3"."JOIN1")

2 - access("T1"."JOIN1"="T2"."JOIN2")

3 - filter("T2"."FILTER2"=1)

4 - filter("T1"."FILTER1"=1)

9 传递闭包

create table t4 as
select
trunc(dbms_random.value(0,50)) filter2,
trunc(dbms_random.value(
0,40)) join1,
trunc(dbms_random.value(
0,40)) join2,
lpad(rownum,
10) v2,
rpad(
'x',100) padding2
from all_objects
where rownum10000;

select t3.v2,t4.v2
from t3,t4
where t3.join1=t4.join1
and t3.join2=t4.join2
and t3.join1=20;

传递闭包是这么回事因为T3.JOIN1=20 并且T3.JOIN1=T4.JOIN1 T4.JOIN1=20;

执行计划

----------------------------------------------------------

Plan hash value: 920528290
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    52 |  1456 |    78   (3)| 00:00:01 |
|*  1 |  HASH JOIN         |      |    52 |  1456 |    78   (3)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T4   |   250 |  3500 |    39   (3)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T3   |   333 |  4662 |    39   (3)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T3"."JOIN1"="T4"."JOIN1" AND "T3"."JOIN2"="T4"."JOIN2")

2 - filter("T4"."JOIN1"=20)

3 - filter("T3"."JOIN1"=20)

实际上结果集行数是:1554 与52基数相差超大

因为 JOIN1选择率*JOIN2选择率=(10000-0)/10000)*(10000-0)/10000)/greater(30,40)

*(10000-0)/10000)*(10000-0)/10000)/greater(50,40)=1/40*1/50=1/2000

并且因为10G多列完备性检查,选择结果集最小选择率相乘 1/40*1/40=1/1600

基数=1/1600*10000/30*10000/40=52

基数=1/40*10000/30*10000/50=1/40*333*200=1665 与结果集相当.因为该版本没有消除掉连接条件.