Oracle 执行计划(3)-两表连接基数
Oracle 执行计划(3)- 两表连接基数 1 公式: 基数 = 连接选择率 * 过滤条件 1 基数 + 过滤条件 2 的基数 连接选择率 =((num_rows( 表 1)-num_nulls( 表 1 连接字段 ))/num_rows( 表 1))* ((num_rows( 表 2)-num_nulls( 表 2 连接字段 ))/num_rows( 表 2))
Oracle 执行计划(3)-两表连接基数
1 公式:
基数= 连接选择率*过滤条件1基数+过滤条件2的基数
连接选择率=((num_rows(表1)-num_nulls(表1连接字段))/num_rows(表1))*
((num_rows(表2)-num_nulls(表2连接字段))/num_rows(表2)) /
Greater(num_distinct(表1连接字段),num_distinct(表2连接字段))
create table t1 as select trunc(dbms_random.value(0,25)) filter1, trunc(dbms_random.value(0,30)) join1, lpad(rownum,10) v1, rpad('x',100) padding1 from all_objects where rownum
已选择2259行。已用时间: 00: 00: 00.03执行计划----------------------------------------------------------Plan hash value: 2959412835---------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 2000 | 56000 | 76 (3)| 00:00:01 ||* 1 | HASH JOIN | | 2000 | 56000 | 76 (3)| 00:00:01 ||* 2 | TABLE ACCESS FULL| T2 | 200 | 2800 | 38 (3)| 00:00:01 ||* 3 | TABLE ACCESS FULL| T1 | 400 | 5600 | 38 (3)| 00:00:01 |---------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------1 - access("T1"."JOIN1"="T2"."JOIN2")2 - filter("T2"."FILTER2"=2)3 - filter("T1"."FILTER"=1)统计信息----------------------------------------------------------1 recursive calls0 db block gets504 consistent gets0 physical reads0 redo size60032 bytes sent via SQL*Net to client2035 bytes received via SQL*Net from client152 SQL*Net roundtrips to/from client0 sorts (memory)0 sorts (disk)2259 rows processed
select * from user_tab_col_statistics where table_name='T1'
TABLE_NAME |
COLUMN_NAME |
NUM_DISTINCT |
DENSITY |
NUM_NULLS |
T1 |
FILTER |
25 |
0.04 |
0 |
T1 |
JOIN1 |
30 |
0.0333333333333333 |
0 |
T1 |
V1 |
10000 |
0.0001 |
0 |
T1 |
PADDING |
1 |
1 |
0 |
TABLE_NAME |
COLUMN_NAME |
NUM_DISTINCT |
DENSITY |
NUM_NULLS |
SAMPLE_SIZE |
T2 |
FILTER2 |
50 |
0.02 |
0 |
10000 |
T2 |
JOIN2 |
40 |
0.025 |
0 |
10000 |
T2 |
V2 |
10000 |
0.0001 |
0 |
10000 |
T2 |
PADDING2 |
1 |
1 |
0 |
10000 |
连接选择率=(10000-0)/10000)*(1000-0)/10000)/greater(30,40)=1/40
连接基数=1/40*(400*200)=2000
执行计划当中的 T2 ROWS=200,T1.ROWS=400 HASH JOIN.ROWS=2000
|* 1 | HASH JOIN | | 2000 | 56000 | 76 (3)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T2 | 200 | 2800 | 38 (3)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T1 | 400 | 5600 | 38 (3)| 00:00:01 |
2 包含空值情况下
Update t1 set join1=null where mod(to_number(v1),20)=0;
Update t2 set join2=null where mod(to_number(v2),30)=0;
SQL> analyze table t2 compute statistics;
SQL> analyze table t1 compute statistics;
select * from user_tab_col_statistics where table_name='T1'
TABLE_NAME |
COLUMN_NAME |
NUM_DISTINCT |
DENSITY |
NUM_NULLS |
SAMPLE_SIZE |
T1 |
FILTER |
25 |
0.04 |
0 |
10000 |
T1 |
JOIN1 |
30 |
0.0333333333333333 |
500 |
10000 |
T1 |
V1 |
10000 |
0.0001 |
0 |
10000 |
T1 |
PADDING |
1 |
1 |
0 |
10000 |
TABLE_NAME |
COLUMN_NAME |
NUM_DISTINCT |
DENSITY |
NUM_NULLS |
SAMPLE_SIZE |
T2 |
FILTER2 |
50 |
0.02 |
0 |
10000 |
T2 |
JOIN2 |
40 |
0.025 |
333 |
10000 |
T2 |
V2 |
10000 |
0.0001 |
0 |
10000 |
T2 |
PADDING2 |
1 |
1 |
0 |
10000 |
套公式 选择率=((10000-500)/10000)*((10000-333)/10000)/greater(30,40)
=9500/10000*9667/10000/40
=0.95*0.9667/40
=0.022959125
基数=200*400*0.022959125 =1836.73
执行计划:
已选择2042行。
已用时间: 00: 00: 00.03
执行计划
----------------------------------------------------------
Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1837 | 51436 | 76 (3)| 00:00:01 |
|* 1 | HASH JOIN | | 1837 | 51436 | 76 (3)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T2 | 200 | 2800 | 38 (3)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T1 | 400 | 5600 | 38 (3)| 00:00:01 |
---------------------------------------------------------------------------
3 过滤基数
基数=基本选择率*(num_rows-nulls)
Update t1 set filter=null where mod(to_number(v1),50)=0;
Update t2 set filter2=null where mod(to_number(v2),100)=0;
200 rows updated
100 rows updated
T1.filter cardinatitly=1/25*(10000-200)=392
T2.FILTER2 CARDINATILTY=1/50(10000-100)=198
连接基数=392*198*0.022959125=1781.995
已选择2000行。
已用时间: 00: 00: 00.06
执行计划
----------------------------------------------------------
Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1782 | 49896 | 76 (3)| 00:00:01 |
|* 1 | HASH JOIN | | 1782 | 49896 | 76 (3)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T2 | 198 | 2772 | 38 (3)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T1 | 392 | 5488 | 38 (3)| 00:00:01 |
4 多连接条件
select t3.v2,t4.v2
from t3,t4
where t3.join1=t4.join2
and t3.join2=t4.join2
连接公式:=(条件1选择率)*(条件2选择率)
不求证了!
5 范围连接选择率
1 Where t1.join1
2 Where t2.join1 between t1.join1-1 and t1.join1+1
1 选择率=5% 固定选择率
2 转化成绑定变量格式, 固定选择率相乘.5%*5%
6 不等连接选择率
Where t1.join1!=t2.join2
选择率 = 1-( t1.join1=t2.join2 选择率)
=1-1/40=39/40
7 AND OR 多连接条件
1 where t1.join1=t2.join1 and t1.join2=t2.join2
2 where t1.join1=t2.join1 OR t1.join2=t2.join2
可以参考单表基数的多谓词选择率
1 join1选择率*join2选择率
2 join1选择率+join2选择率- join1选择率*join2选择率
8 三表连接基数选择率
create table t3 as
select trunc(dbms_random.value(0,50)) filter2,
trunc(dbms_random.value(0,30)) join1,
trunc(dbms_random.value(0,50)) join2,
lpad(rownum,10) v2,
rpad('x',100) padding2
from all_objects
where rownum10000;
然后重新生存 T1 和T2表 分析后执行语句
select t1.v1,t2.v2,t3.v2
from t1,t2,t3
where t1.join1=t2.join2
and t2.join2=t3.join1
and t1.filter1=1
and t2.filter2=1
1 先做T1和T2的选择率和基数
前面已经获得2000
2 T2和T3做连接
套用公式T2和T3
选择率=(10000-0)/10000)*(10000-0)/10000)/greater(40,30)=1/40
基数=1/40*2000*10000=50,0000
注意 2000是第一个连接的基数,10000是T3无过滤条件的基数.
已用时间: 00: 00: 09.42
执行计划
----------------------------------------------------------
Plan hash value: 1184213596
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 500K| 19M| 123 (9)| 00:00:02 |
|* 1 | HASH JOIN | | 500K| 19M| 123 (9)| 00:00:02 |
|* 2 | HASH JOIN | | 2000 | 56000 | 76 (3)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T2 | 200 | 2800 | 38 (3)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| T1 | 400 | 5600 | 38 (3)| 00:00:01 |
| 5 | TABLE ACCESS FULL | T3 | 10000 | 117K| 39 (3)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."JOIN2"="T3"."JOIN1")
2 - access("T1"."JOIN1"="T2"."JOIN2")
3 - filter("T2"."FILTER2"=1)
4 - filter("T1"."FILTER1"=1)
9 传递闭包
create table t4 as
select trunc(dbms_random.value(0,50)) filter2,
trunc(dbms_random.value(0,40)) join1,
trunc(dbms_random.value(0,40)) join2,
lpad(rownum,10) v2,
rpad('x',100) padding2
from all_objects
where rownum10000;
select t3.v2,t4.v2
from t3,t4
where t3.join1=t4.join1
and t3.join2=t4.join2
and t3.join1=20;
传递闭包是这么回事因为T3.JOIN1=20 并且T3.JOIN1=T4.JOIN1 则 T4.JOIN1=20;
执行计划
----------------------------------------------------------
Plan hash value: 920528290
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 52 | 1456 | 78 (3)| 00:00:01 |
|* 1 | HASH JOIN | | 52 | 1456 | 78 (3)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T4 | 250 | 3500 | 39 (3)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T3 | 333 | 4662 | 39 (3)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T3"."JOIN1"="T4"."JOIN1" AND "T3"."JOIN2"="T4"."JOIN2")
2 - filter("T4"."JOIN1"=20)
3 - filter("T3"."JOIN1"=20)
实际上结果集行数是:1554 与52基数相差超大
因为 JOIN1选择率*JOIN2选择率=(10000-0)/10000)*(10000-0)/10000)/greater(30,40)
*(10000-0)/10000)*(10000-0)/10000)/greater(50,40)=1/40*1/50=1/2000
并且因为10G多列完备性检查,选择结果集最小选择率相乘 1/40*1/40=1/1600
基数=1/1600*10000/30*10000/40=52
基数=1/40*10000/30*10000/50=1/40*333*200=1665 与结果集相当.因为该版本没有消除掉连接条件.