Oracle Parallel Execution Processes: A Summary
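How it works:
A task is split into multiple smaller tasks that are processed concurrently. The server process that issued the SQL becomes the query coordinator (QC) process, which coordinates and schedules the slave processes and merges their result sets before returning them to the client.
Parallel operations use one of two kinds of granule: partition granules and block range granules. The latter are defined dynamically while the SQL runs and generally spread the work more evenly across the slave processes, which matters because the speed of a parallel operation is determined by its slowest slave process.
When a SQL statement performs more than one operation, for example a scan and a sort, multiple sets of slave processes are allocated.
Parallelizing a single operation is called intra-operation parallelism, while the interaction between multiple sets of slave processes is inter-operation parallelism; the latter requires communication between the sets.
The process that sends data is the producer and the process that receives it is the consumer. A producer sends data to a consumer through a table queue in the SGA, and each producer-consumer pair has its own table queue.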
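The two sides can exchange data using the following distribution methods:
Broadcast: each producer sends its data to all consumers
Round-robin: each producer sends records to the consumers in turn
Range: each producer sends records in a given range to a specific consumer
Hash: each producer uses a hash function to decide which consumer receives a record
QC Random: each producer sends records to the coordinator in random order; ordering does not matter
QC Order: each producer sends records to the coordinator in order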
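The execution plan of a parallel operation can show the following operations:
P -> S --- parallel to serial; for example, the final step of every execution plan, which sends data to the coordinator process, works this way
P -> P --- one parallel operation sends data to another parallel operation; used when there are multiple sets of parallel processes
S -> P --- serial to parallel; this is relatively inefficient and should be avoided where possible
PCWP --- parallel combined with parent; the interaction stays within one set of processes, so there is no communication between sets
PCWC --- parallel combined with child; also within one set of processes, so there is no communication between sets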
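Parameter configuration
Each instance can use only a limited number of parallel processes. The instance maintains a pool of slave processes, much like a connection pool: the coordinator requests slave processes from the pool and returns them once execution completes.
parallel_min_servers: the number of slave processes created at instance startup; the default is 0. It is usually changed only when SQL statements spend too much time creating slave processes; the wait event associated with that is os thread startup.
parallel_max_servers: the maximum number of slave processes available.
parallel_execution_message_size: the table queues mentioned above live in the large pool (which holds non-reusable data structures); each table queue consists of 3-4 buffers, and this parameter defines the size of each buffer.
parallel_automatic_tuning: deprecated since 10g; when set to true, Oracle uses the large pool for table queues.
parallel_min_percent: the default is 0 and the valid range is 0-100. With 0, Oracle provides as many slave processes as it can, and runs the statement serially if fewer than 2 can be allocated. With a non-zero value, Oracle must provide at least that percentage of the requested slave processes or it raises ORA-12827. For example, with parallel_min_percent=25 and a statement requesting 16 slave processes, at least 16*25/100=4 must be available, otherwise ORA-12827 is raised.
parallel_adaptive_multi_user: defaults to true in 10g. When false, a statement is given as many slave processes as it requests as long as any remain; when true, Oracle lowers the statement's degree of parallelism according to the current load so that the slave processes are not exhausted.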
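As a rough illustration of how these parameters might be inspected and adjusted, the sketch below assumes a suitably privileged SQL*Plus session. The values 8 and 25 are arbitrary examples chosen for illustration, not recommendations.

```sql
-- List the current values of the parallel-execution parameters.
select name, value
from   v$parameter
where  name like 'parallel%'
order  by name;

-- Pre-start some slave processes (example value) so that statements
-- spend less time waiting on "os thread startup".
alter system set parallel_min_servers = 8;

-- For this session only: require at least 25% of the requested slave
-- processes, otherwise the statement fails with ORA-12827.
alter session set parallel_min_percent = 25;
```

With parallel_min_percent left at 0, the same statement would instead run with a reduced degree of parallelism, or serially, rather than fail.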
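How to use parallel
There are three ways:
1 Set a degree of parallelism on the table/index
2 Use the parallel hint
3 Enable parallel query/DML/DDL at the session level; parallel query (like parallel DDL) is enabled by default, while parallel DML has to be enabled explicitly
Of the three, a hint has the highest priority, and force parallel in turn overrides the degree of parallelism defined on the table or index. To disable parallel operations for the whole instance, set parallel_max_servers to 0.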
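The three approaches could look roughly like the following. This is a minimal sketch in which the table name t and the degree 4 are placeholders chosen for illustration.

```sql
-- 1. Table-level degree of parallelism (t is a placeholder table name).
alter table t parallel 4;

-- 2. Statement-level hint; it takes precedence over the other two settings.
select /*+ parallel(t 4) */ count(*) from t;

-- 3. Session level: force a degree of parallelism for queries,
--    and enable parallel DML, which is off by default.
alter session force parallel query parallel 4;
alter session enable parallel dml;

-- Undo the session-level and table-level settings again.
alter session enable parallel query;
alter session disable parallel dml;
alter table t noparallel;
```

Which of the three fits best depends on scope: the table-level setting affects every statement touching the table, whereas the hint and the session setting keep the change local.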
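When to use it
Parallel operations give the best results only when both of the following conditions hold:
1 The system has plenty of idle resources (CPU, I/O and memory)
2 The SQL takes too long to run serially, for example more than 10 seconds, because initializing a parallel operation (creating slave processes and table queues) itself consumes resources.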
Commonly used views
V$PX_PROCESS_SYSSTAT -- shows the status of query servers and provides buffer allocation statistics
V$PQ_SESSTAT -- lists session-level parallel execution statistics
V$PQ_SYSSTAT -- lists system-level parallel execution statistics
V$PQ_SLAVE -- lists information about each active parallel process
V$PQ_TQSTAT -- provides a detailed report of message traffic at the table queue level; data is valid only when queried from a session that is executing parallel SQL statements, so it records the parallel execution information of the current session
V$PX_BUFFER_ADVICE -- provides statistics on historical and projected maximum buffer usage by all parallel queries. You can consult this view to reconfigure SGA size in response to insufficient memory problems for parallel queries
V$PX_PROCESS -- contains information about the parallel processes, including status, session ID, process ID, and other information
V$PX_SESSION -- shows data about query server sessions, groups, sets, and server numbers. It also displays real-time data about the processes working on behalf of parallel execution. This view includes information about the requested degree of parallelism (DOP) and the actual DOP granted to the operation.
V$PX_SESSTAT -- provides a join of the session information from V$PX_SESSION and the V$SESSTAT table.
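Two of these views are often enough for a first look. The queries below are a sketch of how they might be used; the filter on 'Servers%' and the chosen column list are illustrative choices rather than anything prescribed.

```sql
-- Pool-level view: slave processes in use, available, started and so on.
select statistic, value
from   v$px_process_sysstat
where  statistic like 'Servers%';

-- Table queue traffic for the last parallel statement run in THIS session.
-- Very uneven num_rows across the processes of one tq_id hints at skew,
-- which matters because the slowest slave process sets the overall speed.
select dfo_number, tq_id, server_type, process, num_rows, bytes
from   v$pq_tqstat
order  by dfo_number, tq_id, server_type desc, process;
```

The second query has to be issued from the same session, right after the parallel statement, or it returns no useful rows.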
To view coordinator information for the parallel processes, either of the following two SQL queries can be used:
col username for a12
col "QC SID" for A6
col "SID" for A6
col "QC/Slave" for A8
col "Req. DOP" for 9999
col "Actual DOP" for 9999
col "Slaveset" for A8
col "Slave INST" for A9
col "QC INST" for A6
set pages 300 lines 300
col wait_event format a30
select
decode(px.qcinst_id,NULL,username,' - '||lower(substr(pp.SERVER_NAME,
length(pp.SERVER_NAME)-4,4) ) )"Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "SlaveSet",
to_char(s.sid) "SID",
to_char(px.inst_id) "Slave INST",
decode(sw.state,'WAITING', 'WAIT', 'NOT WAIT' ) as STATE,
case sw.state WHEN 'WAITING' THEN substr(sw.event,1,30) ELSE NULL end as wait_event ,
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
to_char(px.qcinst_id) "QC INST",
px.req_degree "Req. DOP",
px.degree "Actual DOP"
from gv$px_session px,
gv$session s ,
gv$px_process pp,
gv$session_wait sw
where px.sid=s.sid (+)
and px.serial#=s.serial#(+)
and px.inst_id = s.inst_id(+)
and px.sid = pp.sid (+)
and px.serial#=pp.serial#(+)
and px.inst_id = pp.inst_id(+)
and sw.sid = s.sid
and sw.inst_id = s.inst_id
order by
decode(px.QCINST_ID, NULL, px.INST_ID, px.QCINST_ID),
px.QCSID,
decode(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP),
px.SERVER_SET,
px.INST_ID;
SELECT px.SID "SID", p.PID, p.SPID "SPID", px.INST_ID "Inst",
px.SERVER_GROUP "Group", px.SERVER_SET "Set",
px.DEGREE "Degree", px.REQ_DEGREE "Req Degree", w.event "Wait Event"
FROM GV$SESSION s, GV$PX_SESSION px, GV$PROCESS p, GV$SESSION_WAIT w
WHERE s.sid (+) = px.sid AND s.inst_id (+) = px.inst_id AND
s.sid = w.sid (+) AND s.inst_id = w.inst_id (+) AND
s.paddr = p.addr (+) AND s.inst_id = p.inst_id (+)
ORDER BY DECODE(px.QCINST_ID, NULL, px.INST_ID, px.QCINST_ID), px.QCSID,
DECODE(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP), px.SERVER_SET, px.INST_ID;
View physical read information for the parallel processes
SELECT QCSID, SID, INST_ID "Inst", SERVER_GROUP "Group", SERVER_SET "Set",
NAME "Stat Name", VALUE
FROM GV$PX_SESSTAT A, V$STATNAME B
WHERE A.STATISTIC# = B.STATISTIC# AND NAME = 'physical reads'
AND VALUE > 0 ORDER BY QCSID, QCINST_ID, SERVER_GROUP, SERVER_SET;
 QCSID   SID   Inst  Group    Set Stat Name               VALUE
------ ----- ------ ------ ------ ------------------ ----------
     9     9      1               physical reads           3863
     9     7      1      1      1 physical reads              2
     9    21      1      1      1 physical reads              2
     9    18      1      1      2 physical reads              2
     9    20      1      1      2 physical reads              2
View system-wide statistics related to parallel execution
SELECT NAME, VALUE FROM GV$SYSSTAT
WHERE UPPER (NAME) LIKE '%PARALLEL OPERATIONS%'
OR UPPER (NAME) LIKE '%PARALLELIZED%' OR UPPER (NAME) LIKE '%PX%';
NAME                                                    VALUE
-------------------------------------------------- ----------
queries parallelized                                      347
DML statements parallelized                                 0
DDL statements parallelized                                 0
DFO trees parallelized                                    463
Parallel operations not downgraded                         28
Parallel operations downgraded to serial                   31
Parallel operations downgraded 75 to 99 pct               252
Parallel operations downgraded 50 to 75 pct               128
Parallel operations downgraded 25 to 50 pct                43
Parallel operations downgraded 1 to 25 pct                 12
PX local messages sent                                  74548
PX local messages recv'd                                74128
PX remote messages sent                                     0
PX remote messages recv'd                                   0
View the memory used by PX (this instance has no large pool allocated)
SQL> select * from v$sgastat where name like 'PX%';
POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  PX subheap                   11651336
shared pool  PX msg pool                 103310848
shared pool  PX QC deq stats                  1480
shared pool  PX QC msg stats                  2288
shared pool  PX subheap desc                   256
shared pool  PX msg pool struct               1088
shared pool  PX server deq stats              1480
shared pool  PX server msg stats              2288
Case study
create table t as select owner, object_name name from dba_objects where owner in ('SYSMAN','ORDSYS','PUBLIC','SYS');
create table m(owner varchar2(20));
insert into m values('SYS');
-- gather statistics
Run: select * from t, m where t.owner=m.owner and m.owner='SYS';
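One possible way to gather the statistics and display the plan for this statement is sketched below. It assumes that both tables live in the current schema and that the default PLAN_TABLE is available; the source does not say how its plans were produced.

```sql
-- Gather optimizer statistics for both tables in the current schema.
begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'T');
  dbms_stats.gather_table_stats(ownname => user, tabname => 'M');
end;
/

-- Store the plan for the test statement, then print it.
explain plan for
select * from t, m where t.owner = m.owner and m.owner = 'SYS';

select * from table(dbms_xplan.display);
```

For parallel plans, the output of dbms_xplan.display includes the TQ, IN-OUT and PQ Distrib columns discussed earlier.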
With parallelism enabled on neither table:
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23736 | 2202K| 55 (4)| 00:00:01 |
|* 1 | HASH JOIN | | 23736 | 2202K| 55 (4)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| M | 1 | 12 | 2 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T | 23736 | 1923K| 52 (2)| 00:00:01 |
---------------------------------------------------------------------------
Enable parallelism for table T only: alter table t parallel 4;
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23213 | 770K| 17 (6)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 23213 | 770K| 17 (6)| 00:00:01 | Q1,01 | P->S | QC (RAND) |
|* 3 | HASH JOIN | | 23213 | 770K| 17 (6)| 00:00:01 | Q1,01 | PCWP | |
| 4 | BUFFER SORT | | | | | | Q1,01 | PCWC | |
| 5 | PX RECEIVE | | 1 | 4 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
| 6 | PX SEND BROADCAST | :TQ10000 | 1 | 4 | 2 (0)| 00:00:01 | | S->P | BROADCAST |
|* 7 | TABLE ACCESS FULL| M | 1 | 4 | 2 (0)| 00:00:01 | | | |
| 8 | PX BLOCK ITERATOR | | 23213 | 680K| 14 (0)| 00:00:01 | Q1,01 | PCWC | |
|* 9 | TABLE ACCESS FULL | T | 23213 | 680K| 14 (0)| 00:00:01 | Q1,01 | PCWP | |
-----------------------------------------------------------------------------------------------------------------
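-- Table T is now accessed in parallel with block-level granules, while M is still accessed serially; step 6 shows S->P, and the rows are sent to the parallel processes by broadcast;
Enable parallelism for table M as well: alter table m parallel 4;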
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23213 | 770K| 17 (6)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 23213 | 770K| 17 (6)| 00:00:01 | Q1,01 | P->S | QC (RAND) |
|* 3 | HASH JOIN | | 23213 | 770K| 17 (6)| 00:00:01 | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | 1 | 4 | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
| 5 | PX SEND BROADCAST | :TQ10000 | 1 | 4 | 2 (0)| 00:00:01 | Q1,00 | P->P | BROADCAST |
| 6 | PX BLOCK ITERATOR | | 1 | 4 | 2 (0)| 00:00:01 | Q1,00 | PCWC | |
|* 7 | TABLE ACCESS FULL| M | 1 | 4 | 2 (0)| 00:00:01 | Q1,00 | PCWP | |
| 8 | PX BLOCK ITERATOR | | 23213 | 680K| 14 (0)| 00:00:01 | Q1,01 | PCWC | |
|* 9 | TABLE ACCESS FULL | T | 23213 | 680K| 14 (0)| 00:00:01 | Q1,01 | PCWP | |
-----------------------------------------------------------------------------------------------------------------
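-- M is now accessed in parallel too, with block-level granules; because the SQL requests no ordering, the final send to the coordinator uses the RAND method
-- The result set of M is distributed by broadcast; the hint /*+ pq_distribute(t,hash,hash) */ changes this to hash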
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23213 | 770K| 17 (6)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 23213 | 770K| 17 (6)| 00:00:01 | Q1,02 | P->S | QC (RAND) |
|* 3 | HASH JOIN BUFFERED | | 23213 | 770K| 17 (6)| 00:00:01 | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 1 | 4 | 2 (0)| 00:00:01 | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10000 | 1 | 4 | 2 (0)| 00:00:01 | Q1,00 | P->P | HASH |
| 6 | PX BLOCK ITERATOR | | 1 | 4 | 2 (0)| 00:00:01 | Q1,00 | PCWC | |
|* 7 | TABLE ACCESS FULL| M | 1 | 4 | 2 (0)| 00:00:01 | Q1,00 | PCWP | |
| 8 | PX RECEIVE | | 23213 | 680K| 14 (0)| 00:00:01 | Q1,02 | PCWP | |
| 9 | PX SEND HASH | :TQ10001 | 23213 | 680K| 14 (0)| 00:00:01 | Q1,01 | P->P | HASH |
| 10 | PX BLOCK ITERATOR | | 23213 | 680K| 14 (0)| 00:00:01 | Q1,01 | PCWC | |
|* 11 | TABLE ACCESS FULL| T | 23213 | 680K| 14 (0)| 00:00:01 | Q1,01 | PCWP | |
-----------------------------------------------------------------------------------------------------------------
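The statement behind this last plan is not shown in the notes. Assuming the hint was simply added to the same join used throughout the case study, it would look something like this:

```sql
-- Ask for hash distribution on both sides of the join to T.
-- Assumption: same test query as before, with only the hint added.
explain plan for
select /*+ pq_distribute(t, hash, hash) */ *
from   t, m
where  t.owner = m.owner
and    m.owner = 'SYS';

select * from table(dbms_xplan.display);
```

With hash distribution each row of M goes to a single consumer instead of to all of them. The cost is that T must be redistributed as well, which is why the plan gains a third table queue (:TQ10002) and the join becomes HASH JOIN BUFFERED.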