
Analysis of SQL*Net Network-Related Wait Events

程序员文章站 (Programmer Article Station), 2022-06-09 15:39:07

SQL*Net more data to client
SQL*Net message to client
SQL*Net more data from client
SQL*Net message from client

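A friend in my chat group shared an AWR report. We rarely see SQL*Net wait events among the top events, but when they do show up we need a clear understanding of what they mean.

First, look at the SQL*Net more data to client and SQL*Net message to client wait events: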
SQL> select name,parameter1,parameter2,parameter3,wait_class from v$event_name where name in ('SQL*Net message to client','SQL*Net more data to client');

NAME                          PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
----------------------------- ---------- ---------- ---------- ----------
SQL*Net message to client     driver id  #bytes                Network
SQL*Net more data to client   driver id  #bytes                Network

Troubleshooting Waits for 'SQL*Net message to client' and 'SQL*Net more data to client' Events from a Performance Perspective (Doc ID 1404526.1)
A wait for 'SQL*Net message to client' occurs when a server process has sent data or messages to the client and it is waiting for a reply. The time spent waiting is time spent waiting for the response from the TCP (Transparent Network Substrate). This wait is usually considered an idle wait event, as the server process is waiting for something else to reply.

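As noted above, SQL*Net message to client generally occurs when the server process has sent data or a message to the client and is waiting for the client's reply. The time recorded is the time spent waiting for the response at the TCP layer. Because the server process is waiting for something else to reply, this event is usually treated as an idle wait.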

In terms of tuning, if individual wait times are high then the likelihood is that improvements cannot be made on the server, but elsewhere. If the total wait is high but individual waits are small then the waiting may be due to the way in which the data is being collected (i.e. too many round trips).

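If individual waits are long, tuning the server is unlikely to help, and the problem probably lies elsewhere. If the total wait is high but each individual wait is short, the likely cause is the way the data is fetched, i.e. too many round trips.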

For the 'SQL*Net more data to client' event wait, Oracle uses SDU (Session Data Unit) to write to the SDU buffer which is written to the TCP socket buffer. If data is larger than the initial size of Session Data Unit then multiple chunks of data need to be sent. If there is more data to send then after each batch sent the session will wait on the 'SQL*Net more data to client' wait event.

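For the SQL*Net more data to client wait event, Oracle writes data into the SDU buffer, which is then written to the TCP socket buffer. If the data is larger than the initial SDU size, it has to be sent in multiple chunks. While more data remains to be sent, the session waits on SQL*Net more data to client after each batch.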
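To make the chunking idea concrete, here is a rough back-of-the-envelope sketch. It assumes the payload is simply cut into SDU-sized pieces and ignores protocol headers and any other per-packet overhead, so real counts will differ. The function name `sdu_chunks` and the 8192-byte SDU are illustrative assumptions, not values taken from the AWR report or the MOS note.

```python
import math

def sdu_chunks(payload_bytes, sdu_bytes):
    """Rough number of SDU-sized chunks needed for a payload.

    Assumption: the payload is split evenly into SDU-sized pieces and
    header overhead is ignored, so this is only an estimate.
    """
    if payload_bytes <= 0:
        return 0
    return math.ceil(payload_bytes / sdu_bytes)

# Illustrative values only: a 20000-byte result batch and an 8192-byte SDU.
chunks = sdu_chunks(20000, 8192)
print("chunks needed:", chunks)
```

Under this simplification, a larger SDU means fewer chunks per batch, which is why the MOS note suggests looking at the SDU size.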

The best way to diagnose the SQL*Net more data to client and SQL*Net message to client wait events is to run a 10046 event trace:

PARSING IN CURSOR #1 len=17 dep=0 uid=58 oct=3 lid=58 tim=26644922360 hv=1851853531 ad='5d591ab0'
select * from t01
END OF STMT
PARSE #1:c=0,e=46126,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=26644922352
BINDS #1:
EXEC #1:c=0,e=9312,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=26644933561
WAIT #1: nam='SQL*Net message to client' ela= 5 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644934130
FETCH #1:c=0,e=101,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=26644934600
WAIT #1: nam='SQL*Net message from client' ela= 749 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644935726
WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644936242
FETCH #1:c=0,e=425,p=0,cr=1,cu=0,mis=0,r=15,dep=0,og=1,tim=26644936646
WAIT #1: nam='SQL*Net message from client' ela= 57549 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644994574
WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644994966
...
STAT #1 id=1 cnt=50079 pid=0 pos=1 obj=51887 op='TABLE ACCESS FULL T01 (cr=3987 pr=0 pw=0 time=50172 us)'

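The raw trace file shows 3987 consistent reads for this SQL statement, with zero physical reads and zero physical writes, and an elapsed time of 50172 us, roughly 50 ms.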
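tkprof does this aggregation for you, but it can help to see what is being summed. The following is a minimal sketch that tallies WAIT lines per event from a raw 10046 trace. The regular expression only targets the line shape shown in the trace above, and `summarize_waits` is a hypothetical helper name, not part of any Oracle tool.

```python
import re
from collections import defaultdict

# Matches WAIT lines of the shape shown in the raw trace, e.g.
# WAIT #1: nam='SQL*Net message to client' ela= 5 driver id=...
WAIT_RE = re.compile(r"^WAIT #\d+: nam='(?P<event>[^']+)' ela= *(?P<ela>\d+)")

def summarize_waits(trace_lines):
    """Return {event: (wait_count, total_ela_us, max_ela_us)}."""
    stats = defaultdict(lambda: [0, 0, 0])
    for line in trace_lines:
        match = WAIT_RE.match(line.strip())
        if not match:
            continue  # PARSE/EXEC/FETCH/STAT lines are skipped
        ela = int(match.group("ela"))
        entry = stats[match.group("event")]
        entry[0] += 1                  # number of waits
        entry[1] += ela                # total elapsed microseconds
        entry[2] = max(entry[2], ela)  # longest single wait
    return {event: tuple(values) for event, values in stats.items()}

# The five WAIT lines (and one FETCH line) copied from the trace excerpt.
sample = [
    "WAIT #1: nam='SQL*Net message to client' ela= 5 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644934130",
    "FETCH #1:c=0,e=101,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=26644934600",
    "WAIT #1: nam='SQL*Net message from client' ela= 749 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644935726",
    "WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644936242",
    "WAIT #1: nam='SQL*Net message from client' ela= 57549 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644994574",
    "WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1111838976 #bytes=1 p3=0 obj#=-1 tim=26644994966",
]

summary = summarize_waits(sample)
for event, (count, total_us, max_us) in sorted(summary.items()):
    print(f"{event}: waits={count} total_us={total_us} max_us={max_us}")
```

Even in this short excerpt the pattern is visible: the "to client" waits are a few microseconds each, while nearly all the time sits in "from client", i.e. on the client side or the network.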

********************************************************************************

select * from t01

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.04          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     3340      0.28       0.19          0       3987          0       50079
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     3342      0.28       0.25          0       3987          0       50079

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 58

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    3340        0.00          0.00
  SQL*Net message from client                  3339        0.11        118.36

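The tkprof-formatted trace shows 3340 fetch calls, accompanied by 3340 waits on SQL*Net message to client. The total wait time for that event is still 0.00 s, the total elapsed time is 0.25 s, and there are 3987 consistent reads.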

Reference guidance from MOS:
Once the trace is obtained, you can TKProf it to see the timings and waits. Individual waits for 'SQL*Net message to client' are usually of very short duration (in this case the total wait is

If you notice unusually high waits for these events, for example as a top wait in statspack or AWR, then start the tuning process by tracing the process or the sql.

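Normally we do not need to worry about this wait event, but if it shows up prominently in AWR or Statspack, we should start by tracing the process or the SQL.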

1 SDU size
Remember that 'SQL*net message to client' is normally not a network issue, as the throughput is based on the TCP packet. The first session is sent the contents of the SDU buffer which is written to TCP buffer then the session waits for the 'SQL*net message to client' event. The wait is associated with the following factors:
- Oracle SDU size
- Amount of data returned to the client
One solution is to increase the SDU size. The following document can help with that:
Document 44694.1 SQL*Net Packet Sizes (SDU & TDU Parameters)

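Oracle's first recommendation here is to set the Oracle SDU size. I am not very familiar with SDU, which belongs to the Oracle Net side of things, so refer to the MOS note above for how to set it.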

2 arraysize

If the application is using large amount of data, consider increasing the arraysize in the application. If small arraysize is used to fetch the data, then the query will use multiple fetch calls, each of these will wait for the 'SQL*net message to client' event. With a small arraysize and a large amount of data, the number of waits can become significant.

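If a large amount of data is returned with a small arraysize, a single query has to fetch over and over, and each fetch incurs a SQL*Net message to client wait, so the number of waits becomes significant.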

Here is a simple test:
set arraysize 150

Then run the query with 10046 event tracing enabled:
select *
from
t01

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      335      0.10       0.08          0       1021          0       50079
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      337      0.10       0.08          0       1021          0       50079

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 58

Rows Row Source Operation
------- ---------------------------------------------------
50079 TABLE ACCESS FULL T01 (cr=1021 pr=0 pw=0 time=55 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     335        0.00          0.00
  SQL*Net message from client                   335       22.19        162.68
  SQL*Net more data to client                   895        0.00          0.00

After raising arraysize from the default 15 to 150, consistent reads dropped from 3987 to 1021, fetch calls dropped from 3340 to 335, and the number of SQL*Net message to client waits dropped along with them.

Looking at how the data in table T01 is distributed, each block holds roughly 71 rows:
SQL> select blocks,num_rows,round(num_rows/blocks) rows_per_block from user_tables where table_name='T01';

    BLOCKS   NUM_ROWS ROWS_PER_BLOCK
---------- ---------- --------------
       708      50079             71

With arraysize 15, the expected number of fetches is 50079/15 = 3338.6, which is close to the actual 3340. With arraysize 150, the expected number is 50079/150 = 333.86, which is close to the actual 335.
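The small gap between the estimate and the observed count can be accounted for with one extra assumption. In the raw trace earlier, the first FETCH returns a single row (r=1) and the next returns 15 (r=15). If we assume that pattern holds, i.e. one row on the first fetch and up to arraysize rows on each later fetch, the arithmetic looks like this. The function name and the `first_fetch_rows` parameter are illustrative, and the case where an extra empty fetch is needed to detect end of data is not modelled.

```python
import math

def expected_fetch_calls(total_rows, arraysize, first_fetch_rows=1):
    """Estimate fetch calls, assuming the first fetch returns
    `first_fetch_rows` rows and each later fetch returns up to
    `arraysize` rows (an assumption read off the raw trace)."""
    remaining = max(total_rows - first_fetch_rows, 0)
    return 1 + math.ceil(remaining / arraysize)

for arraysize in (15, 150):
    calls = expected_fetch_calls(50079, arraysize)
    print(f"arraysize={arraysize}: estimated fetch calls={calls}")
```

Under that assumption the estimate lands on 3340 and 335, the same fetch counts tkprof reported for the two runs.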

MOS has a note describing how arraysize and row prefetching affect logical reads and fetch calls:
Row Prefetching and its impact on logical reads and fetch calls (Doc ID 1419023.1)
ROW PREFETCHING
When an application fetches data from a database, it can do it row by row or by fetching numerous rows at the same time. Fetching numerous rows at a time is called row prefetching. Each time an application asks the driver to retrieve a row from the database, several rows are prefetched with it and stored in client-side memory. In this way, several subsequent requests do not have to execute database calls to fetch data. They can be served from the client-side memory. So, the number of round-trips to the database decreases proportionally to the number of prefetched rows.

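Oracle has the concept of row prefetching. When an application requests data from the database, it can fetch row by row or in batches. Prefetched rows are stored in client memory, so subsequent requests are served from client memory instead of going back to the database, which reduces the number of round trips.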

How arraysize is set in several commonly used client interfaces:
SQL*Plus: use set arraysize directly; the default arraysize is 15.
OCI
With OCI, row prefetching is managed by two statement parameters: OCI_ATTR_PREFETCH_ROWS and OCI_ATTR_PREFETCH_MEMORY.

JDBC
Row prefetching is enabled with the Oracle JDBC driver by default. You can change the default number of fetched rows (10). Specify the property defaultRowPrefetch when opening a connection to the database with either the class OracleDataSource or the class OracleDriver.

I have not actually tested OCI or JDBC. If I get time to study the middleware side further, I will run real tests to verify this.
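The same knob exists in Python's DB-API 2.0 as `cursor.arraysize`, which sets the default batch size for `fetchmany()`. The sketch below counts fetch calls for two batch sizes. It uses the standard-library `sqlite3` module purely as a stand-in so it runs without an Oracle instance, which means there are no network round trips here at all. With a networked driver each fetch call would roughly correspond to a round trip, but that was not tested against Oracle. The helper name `count_fetch_calls` and the recreated table `t01` are illustrative.

```python
import sqlite3

def count_fetch_calls(conn, sql, arraysize):
    """Fetch all rows in batches of `arraysize`.

    Returns (rows_fetched, fetch_calls); the final empty fetch that
    signals end of data is counted as a call.
    """
    cur = conn.cursor()
    cur.arraysize = arraysize  # DB-API 2.0: default size for fetchmany()
    cur.execute(sql)
    rows = 0
    calls = 0
    while True:
        batch = cur.fetchmany()  # uses cur.arraysize
        calls += 1
        if not batch:
            break
        rows += len(batch)
    cur.close()
    return rows, calls

# Stand-in table with the same row count as T01 in the article.
conn = sqlite3.connect(":memory:")
conn.execute("create table t01 (id integer)")
conn.executemany("insert into t01 values (?)", ((i,) for i in range(50079)))

for size in (15, 150):
    rows, calls = count_fetch_calls(conn, "select * from t01", size)
    print(f"arraysize={size}: rows={rows} fetch_calls={calls}")
```

The point of the sketch is only the ratio: a tenfold larger batch size cuts the number of fetch calls by about a factor of ten.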

3 TCP
Check the network path, for example switches and LAN bandwidth load. This needs the help of a dedicated network engineer.

There are two more wait events: SQL*Net more data from client and SQL*Net message from client.

SQL*Net more data from client:
The server is reading more data from the client after it has already received some data. This typically occurs when the client is sending quite a bit of data to the server before it expects any response.

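In other words, the Oracle server has already received part of the data sent by the client and is reading more. This wait typically occurs when the client sends a large amount of data to the server before it expects any response.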

MOS lists the following places to look when reducing the number and duration of these waits:
1 time in the client process itself
2 if you are expecting the client to be sending lots of data to the server (i.e. time spent sending data from the client to the server)
3 time in the network between the client and the server

SQL*Net message from client
The Oracle shadow process (foreground process) is waiting for a message to arrive from the client process. This is generally considered as an "idle" event in that the Oracle shadow is idle waiting for the client process to tell it what to do. Time waiting in this state is attributable to the client process itself plus any network transport time.

SQL*Net message from client is an idle wait event. It occurs while the Oracle server's shadow (foreground) process waits for the client to send a request for it to process. The wait time is attributable to the client itself plus network transport:
1 the client process waiting for input
2 time in the client process itself
3 time in the network between the client and the server

Let's verify that SQL*Net message from client really is an idle wait:
SQL> select sid from v$session where sid=userenv('sid');

SID
----------
872

Connect to the database manually and note the session's SID. Without doing anything in that session, query its wait event from another session:
SQL> select event,seconds_in_wait,state from v$session where sid=872;

EVENT                                    SECONDS_IN_WAIT STATE
---------------------------------------- --------------- -------------------
SQL*Net message from client                          285 WAITING

The session shows a SQL*Net message from client wait. Session 872 is not doing any work; it is simply waiting for the client to send a request for the server process to handle.

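In my friend's AWR report, SQL*Net more data to client is the number 2 top event. Summarizing the discussion above, the main actions are:
1 Adjust the SDU size
2 Adjust arraysize
3 Check whether the network has a bottleneck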