[20190409]pre_page_sga=true与连接缓慢的问题.txt
[20190409]pre_page_sga=true与连接缓慢的问题.txt
--//曾经遇到11g下设置pre_page_sga=true启动缓慢的问题(没有使用hugepages).
--//链接:http://blog.itpub.net/267265/viewspace-2295412/
--//实际上这样系统也会遇到连接缓慢的情况,通过测试说明问题.
1.建立pfile:
$ cat initxxxx.ora
db_name=xxxx
instance_name=xxxx
sga_target=20g
#sga_target=0
sga_max_size=20g
pre_page_sga=true
$ export oracle_sid=xxxx
sys@xxxx> startup nomount
oracle instance started.
total system global area 2.1379e+10 bytes
fixed size 2262656 bytes
variable size 2818574720 bytes
database buffers 1.8522e+10 bytes
redo buffers 36073472 bytes
$ grep -i page /proc/meminfo
anonpages: 239336 kb
pagetables: 701008 kb
anonhugepages: 0 kb
hugepages_total: 104
hugepages_free: 48
hugepages_rsvd: 41
hugepages_surp: 0
hugepagesize: 2048 kb
--//才启动没有任何用户连接,pagetables=701008 kb.
2.测试登陆问题:
$ time sqlplus -s -l / as sysdba <<<quit
real 0m4.944s
user 0m0.024s
sys 0m0.016s
--//单个连接需要5秒.如果多个用户并发呢?
$ cat aa.sql
quit
$ time seq 50 | xargs -i{} -p 50 sqlplus -s -l / as sysdba @ aa.sql
real 0m13.978s
user 0m1.833s
sys 0m1.017s
--//多个用户同时登录,全部完成需要13秒.
3.分析:
$ strace -f -c sqlplus -s -l / as sysdba <<<quit
process 3120 attached
process 3120 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.85 0.419648 69941 6 shmdt
0.10 0.000434 1 413 2 read
0.01 0.000046 0 287 127 open
0.01 0.000045 0 180 mmap
0.01 0.000037 0 178 close
0.01 0.000032 0 79 rt_sigaction
....
------ ----------- ----------- --------- --------- ----------------
100.00 0.420259 1970 227 total
$ man shmdt
...
shmdt() detaches the shared memory segment located at the address specified by shmaddr from the address space of the
calling process. the to-be-detached segment must be currently attached with shmaddr equal to the value returned by the
attaching shmat() call.
on a successful shmdt() call the system updates the members of the shmid_ds structure associated with the shared memory
segment as follows:
shm_dtime is set to the current time.
shm_lpid is set to the process-id of the calling process.
shm_nattch is decremented by one. if it becomes 0 and the segment is marked for deletion, the segment is deleted.
--//给人的感觉好像消耗在调用shmdt,实际上消耗的时间0.419648.也就是这样还是无法定位问题在那里.
$ strace -fttt -e shmdt sqlplus -s -l / as sysdba
process 6046 attached
[pid 6046] 08:41:55.548586 shmdt(0x7f2c41600000) = 0 <0.000036>
quit
08:41:59.992835 shmdt(0x60000000) = 0 <0.000042>
08:41:59.992983 shmdt(0x68000000) = 0 <0.000042>
08:41:59.993097 shmdt(0x6c000000) = 0 <0.360175>
08:42:00.353446 shmdt(0x4ec000000) = 0 <0.037200>
08:42:00.390737 shmdt(0x560000000) = 0 <0.000026>
process 6046 detached
--//我手工执行quit,可以发现shmdt调用发生在退出阶段,也就是不是它导致登陆缓慢.
$ ipcs -smt
------ shared memory attach/detach/change times --------
shmid owner attached detached changed
352092161 oracle apr 10 17:26:09 apr 10 17:26:17 apr 10 17:09:55
352124930 oracle apr 10 17:26:09 apr 10 17:26:17 apr 10 17:09:55
352157699 oracle apr 10 17:26:09 apr 10 17:26:18 apr 10 17:09:55
352190469 oracle apr 10 17:26:09 apr 10 17:26:18 apr 10 17:09:55
352223238 oracle apr 10 17:26:09 apr 10 17:26:18 apr 10 17:09:55
------ semaphore operation/change times --------
semid owner last-op last-changed
310738944 oracle wed apr 10 17:11:33 2019 wed apr 10 17:11:33 2019
--//从执行时间看4.5秒并不是消耗在这里.
4.继续分析,排除调用shmdt的情况:
$ strace -ftr -o /tmp/a2.txt sqlplus -s -l / as sysdba <<<quit
$ awk '{print $2}' /tmp/a2.txt | sort -nr | head
5.041521
0.287119
0.064745
0.008298
0.006386
0.004288
0.004272
0.001319
0.000982
0.000978
--//查看a2.txt
3366 0.000067 stat("/u01/app/oracle/product/11.2.0.4/dbhome_1/lib", {st_mode=s_ifdir|0755, st_size=12288, ...}) = 0 <0.000025>
3366 0.000112 chdir("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs") = 0 <0.000027>
3366 5.041521 mmap(null, 143360, prot_read|prot_write, map_private|map_anonymous, -1, 0) = 0x7f4c2c0b3000 <0.000048>
3366 0.000407 mmap(null, 143360, prot_read|prot_write, map_private|map_anonymous, -1, 0) = 0x7f4c2c090000 <0.000015>
3366 0.000127 mmap(null, 143360, prot_read|prot_write, map_private|map_anonymous, -1, 0) = 0x7f4c2c06d000 <0.000013>
3366 0.000218 getcwd("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs"..., 256) = 46 <0.000016>
3366 0.000087 access("/etc/intchg.ora", f_ok) = -1 enoent (no such file or directory) <0.000019>
...
--//也就是在chdir调用与mmap调用之间有5秒不知道在干什么?(chdir 仅仅需要0.000027秒).strace看不出来!!
--//如果你使用top -u oracle观察,你可以发现res字段不断在增加,接近sga设置.我的感觉就是在这里建立pagetables的过程.
$ cat aa.sql
host sleep 1
quit
$ cat a.sh
#! /bin/bash
export oracle_sid=xxxx
sqlplus -s -l / as sysdba @ aa.sql &
a=$(($!+2))
top -p $a -d 1 -b
$ . a.sh | tee /tmp/aa.txt
...
--//按ctrl+c中断.
$ grep pid /tmp/aa.txt | head -1; egrep 'oracle' /tmp/aa.txt
pid user pr ni virt res shr s %cpu %mem time+ command
8498 oracle 20 0 20.2g 1.5g 1.5g r 93.9 1.2 0:00.47 oracle
8498 oracle 20 0 20.2g 5.8g 5.8g r 99.7 4.6 0:01.47 oracle
8498 oracle 20 0 20.2g 10g 10g r 99.8 8.4 0:02.47 oracle
8498 oracle 20 0 20.2g 15g 15g r 99.8 12.2 0:03.47 oracle
--//sleep 1
8498 oracle 20 0 20.2g 19g 19g s 95.9 15.7 0:04.43 oracle
8498 oracle 20 0 20.2g 19g 19g s 0.0 15.7 0:04.43 oracle
8498 oracle 20 0 20.0g 19g 19g r 1.0 15.6 0:04.44 oracle
--//你可以发现res,shr都在增加,cpu使用率在9x%,感觉11g版本在设置pre_page_sga=true下有问题.
--//大致定位时间定位在建立pagetables的过程.
5.改用hugepages呢?
$ grep -i page /proc/meminfo
anonpages: 244144 kb
pagetables: 11508 kb
anonhugepages: 0 kb
hugepages_total: 26000
hugepages_free: 15800
hugepages_rsvd: 41
hugepages_surp: 0
hugepagesize: 2048 kb
$ time sqlplus -s -l / as sysdba <<<quit
real 0m0.180s
user 0m0.029s
sys 0m0.011s
$ strace -ftr -o /tmp/a3.txt sqlplus -s -l / as sysdba <<<quit
$ awk '{print $2}' /tmp/a3.txt | sort -nr | head
0.121929
0.009182
0.009144
0.009047
0.005081
0.002050
0.001984
0.001932
0.001321
0.001243
--//查看a3.txt
8716 0.000095 chdir("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs") = 0 <0.000019>
--//这里等待时间很小.
8716 0.121929 mmap(null, 143360, prot_read|prot_write, map_private|map_anonymous, -1, 0) = 0x7f552659d000 <0.000029>
8716 0.000143 mmap(null, 143360, prot_read|prot_write, map_private|map_anonymous, -1, 0) = 0x7f552657a000 <0.000020>
8716 0.000166 mmap(null, 143360, prot_read|prot_write, map_private|map_anonymous, -1, 0) = 0x7f5526557000 <0.000020>
8716 0.000260 getcwd("/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs"..., 256) = 46 <0.000023>
--//总之:
如果pre_page_sga=true最好开启hugepages.
很奇怪的是12c 缺省pre_page_sga=true,没有开启hugepages的情况下,没有这个问题.
上一篇: 测试oracle表空间自动扩展
下一篇: 数据库的4种常用设计模式