欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  数据库

GoldenGate Extract进程hang问题解决一例

程序员文章站 2022-05-04 21:20:43
...

一直运行正常的extract进程突然hang住了,起初怀疑是不是找不到归档所致,但是细细的检查了下,extract所需的归档都还在,而且日

一直运行正常的extract进程突然hang住了,起初怀疑是不是找不到归档所致,但是细细的检查了下,extract所需的归档都还在,而且日志中也没报任何错。而且将extract进程停掉后,也能正常起来,没有任何错误或者告警:

// ggserr.log
2015-05-25 11:03:27 INFO OGG-01513 Oracle GoldenGate Capture for Oracle, elis.prm: Positioning to (Thread 1) Sequence 5824, RBA 290544656, SCN 1415.2504802626.
2015-05-25 11:03:27 INFO OGG-01516 Oracle GoldenGate Capture for Oracle, elis.prm: Positioned to (Thread 1) Sequence 5824, RBA 290544656, SCN 1415.2504802626, May 20, 2015 4:50:25 PM.
2015-05-25 11:03:27 INFO OGG-01513 Oracle GoldenGate Capture for Oracle, elis.prm: Positioning to (Thread 2) Sequence 4798, RBA 18781712, SCN 1415.2504799356.
2015-05-25 11:03:27 INFO OGG-01516 Oracle GoldenGate Capture for Oracle, elis.prm: Positioned to (Thread 2) Sequence 4798, RBA 18781712, SCN 1415.2504799356, May 20, 2015 4:50:19 PM.
2015-05-25 11:03:27 INFO OGG-00993 Oracle GoldenGate Capture for Oracle, elis.prm: EXTRACT ELIS started.

//extract进程的rpt文件
2015-05-25 11:03:27 INFO OGG-00546 Default thread stack size: 196608.

2015-05-25 11:03:27 INFO OGG-00547 Increasing thread stack size from 196608 to 1048576.

2015-05-25 11:03:27 INFO OGG-01513 Positioning to (Thread 1) Sequence 5824, RBA 290544656, SCN 1415.2504802626.

2015-05-25 11:03:27 INFO OGG-01516 Positioned to (Thread 1) Sequence 5824, RBA 290544656, SCN 1415.2504802626, May 20, 2015 4:50:25 PM.

2015-05-25 11:03:27 INFO OGG-01513 Positioning to (Thread 2) Sequence 4798, RBA 18781712, SCN 1415.2504799356.

2015-05-25 11:03:27 INFO OGG-01516 Positioned to (Thread 2) Sequence 4798, RBA 18781712, SCN 1415.2504799356, May 20, 2015 4:50:19 PM.

2015-05-25 11:03:27 INFO OGG-01517 Position of first record processed for Thread 1, Sequence 5824, RBA 290544656, SCN 1415.2504802623, May 20, 2015 4:50:25 PM.

2015-05-25 11:03:27 INFO OGG-01644 BOUNDED RECOVERY: COMPLETE: for object pool 1: p33619994_Redo Thread 1 at SeqNo: 5824, RBA: 290545168, SCN: 1415.2504802626 (6079883526466), Thread: 1.

2015-05-25 11:03:27 INFO OGG-01055 Recovery initialization completed for target file /ogg/ggate/data/dirext/lis/ea002534, at RBA 1109.

2015-05-25 11:03:27 INFO OGG-01478 Output file /ogg/ggate/data/dirext/lis/ea is using format RELEASE 11.2.

2015-05-25 11:03:27 INFO OGG-01517 Position of first record processed for Thread 2, Sequence 4798, RBA 18781712, SCN 1415.2504799356, May 20, 2015 4:50:19 PM.

2015-05-25 11:03:27 INFO OGG-01026 Rolling over remote file /ogg/ggate/data/dirext/lis/ea002534.

2015-05-25 11:03:27 INFO OGG-01053 Recovery completed for target file /ogg/ggate/data/dirext/lis/ea002535, at RBA 1109.

2015-05-25 11:03:27 INFO OGG-01057 Recovery completed for all targets.
从日志中看,extract进程没有问题,但是extract进程实际上是运行不正常的:


GGSCI (hxddlis01) 8> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
JAGENT RUNNING
EXTRACT RUNNING ELIS 96:14:58 00:00:04
EXTRACT RUNNING PL2IS 00:00:00 00:00:00
EXTRACT RUNNING PLIS 00:00:00 00:00:09
extract进程的lag越来越大,而且trail文件也不再增长。这情况真是让人一头雾水。我们先来看看当前系统的运行环境:


GoldenGate:11.2.1.0.27
Database: 11.2.0.3 两节点的RAC,使用ASM
OS: AIX 6.1
然后搜索Oracle Support,,找到一篇文章1432994.1,和碰到的情形有点相像。在我们的环境中也是使用了DBLOGREADER,但是不同的是DBLOGREADERBUFSIZE已经配置成1M:


...
TRANLOGOPTIONS BUFSIZE 1048576
TRANLOGOPTIONS DBLOGREADER, DBLOGREADERBUFSIZE 1048576
--TRANLOGOPTIONS _NOREADAHEAD ANY
EXTTRAIL /ogg/ggate/data/dirext/lis/ea
...
这也是AIX下能配置的最大值。难道该值还是太大的缘故?所以就尝试将该值改成了512K:


...
TRANLOGOPTIONS BUFSIZE 1048576
TRANLOGOPTIONS DBLOGREADER, DBLOGREADERBUFSIZE 524288
--TRANLOGOPTIONS _NOREADAHEAD ANY
EXTTRAIL /ogg/ggate/data/dirext/lis/ea
...
然后重启extract进程,结果就正常了。估计使用DBLOGREADER还是有不少的bug。