A Brief Analysis of "Thread <number> cannot allocate new log, sequence <number>"

程序员文章站 2022-07-05 11:04:45

Sometimes you will find alerts such as "Thread <number> cannot allocate new log, sequence <number> Checkpoint not complete" in the alert log of an Oracle database. A concrete example:

Thread 1 cannot allocate new log, sequence 279334

Checkpoint not complete

Current log# 4 seq# 279333 mem# 0: /u01/oradata/gsp/redo04.log

Current log# 4 seq# 279333 mem# 1: /u03/oradata/gsp/redo04.log

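The thread and sequence numbers will of course vary, but the alert generally looks like this:

Thread <number> cannot allocate new log, sequence <number>

Checkpoint not complete

The alert can also appear because the database is waiting for a redo log to be archived, in which case it looks like this:

ORACLE Instance <name> - Can not allocate log, archival required

Thread <number> cannot allocate new log, sequence <number>

So what exactly causes these alerts, and how can the problem be resolved?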

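Cause analysis:

Typically, when a redo log fills up, Oracle switches to the next log group, and the switch triggers a checkpoint. The checkpoint signals the database writer process (DBWn) to write the dirty blocks in the buffer cache back to the data files on disk. Until those dirty blocks have been written, the database cannot release that log group for reuse. In ARCHIVELOG mode, the archiver process (ARCn) must also archive the redo log. If redo is generated too quickly, LGWR can fill all the other log groups and wrap around to a group whose checkpoint (CKPT) or archiving has not yet completed. At that point there is a conflict: the database stalls, because sessions that generate redo must wait until the group becomes reusable, and messages like the ones above keep being written to alert.log.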
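
To tell which of the two situations applies, the state of each log group can be inspected in V$LOG. The query below is a minimal sketch, not part of the original analysis; it only reads standard V$LOG columns.

```sql
-- One row per online redo log group.
-- STATUS = 'ACTIVE' : not current, but still needed for crash recovery
--                     (its checkpoint has not completed yet)
-- ARCHIVED = 'NO'   : not yet archived (the CURRENT group always shows NO)
SELECT group#,
       thread#,
       sequence#,
       ROUND(bytes / 1024 / 1024) AS size_mb,
       members,
       archived,
       status
FROM   v$log
ORDER  BY thread#, group#;
```

If every non-current group is ACTIVE while the alert is being written, the checkpoint is the likely bottleneck; if non-current groups show ARCHIVED = NO, archiving is the more likely cause.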

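In addition, the log switch frequency differs from one business period to another, so this error generally appears when the system is busy or a large volume of DML is being executed.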
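
A rough way to see how the switch frequency varies over time is to count the entries in V$LOG_HISTORY per hour. This is a sketch; the seven-day window is an assumption and can be changed freely.

```sql
-- Number of log switches per thread and per hour.
SELECT thread#,
       TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS switch_hour,
       COUNT(*)                               AS log_switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 7   -- assumption: one week of history is enough
GROUP  BY thread#, TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY thread#, switch_hour;
```

Hours with many more switches than the rest are the periods in which the alert is most likely to appear.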

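Solutions:

1: Increase the size of the redo log files

Increasing the size of the redo log files is easy to do, but what size is reasonable?

Approach 1: Refer to the OPTIMAL_LOGFILE_SIZE column of V$INSTANCE_RECOVERY. Note that this column may be NULL unless the FAST_START_MTTR_TARGET parameter is set to a value greater than 0. The column is documented as follows: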

Redo log file size (in megabytes) that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. It is recommended that the user configure all online redo logs to be at least this value.

The official documentation recommends the following:

You can use the V$INSTANCE_RECOVERY view column OPTIMAL_LOGFILE_SIZE to determine the size of your online redo logs. This field shows the redo log file size in megabytes that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. If this field consistently shows a value greater than the size of your smallest online log, then you should configure all your online logs to be at least this size.

Note, however, that the redo log file size affects the MTTR. In some cases, you may be able to refine your choice of the optimal FAST_START_MTTR_TARGET value by re-running the MTTR Advisor with your suggested optimal log file size.

SQL> select optimal_logfile_size from v$instance_recovery;
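
If the column comes back NULL, the parameter can be checked and set before querying again. The following is a sketch under stated assumptions: the value 300 (seconds) is only an example, and SCOPE = BOTH assumes the instance was started with an spfile.

```sql
-- Check the current target; 0 or unset leaves the advisory column NULL.
SELECT name, value
FROM   v$parameter
WHERE  name = 'fast_start_mttr_target';

-- Assumption: 300 seconds is an example value only; pick one that
-- matches your own recovery-time objective.
ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE = BOTH;

-- After the instance has run under a representative workload:
SELECT target_mttr,
       estimated_mttr,
       optimal_logfile_size   -- in megabytes
FROM   v$instance_recovery;
```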

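Approach 2: Judge by the number of log switches and the volume of redo generated

You can use the awr_redo_size_history script to analyze how much archived log is generated per hour and per day. Then take the archived log volume for the periods with frequent switching and size the redo logs so that a switch occurs about once every 15 to 20 minutes. (If switching is extremely frequent in some period, this rule is hardly usable, because the resulting redo logs would be very large.) This is not a universal rule and must be weighed against the actual workload, but in most cases it is a reasonable reference.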
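
When the awr_redo_size_history script is not at hand, a similar hourly figure can be derived from V$ARCHIVED_LOG. This is a sketch rather than a replacement for that script; it assumes ARCHIVELOG mode and that destination 1 is the local archive destination.

```sql
-- Archived redo volume per hour over the last three days.
SELECT TO_CHAR(TRUNC(first_time, 'HH24'), 'YYYY-MM-DD HH24') AS redo_hour,
       COUNT(*)                                      AS archived_logs,
       ROUND(SUM(blocks * block_size) / 1024 / 1024) AS archived_mb
FROM   v$archived_log
WHERE  first_time > SYSDATE - 3
AND    dest_id = 1   -- assumption: 1 is the local archive destination
GROUP  BY TRUNC(first_time, 'HH24')
ORDER  BY TRUNC(first_time, 'HH24');
```

As a purely hypothetical illustration of the 15 to 20 minute rule: if a peak hour shows 6000 MB of archived redo, a 20-minute switch interval corresponds to 6000 × 20 / 60 = 2000 MB per log file.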

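A script for calculating the redo log size, for reference only:

-- Compare the current average redo log size with the size needed for
-- roughly one log switch every 20 minutes during peak hours.
-- Enter the peak window as HH24:MI values without quotes, e.g. 09:00 and 18:00.
select
(select round(avg(bytes) / 1024 / 1024, 2) from v$log) as "redo size (mb)",
round((20 / average_period) * (select avg(bytes)
from v$log) / 1024 / 1024, 2) as "recommended size (mb)"
-- average_period: average minutes covered by one archived log
from (select avg((next_time - first_time) * 24 * 60) as average_period
from v$archived_log
where first_time > sysdate - 3
    and to_char(first_time, 'hh24:mi') between
    '&start_of_peak_hours' and '&end_of_peak_hours'
);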
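
Once a target size has been chosen, note that an online redo log file cannot be resized in place; the usual procedure is to add new, larger groups and then drop the old ones. The statements below sketch that procedure. The group numbers, the 1G size and the file names are hypothetical and only follow the naming seen in the alert log example above.

```sql
-- Assumption: existing groups are 1-4; group 5, SIZE 1G and the paths
-- below are illustrative values, not taken from the original system.
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 5
  ('/u01/oradata/gsp/redo05.log', '/u03/oradata/gsp/redo05.log') SIZE 1G;
-- Repeat for each additional new group.

-- Move LGWR off the old groups and let their checkpoints complete.
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;

-- Drop an old group only once it is INACTIVE (and archived, if applicable).
SELECT group#, status, archived FROM v$log;
ALTER DATABASE DROP LOGFILE GROUP 1;
```

Dropping a group does not delete its files at the operating system level unless Oracle Managed Files is in use, so the old files have to be removed separately.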

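2: Increase the number of redo log groups

Adding log groups does not actually solve the "Thread <number> cannot allocate new log, sequence <number> Checkpoint not complete" problem, but it can solve the following one:

ORACLE Instance <name> - Can not allocate log, archival required

Thread <number> cannot allocate new log, sequence <number>

This happens because the archiver process (ARCn) has not yet finished copying the redo log file to the archive destination (archival required). When log switches are too frequent or there are too few log groups, LGWR must wait for ARCn to finish archiving before it can cycle around and overwrite the log group.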
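
For the archival case, a first check is whether an archive destination is failing and which logs are still waiting, before adding a group. This is a sketch; the group number, size, paths and the value 4 are hypothetical.

```sql
-- Is any archive destination in an error state (for example, full)?
SELECT dest_id, dest_name, status, error
FROM   v$archive_dest
WHERE  status <> 'INACTIVE';

-- Which online logs are still waiting to be archived?
-- (The CURRENT group is always listed here.)
SELECT group#, sequence#, status
FROM   v$log
WHERE  archived = 'NO';

-- Assumption: group 6, SIZE 512M and the paths are illustrative;
-- normally the new group matches the size of the existing ones.
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 6
  ('/u01/oradata/gsp/redo06.log', '/u03/oradata/gsp/redo06.log') SIZE 512M;

-- Optionally allow more archiver processes (4 is an example value).
ALTER SYSTEM SET log_archive_max_processes = 4;
```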

3: Tune the checkpoint

This is harder. See the official document: Note 147468.1, Checkpoint Tuning and Troubleshooting Guide.
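
As a starting point before reading the note, the checkpoint-related settings and counters can be inspected. This is a sketch of commonly examined parameters and statistics, not a summary of what the note prescribes.

```sql
-- Current checkpoint-related parameters.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('fast_start_mttr_target',
                'log_checkpoint_interval',
                'log_checkpoint_timeout',
                'log_checkpoints_to_alert');

-- A persistent gap between started and completed checkpoints suggests
-- that checkpoints are not keeping up.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('background checkpoints started',
                'background checkpoints completed');

-- Record checkpoint begin/end times in the alert log for later analysis.
ALTER SYSTEM SET log_checkpoints_to_alert = TRUE;
```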

4: Increase I/O speed for writing online redo logs/archived redo logs

This applies to Thread <number> cannot allocate new log, sequence <number>

Checkpoint not complete

- Use async I/O if not already so

- Use DBWR I/O slaves or multiple DBWR processes

Reference:

Oracle Database Performance Tuning Guide

Instance Tuning Using Performance Views

Consider Multiple Database Writer (DBWR) Processes or I/O Slaves

10.2 - http://docs.oracle.com/cd/b19306_01/server.102/b14211/instance_tune.htm#i42802

11.1 - http://docs.oracle.com/cd/b28359_01/server.111/b28274/instance_tune.htm#i42802

11.2 - http://docs.oracle.com/cd/e11882_01/server.112/e16638/instance_tune.htm#pfgrf94511

- Consider the generic recommendations for redo log files:

If the high I/O files are redo log files, then consider splitting the redo log files from the other files. Possible configurations can include the following:

1. Placing all redo logs on one disk without any other files. Also consider availability; members of the same group should be on different physical disks and controllers for recoverability purposes.

2. Placing each redo log group on a separate disk that does not store any other files.

3. Striping the redo log files across several disks, using an operating system striping tool. (Manual striping is not possible in this situation.)

4. Avoiding the use of RAID 5 for redo logs.

Reference:

Oracle Database Performance Tuning Guide

Redo Log Files

10.2 - http://docs.oracle.com/cd/b19306_01/server.102/b14211/iodesign.htm#sthref534

11.1 - http://docs.oracle.com/cd/b28359_01/server.111/b28274/iodesign.htm#chdbcdhg

11.2 - http://docs.oracle.com/cd/e11882_01/server.112/e16638/iodesign.htm#pfgrf94396
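
The I/O-related points above can be checked from inside the database. The queries below are a sketch; the value 4 for db_writer_processes is only an example, and whether more writers help depends on the platform and storage.

```sql
-- Asynchronous I/O and database writer configuration.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('disk_asynch_io',
                'filesystemio_options',
                'db_writer_processes',
                'dbwr_io_slaves');

-- Where the redo log members are located (to review disk placement).
SELECT l.group#, l.status, f.member
FROM   v$log l
       JOIN v$logfile f ON f.group# = l.group#
ORDER  BY l.group#, f.member;

-- Assumption: 4 is an example value. The parameter is static, so the
-- change takes effect only after the instance is restarted.
ALTER SYSTEM SET db_writer_processes = 4 SCOPE = SPFILE;
```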

For

ORACLE Instance <name> - Can not allocate log, archival required

Thread <number> cannot allocate new log, sequence <number>

in the above document you may check the section "Archived Redo Logs".

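5: Find the SQL that generates large amounts of redo. If that SQL is unreasonable from a business or logic standpoint, modify it; alternatively, set the relevant tables to NOLOGGING to reduce redo generation. Note that NOLOGGING reduces redo only for direct-path operations (for example INSERT /*+ APPEND */ or CREATE TABLE AS SELECT), not for conventional DML, and it has no effect when the database is in FORCE LOGGING mode.

To identify which SQL statements generate large amounts of redo, you can use the LogMiner tool, or refer to my blog post "How to identify which SQL statements generate large amounts of redo".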
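
As a quick first pass before turning to LogMiner, the sessions that have generated the most redo can be listed from the session statistics. This is a sketch and not the method from the blog post mentioned above; the figures are cumulative since each session logged on, and sql_id shows only the statement currently being executed.

```sql
-- Top 10 connected sessions by redo generated since logon.
SELECT *
FROM  (SELECT s.sid,
              s.serial#,
              s.username,
              s.program,
              s.sql_id,
              ROUND(st.value / 1024 / 1024) AS redo_mb
       FROM   v$sesstat st
              JOIN v$statname n ON n.statistic# = st.statistic#
              JOIN v$session  s ON s.sid = st.sid
       WHERE  n.name = 'redo size'
       ORDER  BY st.value DESC)
WHERE  ROWNUM <= 10;
```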

References:

https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::p11_question_id:69012348056

Manual Log Switching Causing "Thread 1 Cannot Allocate New Log" Message in the Alert Log (Doc ID 435887.1)

Can Not Allocate Log (Doc ID 1265962.1)

https://gokhanatil.com/2009/08/optimum-size-of-the-online-redo-log-files.html
