MySQL案例01:Last_SQL_Errno: 1755 Cannot execute the current event group in the par
周五同事监控报警,有个MySQL从库复制状态异常,让我帮忙排查下,经过排查发现是MySQL5.6并行复制的一个Bug所致,具体处理过程如下:
一、错误信息
登录mysql从库服务器,检查复制状态
mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: xx.xx.xx.xx Master_User: repuser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.001689 Read_Master_Log_Pos: 145325388 Relay_Log_File: mysql2-relay-bin.004390 Relay_Log_Pos: 147850306 Relay_Master_Log_File: mysql-bin.001673 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1755 Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Format_desc, relay-log name ./mysql2-relay-bin.004392, position 238 which prevents execution of this event group in parallel mode. Reason: possible malformed group of events from an old master. Skip_Counter: 0 Exec_Master_Log_Pos: 147850096 Relay_Log_Space: 4440639819 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1755 Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Format_desc, relay-log name ./mysql2-relay-bin.004392, position 238 which prevents execution of this event group in parallel mode. Reason: possible malformed group of events from an old master. Replicate_Ignore_Server_Ids: Master_Server_Id: 46 Master_UUID: f6d575b8-6a5b-11e5-8313-0050568d23c9 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 180314 01:01:17 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: f6d575b8-6a5b-11e5-8313-0050568d23c9:9393159-70288025 Executed_Gtid_Set: 33a835bb-6a5a-11e5-8308-0050568d685b:1-146, f6d575b8-6a5b-11e5-8313-0050568d23c9:1-69584557 Auto_Position: 1 1 row in set (0.01 sec)
mysql错误日志内容如下
018-03-14 01:01:16 26349 [Note] Event Scheduler: Loaded 0 events 2018-03-14 01:01:16 26349 [Note] /data/mysql/bin/mysqld: ready for connections. Version: '5.6.16-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution 2018-03-14 01:01:17 26349 [ERROR] Slave SQL: Cannot execute the current event group in the parallel mode. Encountered event Format_desc, relay-log name ./mysql2-relay-bin.004392, position 238 which prevents execution of this event group in parallel mode. Reason: possible malformed group of events from an old master. Error_code: 1755 2018-03-14 01:01:17 26349 [Warning] Slave: Cannot execute the current event group in the parallel mode. Encountered event Format_desc, relay-log name ./mysql2-relay-bin.004392, position 238 which prevents execution of this event group in parallel mode. Reason: possible malformed group of events from an old master. Error_code: 1755 2018-03-14 01:01:17 26349 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.001673' position 147850096
二、错误原因
从错误日志字面意思理解,出现这个错误是因为数据库开启了并行复制,一些event group在并行复制下,不支持从而导致复制失败,主从状态异常。
检查数据库版本:
mysql> select version(); +------------+ | version() | +------------+ | 5.6.16-log | +------------+ 1 row in set (0.07 sec)
检查数据库并行复制信息:
mysql> show variables like '%parall%'; +------------------------+-------+ | Variable_name | Value | +------------------------+-------+ | slave_parallel_workers | 1 | +------------------------+-------+ 1 row in set (0.05 sec)
官方搜索错误信息,发现此错误是一bug,具体bug编号为Bug #71495、Bug #72537
https://bugs.mysql.com/bug.php?id=71495
https://bugs.mysql.com/bug.php?id=72537
三、解决方案
1、既然是bug,通过分析binlog内容,然后跳过的方式跳过事物
(非gtid模式,通过set sql_slave_skip_counter=1;跳过事务)
主库分析binlog:
mysqlbinlog -vv --base64-output='decode-rows' --start-position=147850096 --stop-position=147850096 mysql-bin.001674 >a.sql
SHOW BINLOG EVENTS in 'mysql-bin.001673' from 147850096 limit 10;
从库跳过事物(gtid模式)
stop slave;
SET @@SESSION.GTID_NEXT= 'f6d575b8-6a5b-11e5-8313-0050568d23c9:9393159-70288025';
begin;
commit;
SET SESSION GTID_NEXT = AUTOMATIC;
start slave;
2、通过关闭并行复制来恢复(最佳方法)
stop slave;
set global slave_parallel_workers = 0 ;
start slave;
( 更改my.cnf中并行复制参数slave_parallel_workers = 0)
3、如果上述方法都不行,可以使用重新部署从库的方式来恢复,恢复后关闭并行复制(5.6并行复制存在bug,不推荐使用)
如果使用mysqldump备份,最好指定参数--set-gtid-purged=off,否则需要充值gtid
主库备份数据库:
innobackupexp方式:innobackupex --socket=/tmp/mysql.sock /20180316/
从库恢复数据库:
innobackupex --redo-only --apply-log /soft/2018-03-16_17-45-30
innobackupex --copy-back /soft/2018-03-16_17-45-30
查看事物应用binlog信息:cat xtrabackup_info
binlog_pos = filename 'mysql-bin.001690', position 46997344, GTID of the last change 'f6d575b8-6a5b-11e5-8313-0050568d23c9:1-70316241'
innodb_from_lsn = 0
innodb_to_lsn = 84446557453
设置从库gtid信息
set global gtid_purged='f6d575b8-6a5b-11e5-8313-0050568d23c9:1-70316241';
重新部署从库:
change master to master_host='xx.xx.xx.xx',master_user='usernamer',master_password='password',MASTER_AUTO_POSITION=84446557453;
(上述使用了reset master,否则可以直接指定从MASTER_AUTO_POSITION=1开始)