欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

MySQL并行复制的一个坑

程序员文章站 2022-06-24 20:54:13
早上巡检数据库,发现一个延迟从库的sql_thread中断了。 Last_SQL_Errno: 1755 Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gt ......

早上巡检数据库,发现一个延迟从库的sql_thread中断了。

Last_SQL_Errno: 1755
Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..

检查performance_schema下的replication_applier_status_by_worker表,除了GTID之外也没有更具体的信息:

"root@localhost:mysql3308.sock  [(none)]>select * from performance_schema.replication_applier_status_by_worker;
+--------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+
| CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION                          | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
+--------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+
|              |         1 |      NULL | OFF           | 0b961fcc-41c2-11e7-84fd-286ed488c7da:156369774 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         2 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         3 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         4 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         5 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         6 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         7 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
|              |         8 |      NULL | OFF           |                                                |                 0 |                    | 0000-00-00 00:00:00  |
+--------------+-----------+-----------+---------------+------------------------------------------------+-------------------+--------------------+----------------------+

既然relay_log的位置信息都有了,那就去日志里看看吧:

解析Binlog文件:

mysqlbinlog -v --base64-output=decode-rows oracle-relay-bin.000093 >1.sql

找到152912092位置点附近的日志:

MySQL并行复制的一个坑

检查了一下数据库中这个表ID为14816035的数据确实是不存在的。

另外除了这条日志,其它日志的last_committed和sequence_number都为0,last_committed表示事务提交的时候,上次事务提交的编号。last_committed和sequence_number代表的就是所谓的LOGICAL_CLOCK。

猜测如果手动把这条数据插入延迟从库,并且使用的方法重启sql_thread,相信这个错误也能被解决。

但既然带了LOGICAL_CLOCK的事务就会出错,跳过事务的方法很难保证以后不会出错。

注意到这条日志的last_committed是一个异常大的值,且错误信息中有提到The master event is logically timestamped incorrectly。我怀疑是不是并行配置的问题。

从库配置:

"root@localhost:mysql3308.sock  [(none)]>show variables like '%para%';
+------------------------+---------------+
| Variable_name          | Value         |
+------------------------+---------------+
| slave_parallel_type    | LOGICAL_CLOCK |
| slave_parallel_workers | 8             |
+------------------------+---------------+

 再检查主库配置:

(root@localhost:mysql.sock) [(none)]>show variables like '%para%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| slave_parallel_workers | 0     |
+------------------------+-------+

 发现主库根本就没有slave_parallel_type这项配置。想起来主库是mysql5.6了。

(root@localhost:mysql.sock) [(none)]>select version();
+------------+
| version()  |
+------------+
| 5.6.35-log |
+------------+

 那么问题基本上就知道了,主库5.6只支持基于DATABASE的并行复制,而5.7的从库配置成LOGICAL_CLOCK导致了异常。

明白了问题所在,那就好解决了,把从库的slave_parallel_type改为DATABASE,再起sql_thread问题应该就解决了:

"root@localhost:mysql3308.sock  [none]>set global slave_parallel_type='DATABASE';
Query OK, 0 rows affected (0.00 sec)

"root@localhost:mysql3308.sock  [none]>show global variables like '%slave_parallel_type%';
+---------------------+----------+
| Variable_name       | Value    |
+---------------------+----------+
| slave_parallel_type | DATABASE |
+---------------------+----------+
1 row in set (0.00 sec)

"root@localhost:mysql3308.sock  [none]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000104
          Read_Master_Log_Pos: 160115307
               Relay_Log_File: oracle-relay-bin.000093
                Relay_Log_Pos: 152912092
        Relay_Master_Log_File: binlog.000100
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1755
                   Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 152911925
              Relay_Log_Space: 4455094667
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1755
               Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./oracle-relay-bin.000093, position 152912092 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 50
                  Master_UUID: 0b961fcc-41c2-11e7-84fd-286ed488c7da
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 3600
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 180716 18:02:56
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:111060115-163843604
            Executed_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:1-156369774
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

"root@localhost:mysql3308.sock  [none]>stop slave sql_thread;
Query OK, 0 rows affected, 1 warning (0.00 sec)

"root@localhost:mysql3308.sock  [none]>start slave sql_thread;
Query OK, 0 rows affected (0.01 sec)

"root@localhost:mysql3308.sock  [none]>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000104
          Read_Master_Log_Pos: 160161836
               Relay_Log_File: oracle-relay-bin.000093
                Relay_Log_Pos: 169205552
        Relay_Master_Log_File: binlog.000100
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 169205385
              Relay_Log_Space: 4455141196
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 5351
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 50
                  Master_UUID: 0b961fcc-41c2-11e7-84fd-286ed488c7da
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 3600
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Waiting for Slave Worker to release partition
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:111060115-163843692
            Executed_Gtid_Set: 0b961fcc-41c2-11e7-84fd-286ed488c7da:1-156400100
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

打完收工。

转载请注明出处。

本文地址: