GTID环境中手动修复主从故障一例(Error 1236/Error 1396)
程序员文章站
2022-06-05 08:12:38
Preface I got an replication error 1236 when I modified the password of a user without starting slave threads of replication.Further more,the user was ......
Preface
I got an replication error 1236 when I modified the password of a user without starting slave threads of replication.Further more,the user was absent in slave.below is the whole operation how I solve the issue.
Procedure
I modified the user password on master at first.
1 root@localhost:mysql3306.sock [(none)]>alter user zlm@'192.168.56.%' identified by 'zlmzlm'; 2 Query OK, 0 rows affected (0.00 sec)
I then started the slave replication on zlm3.
1 #mysql -uroot -p -S /var/lib/mysql/mysql.sock 2 Enter password: 3 Welcome to the MySQL monitor. Commands end with ; or \g. 4 Your MySQL connection id is 3 5 Server version: 5.7.21-log MySQL Community Server (GPL) 6 7 Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. 8 9 Oracle is a registered trademark of Oracle Corporation and/or its 10 affiliates. Other names may be trademarks of their respective 11 owners. 12 13 Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. 14 15 root@localhost:mysql.sock [(none)]>start slave; 16 Query OK, 0 rows affected (0.00 sec) 17 18 root@localhost:mysql.sock [(none)]>show slave status\G 19 *************************** 1. row *************************** 20 Slave_IO_State: 21 Master_Host: 192.168.56.100 22 Master_User: repl 23 Master_Port: 3306 24 Connect_Retry: 60 25 Master_Log_File: mysql-bin.000047 26 Read_Master_Log_Pos: 194 27 Relay_Log_File: relay-bin.000027 28 Relay_Log_Pos: 4 29 Relay_Master_Log_File: mysql-bin.000047 30 Slave_IO_Running: No //The IO_Thread is not running on account of Error 1236 31 Slave_SQL_Running: Yes 32 Replicate_Do_DB: 33 Replicate_Ignore_DB: 34 Replicate_Do_Table: 35 Replicate_Ignore_Table: 36 Replicate_Wild_Do_Table: 37 Replicate_Wild_Ignore_Table: 38 Last_Errno: 0 39 Last_Error: 40 Skip_Counter: 0 41 Exec_Master_Log_Pos: 194 42 Relay_Log_Space: 532 43 Until_Condition: None 44 Until_Log_File: 45 Until_Log_Pos: 0 46 Master_SSL_Allowed: No 47 Master_SSL_CA_File: 48 Master_SSL_CA_Path: 49 Master_SSL_Cert: 50 Master_SSL_Cipher: 51 Master_SSL_Key: 52 Seconds_Behind_Master: NULL 53 Master_SSL_Verify_Server_Cert: No 54 Last_IO_Errno: 1236 55 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.' 56 Last_SQL_Errno: 0 57 Last_SQL_Error: 58 Replicate_Ignore_Server_Ids: 59 Master_Server_Id: 1003306 60 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 61 Master_Info_File: /data/mysql/mysql3306/data/master.info 62 SQL_Delay: 0 63 SQL_Remaining_Delay: NULL 64 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates 65 Master_Retry_Count: 86400 66 Master_Bind: 67 Last_IO_Error_Timestamp: 180721 09:05:50 68 Last_SQL_Error_Timestamp: 69 Master_SSL_Crl: 70 Master_SSL_Crlpath: 71 Retrieved_Gtid_Set: 72 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1-7, 73 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-2700058 //This is the id of maximum transaction executed on slave. 74 Auto_Position: 1 75 Replicate_Rewrite_DB: 76 Channel_Name: 77 Master_TLS_Version: 78 1 row in set (0.00 sec)
I decided to set gtid_purged variable to a certain value.So I executed "reset master;" to remove the value of "Executed_Gtid_Set".
1 root@localhost:mysql.sock [(none)]>reset master; 2 Query OK, 0 rows affected (0.05 sec) 3 4 root@localhost:mysql.sock [(none)]>show slave status\G 5 *************************** 1. row *************************** 6 Slave_IO_State: 7 Master_Host: 192.168.56.100 8 Master_User: repl 9 Master_Port: 3306 10 Connect_Retry: 60 11 Master_Log_File: mysql-bin.000047 12 Read_Master_Log_Pos: 194 13 Relay_Log_File: relay-bin.000027 14 Relay_Log_Pos: 4 15 Relay_Master_Log_File: mysql-bin.000047 16 Slave_IO_Running: No 17 Slave_SQL_Running: Yes 18 Replicate_Do_DB: 19 Replicate_Ignore_DB: 20 Replicate_Do_Table: 21 Replicate_Ignore_Table: 22 Replicate_Wild_Do_Table: 23 Replicate_Wild_Ignore_Table: 24 Last_Errno: 0 25 Last_Error: 26 Skip_Counter: 0 27 Exec_Master_Log_Pos: 194 28 Relay_Log_Space: 532 29 Until_Condition: None 30 Until_Log_File: 31 Until_Log_Pos: 0 32 Master_SSL_Allowed: No 33 Master_SSL_CA_File: 34 Master_SSL_CA_Path: 35 Master_SSL_Cert: 36 Master_SSL_Cipher: 37 Master_SSL_Key: 38 Seconds_Behind_Master: NULL 39 Master_SSL_Verify_Server_Cert: No 40 Last_IO_Errno: 1236 41 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.' 42 Last_SQL_Errno: 0 43 Last_SQL_Error: 44 Replicate_Ignore_Server_Ids: 45 Master_Server_Id: 1003306 46 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 47 Master_Info_File: /data/mysql/mysql3306/data/master.info 48 SQL_Delay: 0 49 SQL_Remaining_Delay: NULL 50 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates 51 Master_Retry_Count: 86400 52 Master_Bind: 53 Last_IO_Error_Timestamp: 180721 09:11:00 54 Last_SQL_Error_Timestamp: 55 Master_SSL_Crl: 56 Master_SSL_Crlpath: 57 Retrieved_Gtid_Set: 58 Executed_Gtid_Set: //The value has been cleared after operation of "reset master;". 59 Auto_Position: 1 60 Replicate_Rewrite_DB: 61 Channel_Name: 62 Master_TLS_Version: 63 1 row in set (0.00 sec)
In order to know which value of "gtid_purged" I should set,I analyzed the binlog on master.
1 zlm@192.168.56.100:3306 [(none)]>show master status; 2 +------------------+----------+--------------+------------------+-------------------------------------------------+ 3 | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | 4 +------------------+----------+--------------+------------------+-------------------------------------------------+ 5 | mysql-bin.000090 | 440 | | | 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627 | 6 +------------------+----------+--------------+------------------+-------------------------------------------------+ 7 1 row in set (0.00 sec) 8 9 zlm@192.168.56.100:3306 [(none)]>flush logs; 10 Query OK, 0 rows affected (0.04 sec) 11 12 zlm@192.168.56.100:3306 [(none)]>show master status; 13 +------------------+----------+--------------+------------------+-------------------------------------------------+ 14 | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | 15 +------------------+----------+--------------+------------------+-------------------------------------------------+ 16 | mysql-bin.000091 | 194 | | | 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627 | 17 +------------------+----------+--------------+------------------+-------------------------------------------------+ 18 1 row in set (0.00 sec) 19 20 [root@zlm1 09:16:13 /data/mysql/mysql3306/logs] 21 #mysqlbinlog -vv --base64-output=decode-rows mysql-bin.000090 > /root/1.log 22 [root@zlm1 09:16:36 /data/mysql/mysql3306/logs] 23 #cd 24 25 [root@zlm1 09:16:37 ~] 26 #ls -l|grep 1.log 27 -rw-r--r-- 1 root root 1743 Jul 21 09:16 1.log 28 [root@zlm1 09:17:23 ~] 29 #less 1.log 30 ... 31 32 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; 33 /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; 34 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; 35 /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; 36 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; 37 /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; 38 DELIMITER /*!*/; 39 # at 4 40 #180721 9:01:29 server id 1003306 end_log_pos 123 CRC32 0x6e71f8f2 Start: binlog v 4, server v 5.7.21-log created 180721 9:01:29 at startup 41 ROLLBACK/*!*/; 42 # at 123 43 #180721 9:01:33 server id 1003306 end_log_pos 194 CRC32 0x074cfb08 Previous-GTIDs 44 # 2a4b3562-2ab6-11e8-be7a-080027de0e0e:5-12715626 45 # at 194 46 #180721 9:02:53 server id 1003306 end_log_pos 259 CRC32 0x9125f364 GTID last_committed=0 sequence_number=1 rbr_only=no 47 SET @@SESSION.GTID_NEXT= '2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627'/*!*/; 48 # at 259 49 #180721 9:02:53 server id 1003306 end_log_pos 440 CRC32 0x7a5353bf Query thread_id=3 exec_time=0 error_code=0 50 SET TIMESTAMP=1532156573/*!*/; 51 SET @@session.pseudo_thread_id=3/*!*/; 52 SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/; 53 SET @@session.sql_mode=1436549152/*!*/; 54 SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/; 55 /*!\C utf8 *//*!*/; 56 SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/; 57 SET @@session.lc_time_names=0/*!*/; 58 SET @@session.collation_database=DEFAULT/*!*/; 59 ALTER USER 'zlm'@'192.168.56.%' IDENTIFIED WITH 'mysql_native_password' AS '*512FB3FAA6F522E351929CAC70AD8EBB0F5901B6' 60 /*!*/; 61 # at 440 62 #180721 9:11:46 server id 1003306 end_log_pos 487 CRC32 0x0882ea24 Rotate to mysql-bin.000091 pos: 4 63 SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; 64 DELIMITER ; 65 # End of log file 66 67 //I've known that the operation of altering password is in the uuid:12715627.Therefore,I could set gtid_purged to the previous gtid(uuid:5-12715626).
I tried to set "gtid_purged" to the right value according to the content of binlog on master.
1 root@localhost:mysql.sock [(none)]>set @@global.gtid_purged='2a4b3562-2ab6-11e8-be7a-080027de0e0e:5-12715626'; 2 Query OK, 0 rows affected (0.00 sec) 3 4 root@localhost:mysql.sock [(none)]>start slave io_thread; 5 Query OK, 0 rows affected (0.01 sec) 6 7 root@localhost:mysql.sock [(none)]>show slave status\G 8 *************************** 1. row *************************** 9 Slave_IO_State: 10 Master_Host: 192.168.56.100 11 Master_User: repl 12 Master_Port: 3306 13 Connect_Retry: 60 14 Master_Log_File: mysql-bin.000047 15 Read_Master_Log_Pos: 194 16 Relay_Log_File: relay-bin.000027 17 Relay_Log_Pos: 4 18 Relay_Master_Log_File: mysql-bin.000047 19 Slave_IO_Running: No //It was not running again. 20 Slave_SQL_Running: Yes 21 Replicate_Do_DB: 22 Replicate_Ignore_DB: 23 Replicate_Do_Table: 24 Replicate_Ignore_Table: 25 Replicate_Wild_Do_Table: 26 Replicate_Wild_Ignore_Table: 27 Last_Errno: 0 28 Last_Error: 29 Skip_Counter: 0 30 Exec_Master_Log_Pos: 194 31 Relay_Log_Space: 532 32 Until_Condition: None 33 Until_Log_File: 34 Until_Log_Pos: 0 35 Master_SSL_Allowed: No 36 Master_SSL_CA_File: 37 Master_SSL_CA_Path: 38 Master_SSL_Cert: 39 Master_SSL_Cipher: 40 Master_SSL_Key: 41 Seconds_Behind_Master: NULL 42 Master_SSL_Verify_Server_Cert: No 43 Last_IO_Errno: 1236 44 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.' 45 Last_SQL_Errno: 0 46 Last_SQL_Error: 47 Replicate_Ignore_Server_Ids: 48 Master_Server_Id: 1003306 49 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 50 Master_Info_File: /data/mysql/mysql3306/data/master.info 51 SQL_Delay: 0 52 SQL_Remaining_Delay: NULL 53 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates 54 Master_Retry_Count: 86400 55 Master_Bind: 56 Last_IO_Error_Timestamp: 180721 09:20:25 57 Last_SQL_Error_Timestamp: 58 Master_SSL_Crl: 59 Master_SSL_Crlpath: 60 Retrieved_Gtid_Set: 61 Executed_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:5-12715626 62 Auto_Position: 1 63 Replicate_Rewrite_DB: 64 Channel_Name: 65 Master_TLS_Version: 66 1 row in set (0.00 sec) 67 68 root@localhost:mysql.sock [(none)]>reset master; 69 Query OK, 0 rows affected (0.03 sec) 70 71 root@localhost:mysql.sock [(none)]>set @@global.gtid_purged='2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715626'; //I began with the first gtid this time. 72 Query OK, 0 rows affected (0.00 sec) 73 74 root@localhost:mysql.sock [(none)]>stop slave; 75 Query OK, 0 rows affected (0.00 sec) 76 77 root@localhost:mysql.sock [(none)]>start slave; 78 Query OK, 0 rows affected (0.00 sec) 79 80 root@localhost:mysql.sock [(none)]>show slave status\G 81 *************************** 1. row *************************** 82 Slave_IO_State: Waiting for master to send event 83 Master_Host: 192.168.56.100 84 Master_User: repl 85 Master_Port: 3306 86 Connect_Retry: 60 87 Master_Log_File: mysql-bin.000091 88 Read_Master_Log_Pos: 194 89 Relay_Log_File: relay-bin.000028 90 Relay_Log_Pos: 367 91 Relay_Master_Log_File: mysql-bin.000090 92 Slave_IO_Running: Yes //IO_Thread could work normally. 93 Slave_SQL_Running: No //SQL_Thread was down instead. 94 Replicate_Do_DB: 95 Replicate_Ignore_DB: 96 Replicate_Do_Table: 97 Replicate_Ignore_Table: 98 Replicate_Wild_Do_Table: 99 Replicate_Wild_Ignore_Table: 100 Last_Errno: 1396 //A new arrival error 1396 occurred. 101 Last_Error: Error 'Operation ALTER USER failed for 'zlm'@'192.168.56.%'' on query. Default database: ''. Query: 'ALTER USER 'zlm'@'192.168.56.%' IDENTIFIED WITH 'mysql_native_password' AS '*512FB3FAA6F522E351929CAC70AD8EBB0F5901B6'' 102 Skip_Counter: 0 103 Exec_Master_Log_Pos: 194 104 Relay_Log_Space: 1556 105 Until_Condition: None 106 Until_Log_File: 107 Until_Log_Pos: 0 108 Master_SSL_Allowed: No 109 Master_SSL_CA_File: 110 Master_SSL_CA_Path: 111 Master_SSL_Cert: 112 Master_SSL_Cipher: 113 Master_SSL_Key: 114 Seconds_Behind_Master: NULL 115 Master_SSL_Verify_Server_Cert: No 116 Last_IO_Errno: 0 117 Last_IO_Error: 118 Last_SQL_Errno: 1396 119 Last_SQL_Error: Error 'Operation ALTER USER failed for 'zlm'@'192.168.56.%'' on query. Default database: ''. Query: 'ALTER USER 'zlm'@'192.168.56.%' IDENTIFIED WITH 'mysql_native_password' AS '*512FB3FAA6F522E351929CAC70AD8EBB0F5901B6'' 120 Replicate_Ignore_Server_Ids: 121 Master_Server_Id: 1003306 122 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 123 Master_Info_File: /data/mysql/mysql3306/data/master.info 124 SQL_Delay: 0 125 SQL_Remaining_Delay: NULL 126 Slave_SQL_Running_State: 127 Master_Retry_Count: 86400 128 Master_Bind: 129 Last_IO_Error_Timestamp: 130 Last_SQL_Error_Timestamp: 180721 09:22:48 131 Master_SSL_Crl: 132 Master_SSL_Crlpath: 133 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627 134 Executed_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715626 135 Auto_Position: 1 136 Replicate_Rewrite_DB: 137 Channel_Name: 138 Master_TLS_Version: 139 1 row in set (0.00 sec) 140 141 root@localhost:mysql.sock [(none)]>ALTER USER 'zlm'@'192.168.56.%' IDENTIFIED WITH 'mysql_native_password' AS '*512FB3FAA6F522E351929CAC70AD8EBB0F5901B6'; //I couldn't executed it in mysql cli either. 142 ERROR 1396 (HY000): Operation ALTER USER failed for 'zlm'@'192.168.56.%' 143 root@localhost:mysql.sock [(none)]>
Then,I checked the user on both master and slave.I found that there was no "zlm" user on slave at all.
1 //Master 2 root@localhost:mysql3306.sock [(none)]>select user,host from mysql.user; 3 +---------------+--------------+ 4 | user | host | 5 +---------------+--------------+ 6 | repl | 192.168.56.% | 7 | zlm | 192.168.56.% | 8 | bkuser | localhost | 9 | mysql.session | localhost | 10 | mysql.sys | localhost | 11 | root | localhost | 12 +---------------+--------------+ 13 6 rows in set (0.00 sec) 14 15 //Slave 16 root@localhost:mysql.sock [(none)]>select user,host from mysql.user; 17 +---------------+--------------+ 18 | user | host | 19 +---------------+--------------+ 20 | repl | 192.168.56.% | 21 | bkuser | localhost | 22 | mysql.session | localhost | 23 | mysql.sys | localhost | 24 | root | localhost | 25 +---------------+--------------+ 26 5 rows in set (0.00 sec) 27 28 root@localhost:mysql.sock [(none)]>create user zlm@'192.168.56.%'; //I created the absent user "zlm" on slave. 29 Query OK, 0 rows affected (0.00 sec) 30 31 root@localhost:mysql.sock [(none)]>start slave sql_thread; 32 Query OK, 0 rows affected (0.01 sec) 33 34 root@localhost:mysql.sock [(none)]>show slave status\G 35 *************************** 1. row *************************** 36 Slave_IO_State: Waiting for master to send event 37 Master_Host: 192.168.56.100 38 Master_User: repl 39 Master_Port: 3306 40 Connect_Retry: 60 41 Master_Log_File: mysql-bin.000091 42 Read_Master_Log_Pos: 194 43 Relay_Log_File: relay-bin.000030 44 Relay_Log_Pos: 407 45 Relay_Master_Log_File: mysql-bin.000091 46 Slave_IO_Running: Yes 47 Slave_SQL_Running: Yes //It was ok. 48 Replicate_Do_DB: 49 Replicate_Ignore_DB: 50 Replicate_Do_Table: 51 Replicate_Ignore_Table: 52 Replicate_Wild_Do_Table: 53 Replicate_Wild_Ignore_Table: 54 Last_Errno: 0 55 Last_Error: 56 Skip_Counter: 0 57 Exec_Master_Log_Pos: 194 58 Relay_Log_Space: 695 59 Until_Condition: None 60 Until_Log_File: 61 Until_Log_Pos: 0 62 Master_SSL_Allowed: No 63 Master_SSL_CA_File: 64 Master_SSL_CA_Path: 65 Master_SSL_Cert: 66 Master_SSL_Cipher: 67 Master_SSL_Key: 68 Seconds_Behind_Master: 0 69 Master_SSL_Verify_Server_Cert: No 70 Last_IO_Errno: 0 71 Last_IO_Error: 72 Last_SQL_Errno: 0 73 Last_SQL_Error: 74 Replicate_Ignore_Server_Ids: 75 Master_Server_Id: 1003306 76 Master_UUID: 2a4b3562-2ab6-11e8-be7a-080027de0e0e 77 Master_Info_File: /data/mysql/mysql3306/data/master.info 78 SQL_Delay: 0 79 SQL_Remaining_Delay: NULL 80 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates 81 Master_Retry_Count: 86400 82 Master_Bind: 83 Last_IO_Error_Timestamp: 84 Last_SQL_Error_Timestamp: 85 Master_SSL_Crl: 86 Master_SSL_Crlpath: 87 Retrieved_Gtid_Set: 2a4b3562-2ab6-11e8-be7a-080027de0e0e:12715627 88 Executed_Gtid_Set: 27af30ca-6800-11e8-ad7e-080027de0e0e:1, //This gtid was generated by the statment of "create user ... ". 89 2a4b3562-2ab6-11e8-be7a-080027de0e0e:1-12715627 //The executed gtid was equal with the master. 90 Auto_Position: 1 91 Replicate_Rewrite_DB: 92 Channel_Name: 93 Master_TLS_Version: 94 1 row in set (0.00 sec)
Summary
- Error 1236 is common phenomenon in our replication of MySQL which always due to lack of specific binlog files.
- Before you execute a statement on master.You'd better start your slave replication at first.Obviously it means our test environment.
- If you were not sure which transaction was stuck.Analyze the binlog file is always a good idea to solve these kind of issues.
- In my case,I know clearly about what I've operated in master.So I can easily execute the same operation on slave.But the product environment is complecated and it's not recommended to do so.
上一篇: cut 的用法
下一篇: 看完让你彻底搞懂Websocket原理