欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

MySQL高可用之MGR安装测试(续)

程序员文章站 2022-05-11 12:34:18
Preface I've implemented the Group Replication with three servers yesterday,What a shame it eventually terminated in several glitches.Now that it does ......
 
Preface
 
    I've implemented the Group Replication with three servers yesterday,What a shame it eventually terminated in several glitches.Now that it doesn't work appropriately,I decide to inquire what has happened and solve the problems.Let's see some details below.
 
Procedure
 
    Here's some phenomena in the error.log file of zlm2 when I started the Group Replicaiotn on server zlm2 and subsequently the server zlm3:
 
 1 2018-06-14T05:07:43.286396Z 3 [Note] Plugin group_replication reported: '[GCS] Translated 'zlm2' to 127.0.0.1'
 2 2018-06-14T05:07:43.286688Z 3 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
 3 2018-06-14T05:07:43.303290Z 3 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
 4 2018-06-14T05:07:43.303303Z 3 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
 5 2018-06-14T05:07:43.322700Z 0 [Note] Plugin group_replication reported: 'Successfully bound to 0.0.0.0:33061 (socket=59).'
 6 2018-06-14T05:07:43.322800Z 0 [Note] Plugin group_replication reported: 'Successfully set listen backlog to 32 (socket=59)!'
 7 2018-06-14T05:07:43.323089Z 0 [Note] Plugin group_replication reported: 'connecting to zlm2 33061'
 8 2018-06-14T05:07:43.323326Z 0 [Note] Plugin group_replication reported: 'client connected to zlm2 33061 fd 60'
 9 2018-06-14T05:07:43.324370Z 0 [Note] Plugin group_replication reported: 'connecting to zlm2 33061'
10 2018-06-14T05:07:43.324594Z 0 [Note] Plugin group_replication reported: 'client connected to zlm2 33061 fd 64'
11 2018-06-14T05:07:43.325365Z 0 [Note] Plugin group_replication reported: 'connecting to zlm2 33061'
12 2018-06-14T05:07:43.325528Z 0 [Note] Plugin group_replication reported: 'client connected to zlm2 33061 fd 67'
13 2018-06-14T05:07:43.326770Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
14 2018-06-14T05:07:43.327442Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm3:33062 on local port: 33061.'
15 2018-06-14T05:07:43.327448Z 0 [Note] Plugin group_replication reported: 'connecting to zlm4 33063'
16 2018-06-14T05:07:43.327991Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm4:33063 on local port: 33061.'
17 2018-06-14T05:07:43.327996Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
18 2018-06-14T05:07:43.328227Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm3:33062 on local port: 33061.'
19 2018-06-14T05:07:43.328231Z 0 [Note] Plugin group_replication reported: 'connecting to zlm4 33063'
20 2018-06-14T05:07:43.328473Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm4:33063 on local port: 33061.'
21 2018-06-14T05:07:43.328477Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
22 2018-06-14T05:07:43.328692Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm3:33062 on local port: 33061.'
23 2018-06-14T05:07:43.328696Z 0 [Note] Plugin group_replication reported: 'connecting to zlm4 33063'
24 2018-06-14T05:07:43.328927Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm4:33063 on local port: 33061.'
25 2018-06-14T05:07:43.328931Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
26 2018-06-14T05:07:43.329149Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm3:33062 on local port: 33061.'
27 2018-06-14T05:07:43.329153Z 0 [Note] Plugin group_replication reported: 'connecting to zlm4 33063'
28 2018-06-14T05:07:43.329371Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm4:33063 on local port: 33061.'
29 2018-06-14T05:07:43.329376Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
30 2018-06-14T05:07:43.329591Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm3:33062 on local port: 33061.'
31 2018-06-14T05:07:43.329595Z 0 [Note] Plugin group_replication reported: 'connecting to zlm4 33063'
32 2018-06-14T05:07:43.329812Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm4:33063 on local port: 33061.'
33 2018-06-14T05:07:43.329822Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
34 2018-06-14T05:07:43.330046Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm3:33062 on local port: 33061.'
35 2018-06-14T05:07:43.330051Z 0 [Note] Plugin group_replication reported: 'connecting to zlm4 33063'
36 2018-06-14T05:07:43.330315Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm4:33063 on local port: 33061.'
37 2018-06-14T05:07:43.330319Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
38 2018-06-14T05:07:43.330542Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm3:33062 on local port: 33061.'
39 2018-06-14T05:07:43.330545Z 0 [Note] Plugin group_replication reported: 'connecting to zlm4 33063'
40 2018-06-14T05:07:43.330846Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm4:33063 on local port: 33061.'
41 2018-06-14T05:07:43.330850Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
42 2018-06-14T05:07:43.331029Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm3:33062 on local port: 33061.'
43 2018-06-14T05:07:43.331034Z 0 [Note] Plugin group_replication reported: 'connecting to zlm4 33063'
44 2018-06-14T05:07:43.331268Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm4:33063 on local port: 33061.'
45 2018-06-14T05:07:43.331493Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm3:33062 on local port: 33061.'
46 2018-06-14T05:07:43.331496Z 0 [Note] Plugin group_replication reported: 'connecting to zlm4 33063'
47 2018-06-14T05:07:43.331710Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to zlm4:33063 on local port: 33061.'
48 2018-06-14T05:07:43.332195Z 0 [Note] Plugin group_replication reported: 'state 4338 action xa_exit'
49 2018-06-14T05:07:43.332221Z 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
50 2018-06-14T05:07:43.332223Z 0 [Note] Plugin group_replication reported: 'new state x_start'
51 2018-06-14T05:07:43.341819Z 0 [Warning] Plugin group_replication reported: 'read failed'
52 2018-06-14T05:07:43.341918Z 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 33061'
53 2018-06-14T05:08:43.305045Z 3 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after joining group'
54 2018-06-14T05:08:43.305086Z 3 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
55 2018-06-14T05:08:43.305104Z 3 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
56 2018-06-14T05:08:43.305389Z 3 [Note] Plugin group_replication reported: 'auto_increment_increment is reset to 1'
57 2018-06-14T05:08:43.305400Z 3 [Note] Plugin group_replication reported: 'auto_increment_offset is reset to 1'
58 2018-06-14T05:08:43.305657Z 8 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
59 2018-06-14T05:08:43.306129Z 5 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
60 2018-06-14T05:09:34.369943Z 3 [Note] Plugin group_replication reported: '[GCS] Translated 'zlm2' to 127.0.0.1'
61 2018-06-14T05:09:34.370020Z 3 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
62 2018-06-14T05:09:34.395914Z 3 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
63 2018-06-14T05:09:34.395914Z 3 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
64 2018-06-14T05:09:34.395914Z 3 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 1013306'
65 2018-06-14T05:09:34.396105Z 0 [Note] Plugin group_replication reported: 'state 4338 action xa_init'
66 2018-06-14T05:09:34.396141Z 0 [Note] Plugin group_replication reported: 'Successfully unblocked socket (socket=58)!'
67 2018-06-14T05:09:34.396160Z 0 [Note] Plugin group_replication reported: 'Ready to accept incoming connections on 0.0.0.0:33061 (socket=58)!'
68 2018-06-14T05:09:34.396174Z 0 [Note] Plugin group_replication reported: 'connecting to zlm2 33061'
69 2018-06-14T05:09:34.397174Z 0 [Note] Plugin group_replication reported: 'client connected to zlm2 33061 fd 62'
70 2018-06-14T05:09:34.397319Z 0 [Note] Plugin group_replication reported: 'connecting to zlm2 33061'
71 2018-06-14T05:09:34.397405Z 0 [Note] Plugin group_replication reported: 'client connected to zlm2 33061 fd 64'
72 2018-06-14T05:09:34.397872Z 0 [Note] Plugin group_replication reported: 'connecting to zlm2 33061'
73 2018-06-14T05:09:34.397992Z 0 [Note] Plugin group_replication reported: 'client connected to zlm2 33061 fd 66'
74 2018-06-14T05:09:34.398354Z 0 [Note] Plugin group_replication reported: 'connecting to zlm2 33061'
75 2018-06-14T05:09:34.398450Z 0 [Note] Plugin group_replication reported: 'client connected to zlm2 33061 fd 68'
76 2018-06-14T05:09:34.398599Z 0 [Note] Plugin group_replication reported: 'connecting to zlm2 33061'
77 2018-06-14T05:09:34.398689Z 0 [Note] Plugin group_replication reported: 'client connected to zlm2 33061 fd 70'
78 2018-06-14T05:09:34.399791Z 0 [Note] Plugin group_replication reported: 'state 4338 action xa_net_boot'
79 2018-06-14T05:09:34.399804Z 0 [Note] Plugin group_replication reported: 'getstart group_id c3244967'
80 2018-06-14T05:09:34.400093Z 0 [Note] Plugin group_replication reported: 'new state x_run'
81 2018-06-14T05:09:34.400123Z 0 [Note] Plugin group_replication reported: 'state 4411 action xa_net_boot'
82 2018-06-14T05:09:34.400126Z 0 [Note] Plugin group_replication reported: 'new state x_run'
83 2018-06-14T05:09:34.400134Z 0 [Note] Plugin group_replication reported: 'getstart group_id c3244967'
84 2018-06-14T05:09:35.400977Z 17 [Note] Plugin group_replication reported: 'Only one server alive. Declaring this server as online within the replication group'
85 2018-06-14T05:09:35.401085Z 0 [Note] Plugin group_replication reported: 'Group membership changed to zlm2:3306 on view 15289529754008535:1.'
86 2018-06-14T05:09:35.401564Z 0 [Note] Plugin group_replication reported: 'This server was declared online within the replication group'
87 2018-06-14T05:09:35.401607Z 0 [Note] Plugin group_replication reported: 'A new primary with address zlm2:3306 was elected, enabling conflict detection until the new primary applies all relay logs.'
88 2018-06-14T05:09:35.401639Z 19 [Note] Plugin group_replication reported: 'This server is working as primary member.'
89 2018-06-14T05:14:10.239890Z 0 [Note] Plugin group_replication reported: 'getstart group_id c3244967'
90 2018-06-14T05:14:15.722205Z 0 [Note] Plugin group_replication reported: 'Members joined the group: zlm3:3306'
91 2018-06-14T05:14:15.722321Z 0 [Note] Plugin group_replication reported: 'Group membership changed to zlm2:3306, zlm3:3306 on view 15289529754008535:2.'
92 2018-06-14T05:14:15.724228Z 0 [Note] Plugin group_replication reported: 'getstart group_id c3244967'
93 2018-06-14T05:14:17.673695Z 0 [Warning] Plugin group_replication reported: 'Members removed from the group: zlm3:3306'
94 2018-06-14T05:14:17.673798Z 0 [Note] Plugin group_replication reported: 'Group membership changed to zlm2:3306 on view 15289529754008535:3.'
95 2018-06-14T05:15:28.381832Z 3 [Note] Aborted connection 3 to db: 'performance_schema' user: 'root' host: 'localhost' (Got timeout reading communication packets)

 

    There're some clues in the error.log file on zlm3 in the same time after I've started Group Replication.

 

 1 2018-06-14T05:14:10.257975Z 2 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
 2 2018-06-14T05:14:10.257984Z 2 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
 3 2018-06-14T05:14:10.257984Z 2 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 1023306'
 4 2018-06-14T05:14:10.258030Z 0 [Note] Plugin group_replication reported: 'state 0 action xa_init'
 5 2018-06-14T05:14:10.292244Z 0 [Note] Plugin group_replication reported: 'Successfully bound to 0.0.0.0:33062 (socket=58).'
 6 2018-06-14T05:14:10.292272Z 0 [Note] Plugin group_replication reported: 'Successfully set listen backlog to 32 (socket=58)!'
 7 2018-06-14T05:14:10.292276Z 0 [Note] Plugin group_replication reported: 'Successfully unblocked socket (socket=58)!'
 8 2018-06-14T05:14:10.292294Z 0 [Note] Plugin group_replication reported: 'Ready to accept incoming connections on 0.0.0.0:33062 (socket=58)!'
 9 2018-06-14T05:14:10.292339Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
10 2018-06-14T05:14:10.292596Z 0 [Note] Plugin group_replication reported: 'client connected to zlm3 33062 fd 59'
11 2018-06-14T05:14:10.295085Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
12 2018-06-14T05:14:10.295189Z 0 [Note] Plugin group_replication reported: 'client connected to zlm3 33062 fd 52'
13 2018-06-14T05:14:10.295440Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
14 2018-06-14T05:14:10.295509Z 0 [Note] Plugin group_replication reported: 'client connected to zlm3 33062 fd 62'
15 2018-06-14T05:14:10.295590Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
16 2018-06-14T05:14:10.295635Z 0 [Note] Plugin group_replication reported: 'client connected to zlm3 33062 fd 64'
17 2018-06-14T05:14:10.295710Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
18 2018-06-14T05:14:10.295748Z 0 [Note] Plugin group_replication reported: 'client connected to zlm3 33062 fd 66'
19 2018-06-14T05:14:10.295817Z 0 [Note] Plugin group_replication reported: 'connecting to zlm3 33062'
20 2018-06-14T05:14:10.295861Z 0 [Note] Plugin group_replication reported: 'client connected to zlm3 33062 fd 68'
21 2018-06-14T05:14:10.296880Z 0 [Note] Plugin group_replication reported: 'connecting to zlm2 33061'
22 2018-06-14T05:14:10.297876Z 0 [Note] Plugin group_replication reported: 'client connected to zlm2 33061 fd 70'
23 2018-06-14T05:14:11.305010Z 0 [Note] Plugin group_replication reported: 'state 4338 action xa_snapshot'
24 2018-06-14T05:14:11.305274Z 0 [Note] Plugin group_replication reported: 'new state x_recover'
25 2018-06-14T05:14:11.305281Z 0 [Note] Plugin group_replication reported: 'state 4358 action xa_complete'
26 2018-06-14T05:14:11.305381Z 0 [Note] Plugin group_replication reported: 'new state x_run'
27 2018-06-14T05:14:15.781232Z 0 [ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 5c77c31b-4add-11e8-81e2-080027de0e0e:1-16 > Group transactions: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-2,
28 ed142e35-6ed1-11e8-86c6-080027de0e0e:1-7'
29 2018-06-14T05:14:15.781267Z 0 [ERROR] Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.'
30 2018-06-14T05:14:15.781270Z 0 [Note] Plugin group_replication reported: 'To force this member into the group you can use the group_replication_allow_local_disjoint_gtids_join option'
31 2018-06-14T05:14:15.781299Z 0 [Note] Plugin group_replication reported: 'Group membership changed to zlm2:3306,  on view 15289529754008535:2.'
32 2018-06-14T05:14:15.781325Z 2 [Note] Plugin group_replication reported: 'Going to wait for view modification'
33 2018-06-14T05:14:15.782782Z 0 [Note] Plugin group_replication reported: 'getstart group_id c3244967'
34 2018-06-14T05:14:20.734727Z 0 [Note] Plugin group_replication reported: 'state 4411 action xa_terminate'
35 2018-06-14T05:14:20.734884Z 0 [Note] Plugin group_replication reported: 'new state x_start'
36 2018-06-14T05:14:20.734891Z 0 [Note] Plugin group_replication reported: 'state 4338 action xa_exit'
37 2018-06-14T05:14:20.734922Z 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
38 2018-06-14T05:14:20.734926Z 0 [Note] Plugin group_replication reported: 'new state x_start'
39 2018-06-14T05:14:20.743941Z 0 [Note] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
40 2018-06-14T05:14:25.745237Z 2 [Note] Plugin group_replication reported: 'auto_increment_increment is reset to 1'
41 2018-06-14T05:14:25.745284Z 2 [Note] Plugin group_replication reported: 'auto_increment_offset is reset to 1'
42 2018-06-14T05:14:25.745592Z 7 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
43 2018-06-14T05:14:25.745817Z 4 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
44 2018-06-14T05:19:26.833707Z 2 [Note] Aborted connection 2 to db: 'unconnected' user: 'root' host: 'localhost' (Got timeout reading communication packets)

 

    In my test environment,the database on server zlm3 is different from zlm2,as it is the master in my master-slave replication.Howerver the server zlm2 is a newly initialized one which has no old data.they have their own data respectively.
 
    The error.log indicates the reason why zlm3 cannot join the group of Group Replication.The workaround is shown above by setting "group_replication_allow_local_disjoint_gtids_join" parameter to force zlm3 join the group.Let's have a try and see whether it makes sence.

 

1 (root@localhost mysql3306.sock)[(none)]08:15:41>set global group_replication_allow_local_disjoint_gtids_join=on;
2 Query OK, 0 rows affected, 1 warning (0.00 sec)
3 
4 (root@localhost mysql3306.sock)[(none)]08:15:53>start group_replication;
5 Query OK, 0 rows affected, 1 warning (4.90 sec)
6 
7 (root@localhost mysql3306.sock)[(none)]08:16:11>

 

    It seems server zlm3 has joined the group without error output this time.

 

 1 2018-06-14T06:16:10.720574Z 0 [ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 5c77c31b-4add-11e8-81e2-080027de0e0e:1-16 > Group transactions: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-2,
 2 ed142e35-6ed1-11e8-86c6-080027de0e0e:1-8'
 3 2018-06-14T06:16:10.720600Z 0 [Warning] Plugin group_replication reported: 'The member contains transactions not present in the group. It is only allowed to join due to group_replication_allow_local_disjoint_gtids_join option'
 4 2018-06-14T06:16:10.720854Z 11 [Note] Plugin group_replication reported: 'This server is working as secondary member with primary member address zlm2:3306.'
 5 2018-06-14T06:16:10.721915Z 19 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
 6 2018-06-14T06:16:10.722030Z 0 [Note] Plugin group_replication reported: 'Group membership changed to zlm2:3306, zlm3:3306 on view 15289529754008535:4.'
 7 2018-06-14T06:16:10.732273Z 19 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='zlm2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
 8 2018-06-14T06:16:10.752644Z 19 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 1b7181ee-6eaf-11e8-998e-080027de0e0e at zlm2 port: 3306.'
 9 2018-06-14T06:16:10.752644Z 21 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
10 2018-06-14T06:16:10.752875Z 22 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './relay-bin-group_replication_recovery.000001' position: 4
11 2018-06-14T06:16:10.754070Z 21 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'rpl_mgr@zlm2:3306',replication started in log 'FIRST' at position 4
12 2018-06-14T06:16:10.759820Z 21 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
13 2018-06-14T06:16:10.759854Z 21 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
14 2018-06-14T06:16:10.759860Z 21 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
15 2018-06-14T06:16:10.760247Z 19 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
16 2018-06-14T06:16:10.760289Z 22 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
17 2018-06-14T06:16:10.772161Z 19 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='zlm2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
18 2018-06-14T06:16:10.792308Z 19 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
19 2018-06-14T06:17:10.803085Z 19 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='zlm2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
20 2018-06-14T06:17:10.822823Z 19 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 1b7181ee-6eaf-11e8-998e-080027de0e0e at zlm2 port: 3306.'
21 2018-06-14T06:17:10.822823Z 23 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
22 2018-06-14T06:17:10.823067Z 24 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './relay-bin-group_replication_recovery.000001' position: 4
23 2018-06-14T06:17:10.824189Z 23 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'rpl_mgr@zlm2:3306',replication started in log 'FIRST' at position 4
24 2018-06-14T06:17:10.829053Z 23 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
25 2018-06-14T06:17:10.829085Z 23 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
26 2018-06-14T06:17:10.829206Z 23 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
27 2018-06-14T06:17:10.829575Z 19 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
28 2018-06-14T06:17:10.829624Z 24 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
29 2018-06-14T06:17:10.842611Z 19 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='zlm2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
30 2018-06-14T06:17:10.858279Z 19 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 3/10'

 

    Check the "replication_group_member" table which shows the server zlm3 is in a state of recovering.

 

1 (root@localhost mysql3306.sock)[performance_schema]08:20:20>SELECT * FROM performance_schema.replication_group_members;
2 +---------------------------+--------------------------------------+-------------+-------------+--------------+
3 | CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
4 +---------------------------+--------------------------------------+-------------+-------------+--------------+
5 | group_replication_applier | 1b7181ee-6eaf-11e8-998e-080027de0e0e | zlm2        |        3306 | ONLINE       |
6 | group_replication_applier | 5c77c31b-4add-11e8-81e2-080027de0e0e | zlm3        |        3306 | RECOVERING   |
7 +---------------------------+--------------------------------------+-------------+-------------+--------------+
8 2 rows in set (0.00 sec)

 

    The error.log on zlm2 shows that the server zlm3 has joined the group who has two members now.

 

1 2018-06-14T06:16:06.763739Z 0 [Note] Plugin group_replication reported: 'getstart group_id c3244967'
2 2018-06-14T06:16:10.638983Z 0 [Note] Plugin group_replication reported: 'Members joined the group: zlm3:3306'
3 2018-06-14T06:16:10.639079Z 0 [Note] Plugin group_replication reported: 'Group membership changed to zlm2:3306, zlm3:3306 on view 15289529754008535:4.'

 

    But a few minutes later,zlm3 left the group again what we can see below on zlm2.

 

 1 (root@localhost mysql3306.sock)[performance_schema]08:20:23>SELECT * FROM performance_schema.replication_group_members;
 2 ERROR 2006 (HY000): MySQL server has gone away
 3 No connection. Trying to reconnect...
 4 Connection id:    39
 5 Current database: performance_schema
 6 
 7 +---------------------------+--------------------------------------+-------------+-------------+--------------+
 8 | CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
 9 +---------------------------+--------------------------------------+-------------+-------------+--------------+
10 | group_replication_applier | 1b7181ee-6eaf-11e8-998e-080027de0e0e | zlm2        |        3306 | ONLINE       |
11 +---------------------------+--------------------------------------+-------------+-------------+--------------+
12 1 row in set (0.17 sec)
13 
14 ###Error.log output.###
15 2018-06-14T06:25:11.375254Z 0 [Note] Plugin group_replication reported: 'getstart group_id c3244967'
16 2018-06-14T06:25:12.439791Z 0 [Warning] Plugin group_replication reported: 'Members removed from the group: zlm3:3306'
17 2018-06-14T06:25:12.439884Z 0 [Note] Plugin group_replication reported: 'Group membership changed to zlm2:3306 on view 15289529754008535:5.'
18 2018-06-14T06:25:23.556717Z 30 [Note] Aborted connection 30 to db: 'performance_schema' user: 'root' host: 'localhost' (Got timeout reading communication packets)

 

    After server zlm3 have attempted 10 times,the SQL Thread of slave is killed again before the slave leave the group.

 

 1 2018-06-14T06:24:11.369677Z 19 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 10/10'
 2 2018-06-14T06:25:11.386015Z 19 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='zlm2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
 3 2018-06-14T06:25:11.405855Z 19 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 1b7181ee-6eaf-11e8-998e-080027de0e0e at zlm2 port: 3306.'
 4 2018-06-14T06:25:11.405856Z 39 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
 5 2018-06-14T06:25:11.405900Z 40 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './relay-bin-group_replication_recovery.000001' position: 4
 6 2018-06-14T06:25:11.407712Z 39 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'rpl_mgr@zlm2:3306',replication started in log 'FIRST' at position 4
 7 2018-06-14T06:25:11.412106Z 39 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
 8 2018-06-14T06:25:11.412147Z 39 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
 9 2018-06-14T06:25:11.412150Z 39 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
10 2018-06-14T06:25:11.412498Z 19 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
11 2018-06-14T06:25:11.412542Z 40 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
12 2018-06-14T06:25:11.424261Z 19 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='zlm2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
13 2018-06-14T06:25:11.437615Z 19 [ERROR] Plugin group_replication reported: 'Maximum number of retries when trying to connect to a donor reached. Aborting group replication recovery.'
14 2018-06-14T06:25:11.437632Z 19 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
15 2018-06-14T06:25:11.444043Z 19 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
16 2018-06-14T06:25:11.457830Z 19 [ERROR] Plugin group_replication reported: 'Fatal error during the Recovery process of Group Replication. The server will leave the group.'
17 2018-06-14T06:25:11.458777Z 0 [Note] Plugin group_replication reported: 'getstart group_id c3244967'
18 2018-06-14T06:25:14.879847Z 0 [Note] Plugin group_replication reported: 'state 4411 action xa_terminate'
19 2018-06-14T06:25:14.879987Z 0 [Note] Plugin group_replication reported: 'new state x_start'
20 2018-06-14T06:25:14.879993Z 0 [Note] Plugin group_replication reported: 'state 4338 action xa_exit'
21 2018-06-14T06:25:14.880019Z 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
22 2018-06-14T06:25:14.880027Z 0 [Note] Plugin group_replication reported: 'new state x_start'
23 2018-06-14T06:25:14.898475Z 0 [Note] Plugin group_replication reported: 'Group membership changed: This member has left the group.'

 

    Finally,I figure out the parameter "log-bin" was set to "binlog",but the really name of binlog in my environment is "mysql-bin".Oh my gosh,I'm too careless not to find it out early.These three servers in Group Replication work normally now.

 

###On the server zlm2.###
(root@localhost mysql3306.sock)[performance_schema]10:14:32>show variables like '%primary%';
+---------------------------------------+-------+
| Variable_name | Value |
+---------------------------------------+-------+
| group_replication_single_primary_mode | ON |
+---------------------------------------+-------+
1 row in set (0.01 sec)

(root@localhost mysql3306.sock)[(none)]10:14:40>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 1b7181ee-6eaf-11e8-998e-080027de0e0e | zlm2 | 3306 | ONLINE |
| group_replication_applier | 5c77c31b-4add-11e8-81e2-080027de0e0e | zlm3 | 3306 | ONLINE |
| group_replication_applier | fc288b24-6934-11e8-9b0e-080027de0e0e | zlm4 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

(root@localhost mysql3306.sock)[(none)]10:15:15>show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 1b7181ee-6eaf-11e8-998e-080027de0e0e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

(root@localhost mysql3306.sock)[zlm]10:23:34>show tables;
+---------------+
| Tables_in_zlm |
+---------------+
| test_mgr |
+---------------+
1 row in set (0.00 sec)

(root@localhost mysql3306.sock)[zlm]10:23:57>select * from test_mgr;
+----+-----------+
| id | name |
+----+-----------+
| 1 | aaron8219 |
+----+-----------+
1 row in set (0.00 sec)

(root@localhost mysql3306.sock)[zlm]10:24:12>

###On the server zlm3.###
(root@localhost mysql3306.sock)[zlm]10:23:45>show tables;
+---------------+
| Tables_in_zlm |
+---------------+
| t1 |
| t2 |
| t3 |
| test_mgr |
+---------------+
4 rows in set (0.00 sec)

(root@localhost mysql3306.sock)[zlm]10:24:26>select * from test_mgr;
+----+-----------+
| id | name |
+----+-----------+
| 1 | aaron8219 |
+----+-----------+
1 row in set (0.01 sec)

(root@localhost mysql3306.sock)[zlm]10:24:28>insert into test_mgr(name) values('zlm');
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement
(root@localhost mysql3306.sock)[zlm]10:24:36>

###On the server zlm4.###
+---------------+
| Tables_in_zlm |
+---------------+
| t1 |
| t2 |
| t3 |
| test_mgr |
+---------------+
4 rows in set (0.01 sec)

(root@localhost mysql.sock)[zlm]10:28:14>select * from test_mgr;
+----+-----------+
| id | name |
+----+-----------+
| 1 | aaron8219 |
+----+-----------+
1 row in set (0.00 sec)

(root@localhost mysql.sock)[zlm]10:28:20>insert into test_mgr(name) values('zlm');
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement
(root@localhost mysql3306.sock)[zlm]10:28:58>

 

Summary
  • Single-primary mode is supported by default like what I've configured above.
  • The first member who creates replication group by starting it is the master who can merely do dml operations.
  • Parameter "group_replication_allow_local_disjoint_gtids_join" will lead to inconsistency and be deprecated in 5.7.21.
  • Becareful when using the parameters in examples of official documents which are probably imcompatible with your environment,that can cause complex troubleshooting procedure like what I did.