【MySQL】你以为设置了并行复制就降低延迟了?这个你绝对想不到!
在mysql官方版本中,为了保证其的高可用性,一般情况我们会采用主从复制的方式来解决。当然,方法很多。而我们今天所要处理的是采用gtid方式并且开了多线程复制后,仍然延迟的情况,糟糕的是,延迟还在不断扩大!
序号 | 清单 | 说明 |
1 | 系统 | redhat 6.x(4c,32g) |
2 | 数据库 | mysql-5.7.25 |
3 | 同步方式 | 基于gtid主从同步 |
1)已经配置的重要参数:
slave_parallel_type = logical_clockrelay for slave
slave_parallel_workers = 6
master_info_repository = table
relay_log_info_repository = table
relay_log_recovery = on
sync_relay_log = 10000
注:此时没有设置slave_preserve_commit_order
参数。
2)从库延迟状态查询
master_log_file relay_log_file relay_master_log_file seconds_behind_master
mysql> show slave status\g
*************************** 1. row ***************************
slave_io_state: queueing master event to the relay log
master_host: xxx.xxx.xxx.xxx
master_user: repl
master_port: 3306
connect_retry: 60
master_log_file: mysql-bin.008978
read_master_log_pos: 696914605
relay_log_file: db41-relay-bin.001259
relay_log_pos: 207377582
relay_master_log_file: mysql-bin.008970
slave_io_running: yes
slave_sql_running: yes
replicate_do_db: neteagle3
replicate_ignore_db: mysql
replicate_do_table:
replicate_ignore_table:
replicate_wild_do_table:
replicate_wild_ignore_table:
last_errno: 0
last_error:
skip_counter: 0
exec_master_log_pos: 1068770059
relay_log_space: 8425484286
until_condition: none
until_log_file:
until_log_pos: 0
master_ssl_allowed: no
master_ssl_ca_file:
master_ssl_ca_path:
master_ssl_cert:
master_ssl_cipher:
master_ssl_key:
seconds_behind_master: 187358
master_ssl_verify_server_cert: no
last_io_errno: 0
last_io_error:
last_sql_errno: 0
last_sql_error:
replicate_ignore_server_ids:
master_server_id: 42
master_uuid: eab7fcac-3cda-11e6-ada8-fa163e648db2
master_info_file: mysql.slave_master_info
sql_delay: 0
sql_remaining_delay: null
slave_sql_running_state: waiting for dependent transaction to commit
master_retry_count: 86400
master_bind:
last_io_error_timestamp:
last_sql_error_timestamp:
master_ssl_crl:
master_ssl_crlpath:
retrieved_gtid_set: eab7fcac-3cda-11e6-ada8-fa163e648db2:58031191-59927276
executed_gtid_set: eab7fcac-3cda-11e6-ada8-fa163e648db2:1-58080239:58080241
auto_position: 1
replicate_rewrite_db:
channel_name:
master_tls_version:
1 row in set (0.00 sec)
简单介绍一下几个指标信息:
,
read_master_log_pos
:这两个参数是成对的,表示的是从库io thread传输主库的binlog日志号及具体位置。
,
relay_log_pos
:这两个参数也是成对的,表示的是从库sql thread应用中继日志(relay log)号及具体位置。
,
relay_log_pos
:这两个参数也是成对的,表示的是上一项中的中继日志对应的主库binlog日志及具体位置(有点绕)。
:此参数可简单理解为主从延迟时间,单位为秒。
从上面这段mysql从状态信息中,我们可以看到,seconds_behind_master: 187358
这是从库sql应用延迟主库的时间为187358秒,转换成天,大概两天多。这说明,我们从库复制的数据是两天前的。
3)验证并行复制
mysql> show full processlist;
+----+-------------+--------------+-----------+------------------+--------+---------------------------------------------------------------+-----------------------+
| id | user | host | db | command | time | state | info |
+----+-------------+--------------+-----------+------------------+--------+---------------------------------------------------------------+-----------------------+
| 1 | system user | | null | connect | 18204 | waiting for master to send event | null |
| 2 | system user | | null | connect | 0 | waiting for dependent transaction to commit | null |
| 3 | system user | | null | connect | 154914 | system lock | null |
| 4 | system user | | null | connect | 154914 | waiting for an event from coordinator | null |
| 5 | system user | | null | connect | 154918 | waiting for an event from coordinator | null |
| 6 | system user | | null | connect | 155525 | waiting for an event from coordinator | null |
| 7 | system user | | null | connect | 180427 | waiting for an event from coordinator | null |
| 8 | system user | | null | connect | 18204 | waiting for an event from coordinator | null |
| 10 | root | localhost | neteagle3 | query | 0 | starting | show full processlist |
| 11 | repl | dbslave:9683 | null | binlog dump gtid | 18156 | master has sent all binlog to slave; waiting for more updates | null |
| 13 | root | localhost | neteagle3 | sleep | 4962 | | null |
+----+-------------+--------------+-----------+------------------+--------+---------------------------------------------------------------+-----------------------+
mysql> select * from performance_schema.replication_applier_status_by_worker ;
+--------------+-----------+-----------+---------------+-----------------------------------------------+-------------------+--------------------+----------------------+
| channel_name | worker_id | thread_id | service_state | last_seen_transaction | last_error_number | last_error_message | last_error_timestamp |
+--------------+-----------+-----------+---------------+-----------------------------------------------+-------------------+--------------------+----------------------+
| | 1 | 51 | on | eab7fcac-3cda-11e6-ada8-fa163e648db2:80240805 | 0 | | 0000-00-00 00:00:00 |
| | 2 | 52 | on | eab7fcac-3cda-11e6-ada8-fa163e648db2:80240210 | 0 | | 0000-00-00 00:00:00 |
| | 3 | 53 | on | eab7fcac-3cda-11e6-ada8-fa163e648db2:80235089 | 0 | | 0000-00-00 00:00:00 |
| | 4 | 54 | on | eab7fcac-3cda-11e6-ada8-fa163e648db2:80191268 | 0 | | 0000-00-00 00:00:00 |
| | 5 | 55 | on | eab7fcac-3cda-11e6-ada8-fa163e648db2:75296683 | 0 | | 0000-00-00 00:00:00 |
| | 6 | 56 | on | | 0 | | 0000-00-00 00:00:00 |
+--------------+-----------+-----------+---------------+-----------------------------------------------+-------------------+--------------------+----------------------+
6 rows in set (0.00 sec)
通过本条查询,可以看到开启了6个并行进行复制。
从库也同样查看了是否存在锁的情况,也没有发现。 因此反复的在查看slave状态,看能否发现一些细节,结果还真看到了一些异常现象。
通过以上来看,所有一切似乎都正常,并行复制开了,并且查看到cpu,io,内存均没有达到瓶颈地步。主库写binlog日志大概为2mb/s,这样的日质量并不是非常高。relay_log_pos
这个参数在频繁的刷slave状态时,发现时常会卡着不动(此时已确认没有看到锁)。或许问题真正的原因正在这里!
/!50530 set @@session.pseudo_slave_mode=1/;
/!50718 set transaction isolation level read committed//!/; set timestamp=1569201868/!/;
继续看在卡住时刻,数据库open的是什么表? 没错,就是表的存储引擎
分析binlog或者relay log日志,看有啥线索:
[mysql@xxx data]$ mysqlbinlog --no-defaults -v -v --base64-output=decode-rows relay-bin.001384 --start-position=420090430|more
/!50003 set @old_completion_type=@@completion_type,completion_type=0/;
delimiter /!/;
# at 420090430190923 9:24:28 server id 42 end_log_pos 420090282 crc32 0xd9097eaf gtid last_committed=57148 sequence_number=57149 rbr_only=yes
set @@session.gtid_next= 'eab7fcac-3cda-11e6-ada8-fa163e648db2:69415610'/!/;at 420090495
190923 9:24:28 server id 42 end_log_pos 420090364 crc32 0x82b57dfd query thread_id=95 exec_time=0 error_code=0
set @@session.pseudo_thread_id=95/!/;
set @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/!/;
set @@session.sql_mode=1075838976/!/;
set @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/!/;
/!\c gbk //!/;
set @@session.character_set_client=28,@@session.collation_connection=28,@@session.collation_server=8/!/;
set @@session.lc_time_names=0/!/;
set @@session.collation_database=default/!/;
begin
/!/;at 420090577
190923 9:24:28 server id 42 end_log_pos 420090585 crc32 0x752e27cf table_map:
net
.f_event
mapped to number 108at 420090798
190923 9:24:28 server id 42 end_log_pos 420090812 crc32 0x72b8e10d table_map:
net
.f_eventstorage
mapped to number 245at 420091025
190923 9:24:28 server id 42 end_log_pos 420091039 crc32 0x1797f9d8 table_map:
net
.f_eventstorage
mapped to number 245at 420091252
190923 9:24:28 server id 42 end_log_pos 420091106 crc32 0x8af14ad2 table_map:
net
.f_eventdetail
mapped to number 243at 420091319
190923 9:24:28 server id 42 end_log_pos 420091177 crc32 0xf1ce87c8 table_map:
net
.f_eventoperation
mapped to number 244at 420091390
190923 9:24:28 server id 42 end_log_pos 420091244 crc32 0x586c0b9d table_map:
net
.f_eventaudit
mapped to number 242at 420091457
190923 9:24:28 server id 42 end_log_pos 420093382 crc32 0x505e5408 update_rows: table id 108
at 420093595
190923 9:24:28 server id 42 end_log_pos 420098858 crc32 0x0f404509 update_rows: table id 245
at 420099071
190923 9:24:28 server id 42 end_log_pos 420098910 crc32 0xb8d9ed15 write_rows: table id 243
at 420099123
190923 9:24:28 server id 42 end_log_pos 420098966 crc32 0x3c489a7f write_rows: table id 244 flags: stmt_end_f
我们查看了中继日志relay-bin.001384
卡住的位置号420090430,为设置gtid_next,信息没什么用。
mysql> show open tables where in_use=1;
+-----------+---------------------+--------+-------------+
| database | table | in_use | name_locked |
+-----------+---------------------+--------+-------------+
| net | f_currentxxx | 1 | 0 |
+-----------+---------------------+--------+-------------+
这张表有什么特殊的么?查看其表结构
mysql> show create table net.f_currentxxx\g
*************************** 1. row ***************************
table: f_currentxxx
create table: create table f_currentxxx
(serial
int(20) not null comment 'xxx',audittime
bigint(20) not null comment 'xxx',type
int(11) default null comment 'xxx',severity
int(11) default null comment 'xxx,
key audittime
(audittime
)
) engine=memory default charset=gbk comment='xxx'
1 row in set (0.00 sec)
有没有看到什么与众不同?engine=memory
。memory的表进行主从复制,首先来说如果从库不做查询,一点意义没有,另外对memory表做复制,性能是非常堪忧的。如果必须进行同步,考虑将表的存储引擎改为innodb
mysql> select table_name from information_schema.tables where table_schema='net' and engine='memory';
+----------------------+
| table_name |
+----------------------+
| f_currentxxx |
+----------------------+
1 row in set, 6 warnings (0.01 sec)
出于严谨,我们将要复制的数据库进行彻底排查,确实只有这一张表是memory存储引擎。
我们可以看到net.f_currentxxx表已经被忽略复制。持续观察一段时间后,seconds_behind_master在逐渐缩小。
停止复制进程,将选项中添加replicate-ignore-table=net.f_currentxxx
,重新启动复制进程,观察slave状态。
mysql> show slave status\g
*************************** 1. row ***************************
slave_io_state: waiting for master to send event
master_host: xxx.xxx.xxx.xxx
master_user: repl
master_port: 3306
connect_retry: 60
master_log_file: mysql-bin.009194
read_master_log_pos: 939698255
relay_log_file: relay-bin.001964
relay_log_pos: 444060572
relay_master_log_file: mysql-bin.009027
slave_io_running: yes
slave_sql_running: yes
replicate_do_db: net
replicate_ignore_db: mysql
replicate_do_table:
replicate_ignore_table: net.f_currentxxx
replicate_wild_do_table:
replicate_wild_ignore_table:
last_errno: 0
last_error:
skip_counter: 0
exec_master_log_pos: 444060359
relay_log_space: 180287882098
until_condition: none
until_log_file:
until_log_pos: 0
master_ssl_allowed: no
master_ssl_ca_file:
master_ssl_ca_path:
master_ssl_cert:
master_ssl_cipher:
master_ssl_key:
seconds_behind_master: 179221
master_ssl_verify_server_cert: no
last_io_errno: 0
last_io_error:
last_sql_errno: 0
last_sql_error:
replicate_ignore_server_ids:
master_server_id: 42
master_uuid: eab7fcac-3cda-11e6-ada8-fa163e648db2
master_info_file: mysql.slave_master_info
sql_delay: 0
sql_remaining_delay: null
slave_sql_running_state: waiting for dependent transaction to commit
master_retry_count: 86400
master_bind:
last_io_error_timestamp:
last_sql_error_timestamp:
master_ssl_crl:
master_ssl_crlpath:
retrieved_gtid_set: eab7fcac-3cda-11e6-ada8-fa163e648db2:69497322-107886661
executed_gtid_set: 1264a536-da12-11e9-81ea-005056856ba5:1,
eab7fcac-3cda-11e6-ada8-fa163e648db2:1-71980857
auto_position: 1
replicate_rewrite_db:
channel_name:
master_tls_version:
1 row in set (0.00 sec)
中继日志应用速度大约5分钟一个(每个中继日志为1gb大小),而主库binlog日志大约为10分钟一个(每个binlog日志为1gb大小)。
同时,知识的储备也要充足,这是进阶高手的必要前提!
在梳理了整个处理过程后,其实难度不高,主要是要细心,细心去排查每一个想到的点。 在非轻量级的数据库中问题发生的概率也会随着量级的增加而增多。而这恰恰是能够磨练个人的成长。
上一篇: c语言回文字判断