如何正确衡量主从延迟时间(关于Seconds_Behind_Master和pt-heartbeat)
背景
主库的worker线程在写binlog的时候是并发工作的,而主库的dump线程和从库的IO线程都是单线程推拉binlog、特别是默认SQL线程是拿着relay log中的event逐一单线程回放的(5.6版本开启slave_parallel_workers支持特定情况下的并行复制,5.7版本之后全面支持并行复制后在复制层面已极大改善了延时问题)。因此即使不考虑网络延迟,主流MySQL版本在高并发的情况下,消费很可能赶不上生产,采用异步复制的从库很有可能跟不上主库的进度。
复制原理的回顾
(图摘自爱可生---张沈波的PPT)
1)主库 Binlog Dump线程在binlog有变化时,主动发送最新的binlog到从库。
2)从库 IO线程被动接收主库传来的binlog之后,记录到从库的relay log中,当没有数据传入的时候则会等待。与此同时SQL线程重放 relay log。
3)当从库长时间未收到主库传来的数据,并且等待时间超过了slave_net_timeout定义的时间(默认3600秒)后,Slave_IO_Running的状态将会置为No。在此之后,每隔master-connect-retry定义的时间(默认60秒)将会尝试重新连接,直到连接成功或超过重试次数master-retry-count(默认86400次)。
注:slave_net_timeout可以在配置文件中修改或set variable在线设置;而 --master-connect-retry、--master-retry-count 需要在change master to建立复制关系时提前指定。
可以再次看出在复制期间,无论是主库或从库负载高(特别是从库落盘压力大,关系到sync_binlog、innodb_flush_log_at_trx_commit的设置)或者是网络传输慢(特别是跨机房的同步)等情况发生时,都会产生主从延迟,并且是不可避免的。如果要实现强一致性,可采用Semi-sync,但采用该plugin也无法保证持续强一致性(rpl_semi_sync_master_timeout会引起复制模式的降级)
显然,延迟时间就变为一个重要的监控点了。
Seconds_Behind_Master计算主从延时
先看下官方手册的释疑:
When the slave is actively processing updates, this field shows the difference between the current timestamp on the slave and the original timestamp logged on the master for the event currently being processed on the slave.
When no event is currently being processed on the slave, this value is 0.
In MySQL 5.6.9 and later, this field is NULL (undefined or unknown) if the slave SQL thread is not running, or if the SQL thread has consumed all of the relay log and the slave I/O thread is not running
简言之,计算方法为:
1、当SQL线程执行event时,从库执行时刻的timestamp值 减去 该event上附带的时间戳(当时主库上的timestamp),这两者的差值。
2、一旦SQL线程未在执行event,则SBM为0
3、IO线程或SQL线程 is not running,则SBM为NULL
假如有以下3种情况发生,虽然Seconds_Behind_Master仍然存在非NULL的值,但已经变得不准确
1、主从时间不一致(虽然引入了clock_diff_with_master,尽量调节时间差带来的影响,但该值仅在从库与主库建立连接之时获取,后续不再更新,若之后再去修改主从机的时间,该值就不可靠了)。
2、主从库间网络问题或者从库IO线程未发现主库的binlog dump 线程挂了,仍然在等待数据传输过来,SBM长时间持续为零。
3、主库有较大的binlog event执行前后,从库上看到的SBM将会有大的跳动(监控图中将很可能产生毛刺)
4、对于并行复制,SMB是基于Exec_Master_Log_Pos,不精准。
注:1的参数在源码中的体现见参考文档1,2和3的相关实验见参考文档2。
其实,手册中都有相关注明如下:
1、This time difference computation works even if the master and slave do not have identical clock times, provided that the difference, computed when the slave I/O thread starts, remains constant from then on. Any changes—including NTP updates—can lead to clock skews that can make calculation of Seconds_Behind_Master less reliable.
2、A value of 0 for Seconds_Behind_Master can usually be interpreted as meaning that the slave has caught up with the master, but there are some cases where this is not strictly true. For example, this can occur if the network connection between master and slave is broken but the slave I/O thread has not yet noticed this—that is, slave_net_timeout has not yet elapsed.
3、It is also possible that transient values for Seconds_Behind_Master may not reflect the situation accurately. When the slave SQL thread has caught up on I/O, Seconds_Behind_Master displays 0; but when the slave I/O thread is still queuing up a new event, Seconds_Behind_Master may show a large value until the SQL thread finishes executing the new event. This is especially likely when the events have old timestamps; in such cases, if you execute SHOW SLAVE STATUS several times in a relatively short period, you may see this value change back and forth repeatedly between 0 and a relatively large value.
4、When using a multithreaded slave, you should keep in mind that this value is based on Exec_Master_Log_Pos, and so may not reflect the position of the most recently committed transaction.
以event数为单位计算主从延时
下面以一个例子来说明:
step1、主库执行show binary logs;查看主库最新的binlog位置点
+------------------+--------------+
| Log_name | File_size |
+------------------+--------------+
| mysql-bin.000053 | 1002420123 |
| mysql-bin.000054 | 1212937421 |
+------------------+--------------+
step2、同一时刻,也在从库上执行show slave status \G
Master_Log_File: mysql-bin.000053
Read_Master_Log_Pos: 671469325
Relay_Master_Log_File: mysql-bin.000053
Exec_Master_Log_Pos: 519854012
Seconds_Behind_Master: 3584
step3、计算该时刻的延迟情况
1)发现主库当前binlog序号和从库读取的binlog序号不同,延迟比较大了;
2)从库上,IO线程拿到的主库的binlog最新的偏移量(Read_Master_Log_Pos)与SQL线程正在replay的位置点(偏移量)Exec_Master_Log_Pos 也有较大延迟;
3)所以,主从延迟的总量是当前序号的binlog的延迟量和还没来得及收的新增序号的binlog的总和。即:(1002420123 - 519854012)+ 1212937421
以时间为单位计算主从延时
第三方工具mk-heartbeat、pt-heartbeat可以帮到你。以下以pt-heartbeat为例:
大致工作流程:
使用pt-heartbeat在主库上维护了一个后台进程,定时更新系统timestamp到heartbeat表中;
使用 --monitor 或 --check 参数连接从库,比较从库当前timestamp值和复制过去的heartbeat表中timestamp值,计算出差值
示例:
step1、主库创建一个库,用于存放heartbeat表,该库将在下一步操作中指定为 -D 的参数
step2、主库开启守护进程,更新被监控库中的heartbeat表:
pt-heartbeat -D heartbeat --update -S[master_server_socket] -u[username] -p[user_passwd] --create-table --daemonize
进入被监控库,可以发现新增了一个heartbeat表,定期update(默认1S)最新的timestamp信息到ts字段中
step3、在从库上,执行
pt-heartbeat -D heartbeat --monitor -h[slave ip] -u[username] -p[user_passwd]
pt-heartbeat -D heartbeat --check -h[slave ip] -u[username] -p[user_passwd]
--monitor:持续监控
--check:仅做一次探测
--master-server-id=XXX:如果主库的server id无法识别的话,需要手动指定
step4、简单测试
从库stop slave后执行 pt-heartbeat --monitor,反复start slave、stop slave观察结果
注:使用 pt-heartbeat --stop 关闭后台进程。binlog中不再因此新产生SET TIMESTAMP记录,但heartbeat表将不会被删除。下次再开启该后台进行时,命令中不用添加 --create-table 选项了。
需要注意,根据“Remove this file to permit pt-heartbeat to run”的提示,后续再次开启后台进程需要先删除该文件
更进一步,在Zabbix中添加相关监控
设置item注意:
1、shell可以采用:pt-heartbeat -D demo --check -h[slave ip] -u[username] -p[user_passwd]
2、Type of information:numeric(float)。
3、按需修改更新间隔:Update interval
为了易于演示快速出告警,trigger中阈值定的很低(主从延迟大于2秒就告警)
(1)演示一、
尝试stop slave一段时间后,再次start slave,如此反复两次观察对应图形变化情况
第二次start slave时:
(2)演示二、更新一个利用pt-slave-delay工具手工设置主从延迟时间,观察Zabbix出图情况:
[aaa@qq.com_13 ~]# pt-slave-delay --user=root --password=123456 --delay 1m --run-time 10m --host=192.168.237.13
2017-08-04T09:44:24 slave running 0 seconds behind
2017-08-04T09:44:24 STOP SLAVE until 2017-08-04T09:45:24 at master position mysql-bin.000031/10650583
2017-08-04T09:45:24 no new binlog events
2017-08-04T09:46:24 START SLAVE until master 2017-08-04T09:45:24 mysql-bin.000031/10668583
2017-08-04T09:47:24 START SLAVE until master 2017-08-04T09:46:24 mysql-bin.000031/10686583
2017-08-04T09:48:24 START SLAVE until master 2017-08-04T09:47:24 mysql-bin.000031/10704583
2017-08-04T09:49:24 START SLAVE until master 2017-08-04T09:48:24 mysql-bin.000031/10722583
2017-08-04T09:50:24 START SLAVE until master 2017-08-04T09:49:24 mysql-bin.000031/10740583
2017-08-04T09:51:24 START SLAVE until master 2017-08-04T09:50:24 mysql-bin.000031/10758583
2017-08-04T09:52:24 START SLAVE until master 2017-08-04T09:51:24 mysql-bin.000031/10776583
2017-08-04T09:53:24 START SLAVE until master 2017-08-04T09:52:24 mysql-bin.000031/10794583
2017-08-04T09:54:24 START SLAVE until master 2017-08-04T09:53:24 mysql-bin.000031/10812583
2017-08-04T09:54:24 Setting slave to run normally
[aaa@qq.com_13 ~]#
参考文档:
2、MySQL同步状态双Yes的假象及seconds_behind_master的含义
3、[MySQL FAQ]系列 — MySQL复制中slave延迟监控