Installing the MySQL MHA High-Availability Architecture
2022-05-24 20:42:46
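MMM cannot fully guarantee data consistency, so it suits scenarios where consistency requirements are modest but business availability must be maximized. For services with strict consistency requirements, the MMM architecture is strongly discouraged; consider MHA instead. When a MySQL master fails, MHA can complete the failover automatically within 0-30 seconds while preserving data consistency as far as possible, which gives high availability in the real sense.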
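The Manager package mainly contains the following tools:
masterha_check_ssh        checks the SSH configuration used by MHA
masterha_check_repl       checks the MySQL replication status
masterha_manager          starts MHA
masterha_check_status     reports the current MHA running state
masterha_master_monitor   detects whether the master is down
masterha_master_switch    controls failover (automatic or manual)
masterha_conf_host        adds or removes server entries in the configuration
The Node package (these tools are normally invoked by the MHA Manager scripts and need no manual operation) mainly contains the following tools:
save_binary_logs          saves and copies the master's binary logs
apply_diff_relay_logs     identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog        strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs          purges relay logs (without blocking the SQL thread)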
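Note:
(1) To minimize data loss when the master goes down because of hardware failure, it is recommended to configure MySQL semi-synchronous replication (available since MySQL 5.5) together with MHA.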
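The article does not show how semi-synchronous replication is switched on. The sketch below is one possible way, assuming the stock semisync plugins shipped with MySQL 5.5/5.6; the helper name semisync_sql is hypothetical and only prints the statements, so nothing is changed until the output is piped into a mysql client.

```shell
# Hypothetical helper: print the SQL that enables semi-synchronous
# replication for the given role. It only prints; it does not connect
# to any server.
semisync_sql() {
    case "$1" in
        master)
            echo "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';"
            echo "SET GLOBAL rpl_semi_sync_master_enabled = 1;"
            ;;
        slave)
            echo "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';"
            echo "SET GLOBAL rpl_semi_sync_slave_enabled = 1;"
            ;;
        *)
            echo "usage: semisync_sql master|slave" >&2
            return 1
            ;;
    esac
}
```

For example, `semisync_sql master | mysql` on the master and `semisync_sql slave | mysql` on each slave. A candidate master would probably want both plugins, since it changes role after a failover, and a running slave needs its I/O thread restarted before the setting takes effect.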
1.1. Environment
Role                            Hostname            IP            server_id  Type
master                          yaolansvr           192.168.0.3   16803      write
candidate master/monitor host   yaolansvr_slave     192.168.0.4   16804      read
slave                           yaolansvr_slave01   192.168.0.5   16805      read
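1.2. Install the FTP service on yaolansvr and upload the MHA installation packages
(1) Disable SELinux; otherwise vsftpd reports error 226.
1.3. Install the Perl modules on all database nodes (do this on every node at the same time)
1.4. Install MHA Manager on yaolansvr_slave
(1) Error: No package perl-Log-Dispatch available.
Fix (one option): install the EPEL repository:
# rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm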
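1.5. Configure passwordless SSH login between all MySQL servers (if edits to sshd_config do not take effect, a restart fixes it)
Note:
(1) "ssh-copy-id: command not found". Fix: yum install openssh-clients -y
(2) ssh-copy-id must also be run against the host itself.
(3) After stopping the slave and restarting the MySQL instance, the slave-related threads start normally.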
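The key distribution itself is not shown in the article. A minimal sketch, assuming the three IP addresses from the table in 1.1 and root as the SSH user (the user that appears in the MHA logs later on); the function name ssh_setup_cmds is hypothetical and only prints the commands as a dry run.

```shell
# Hosts from the environment table in 1.1 (the local host is included on
# purpose, see note (2) above).
HOSTS="192.168.0.3 192.168.0.4 192.168.0.5"

# Hypothetical helper: print the commands to run on EACH node.
# Nothing is executed until the output is piped to sh.
ssh_setup_cmds() {
    # Generate a key pair without a passphrase.
    echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
    # Push the public key to every node.
    for h in $HOSTS; do
        echo "ssh-copy-id -i ~/.ssh/id_rsa.pub root@$h"
    done
}
```

Running `ssh_setup_cmds` on each node lists what would be done; `ssh_setup_cmds | sh` executes it and prompts once per host for the root password.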
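1.6. Set up the master-slave replication environment
On the yaolansvr server (master node) -----------
Note:
(1) Set datadir and server-id; on the candidate master and the slave, change only server-id.
1.6.1. Take a backup on yaolansvr and create the replication user
1.6.2. Fetch the full backup from yaolansvr via FTP and restore it
Fix: datadir/auto.cnf is identical on the two hosts (although select uuid() returns different values), so both servers report the same server UUID. Delete auto.cnf on the candidate master and restart the instance.
After restarting the MySQL instance:
1.6.3. Set read_only=1 on the other slave nodes (do not write it to my.cnf, so that the candidate master can accept writes once it is promoted to master)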
# mysql -e "set global read_only=1"
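1.6.4. Important: create the replication user on all database nodes
(1) If the candidate master does not have the replication user, an error is reported:
(2) If a database node will never become a candidate master and has no_master=1, it does not need the replication user.
Here the replication user is created only on the candidate master; it must be identical to the replication user on the master.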
mysql> grant replication slave on *.* to 'repl1'@'192.168.0.%' identified by '123456';
mysql> flush privileges;
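1.6.5. Important: create the monitoring user on all database nodes; the monitoring user is mandatory
Create the monitoring user only on the master; it will be replicated to the slaves.
(1) The other nodes must have the monitoring user; otherwise the following error is reported: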
Mon Jun 29 18:02:41 2015 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln255] Got MySQL error when connecting 192.168.0.4(192.168.0.4:3306) :1045:Access denied for user 'monitor'@'192.168.0.4' (using password: YES), but this is not mysql crash. Check MySQL server settings.
mysql> grant all privileges on *.* to 'monitor'@'192.168.0.%' identified by '123456';
mysql> flush privileges;
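1.7. Configure MHA
(1) master_binlog_dir (for example master_binlog_dir=/home/soft/mysql/3306/binlog) must be the directory that holds the master's binlogs.
(2) If a database node will never become a candidate master and has no_master=1, it does not need the replication user.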
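The configuration file itself is missing from the article. The sketch below reconstructs a plausible /etc/mha/app1/app1.cnf from values visible in the logs further down (working directories, users, ping interval, secondary check script, candidate_master and no_master flags). The manager_log path and the function name write_app1_cnf are assumptions, and master_binlog_dir follows the logs (/data/mysql/data/) rather than the example path in note (1).

```shell
# Hypothetical helper: write a reconstructed app1.cnf to the path in $1.
# Values come from the MHA logs in this article, except manager_log,
# which is an assumption. Passwords are the demo values from 1.6.4/1.6.5.
write_app1_cnf() {
    cat > "$1" <<'EOF'
[server default]
manager_workdir=/etc/mha/app1
manager_log=/var/log/manager.log
remote_workdir=/var/tmp
master_binlog_dir=/data/mysql/data/
user=monitor
password=123456
repl_user=repl1
repl_password=123456
ssh_user=root
ping_interval=3
secondary_check_script=masterha_secondary_check -s 192.168.0.3 -s 192.168.0.5
master_ip_failover_script=/etc/mha/app1/master_ip_failover

[server1]
hostname=192.168.0.3
port=3306

[server2]
hostname=192.168.0.4
port=3306
candidate_master=1

[server3]
hostname=192.168.0.5
port=3306
no_master=1
EOF
}
```

For example, `write_app1_cnf /etc/mha/app1/app1.cnf` on the manager host, followed by a review of every value before MHA is started.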
1.7.2. Disable automatic relay log purging on all database nodes
# mysql -e "set global relay_log_purge=0"
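With relay_log_purge=0 the relay logs are no longer removed automatically, so they have to be cleaned up some other way, typically by running purge_relay_logs from cron. The sketch below prints one such crontab entry; the install path /usr/local/bin, the 04:00 schedule, the log file and the function name purge_cron_line are all assumptions. Note also that a value set with SET GLOBAL does not survive a server restart, which may be why the manager log later in this article still warns that relay_log_purge=0 is not set.

```shell
# Hypothetical helper: print a crontab entry that purges relay logs daily.
# $1 = MySQL user, $2 = password (the monitoring account from 1.6.5).
purge_cron_line() {
    echo "0 4 * * * /usr/local/bin/purge_relay_logs --user=$1 --password=$2 --disable_relay_log_purge --workdir=/var/tmp >> /var/log/purge_relay_logs.log 2>&1"
}
```

For example, `purge_cron_line monitor 123456` prints the line to paste into `crontab -e` on each slave; staggering the minute field per host avoids purging on all slaves at the same moment.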
1.7.3. Make mysqlbinlog available on the PATH of all database nodes
# ln -sv /usr/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
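1.7.4. Check the SSH configuration
Note:
(1) Either copy the sample script into place and edit it:
# cp /yangsq/ftp/mha4mysql-manager-0.54/samples/scripts/master_ip_failover /etc/mha/app1/master_ip_failover
or comment out master_ip_failover_script=/etc/mha/app1/master_ip_failover in the configuration file. An unedited copy of the sample still contains FIXME_xxx placeholders and fails with the following error:
Mon Jun 29 17:05:26 2015 - [info] /etc/mha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.0.3 --orig_master_ip=192.168.0.3 --orig_master_port=3306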
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /etc/mha/app1/master_ip_failover line 93.
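(2) If mysqlbinlog is unavailable or master_binlog_dir does not point at the binlog directory, the following error is reported: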
Mon Jun 29 17:52:11 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000008
Mon Jun 29 17:52:11 2015 - [info] Connecting to root@192.168.0.3(192.168.0.3)..
Failed to save binary log: Binlog not found from /var/lib/mysql,/var/log/mysql! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
# masterha_check_ssh --conf=/etc/mha/app1/app1.cnf
# masterha_check_repl --conf=/etc/mha/app1/app1.cnf
Mon Jun 29 17:33:39 2015 - [info] Reading server configurations from /etc/mha/app1/app1.cnf..
Mon Jun 29 17:33:39 2015 - [info] MHA::MasterMonitor version 0.54.
Mon Jun 29 17:33:39 2015 - [info] Dead Servers:
Mon Jun 29 17:33:39 2015 - [info] Alive Servers:
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.4(192.168.0.4:3306)
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.5(192.168.0.5:3306)
Mon Jun 29 17:33:39 2015 - [info] Alive Slaves:
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:33:39 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:33:39 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:33:39 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:33:39 2015 - [info] Not candidate for the new Master (no_master is set)
Mon Jun 29 17:33:39 2015 - [info] Current Alive Master: 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:33:39 2015 - [info] Checking slave configurations..
Mon Jun 29 17:33:39 2015 - [info] Checking replication filtering settings..
Mon Jun 29 17:33:39 2015 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jun 29 17:33:39 2015 - [info] Replication filtering check ok.
Mon Jun 29 17:33:39 2015 - [info] Starting SSH connection tests..
Mon Jun 29 17:34:21 2015 - [info] All SSH connection tests passed successfully.
Mon Jun 29 17:34:21 2015 - [info] Checking MHA Node version..
Mon Jun 29 17:34:31 2015 - [info] Version check ok.
Mon Jun 29 17:34:31 2015 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jun 29 17:34:32 2015 - [info] HealthCheck: SSH to 192.168.0.3 is reachable.
Mon Jun 29 17:34:32 2015 - [info] Master MHA Node version is 0.54.
Mon Jun 29 17:34:32 2015 - [info] Checking recovery script configurations on the current master..
Mon Jun 29 17:34:32 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000008
Mon Jun 29 17:34:32 2015 - [info] Connecting to root@192.168.0.3(192.168.0.3)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql/data, up to mysql-bin.000008
Mon Jun 29 17:34:32 2015 - [info] Master setting check done.
Mon Jun 29 17:34:32 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Jun 29 17:34:32 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.0.4 --slave_ip=192.168.0.4 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Mon Jun 29 17:34:32 2015 - [info] Connecting to root@192.168.0.4(192.168.0.4:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to yaolansvr_slave-relay-bin.000003
Temporary relay log file is /data/mysql/data/yaolansvr_slave-relay-bin.000003
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Jun 29 17:34:33 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.0.5 --slave_ip=192.168.0.5 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Mon Jun 29 17:34:33 2015 - [info] Connecting to root@192.168.0.5(192.168.0.5:22)..
reverse mapping checking getaddrinfo for bogon [192.168.0.5] failed - POSSIBLE BREAK-IN ATTEMPT!
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to yaolansvr_slave01-relay-bin.000002
Temporary relay log file is /data/mysql/data/yaolansvr_slave01-relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Jun 29 17:34:43 2015 - [info] Slaves settings check done.
Mon Jun 29 17:34:43 2015 - [info]
192.168.0.3 (current master)
+--192.168.0.4
+--192.168.0.5
Mon Jun 29 17:34:43 2015 - [info] Checking replication health on 192.168.0.4..
Mon Jun 29 17:34:43 2015 - [info] ok.
Mon Jun 29 17:34:43 2015 - [info] Checking replication health on 192.168.0.5..
Mon Jun 29 17:34:43 2015 - [info] ok.
Mon Jun 29 17:34:43 2015 - [warning] master_ip_failover_script is not defined.
Mon Jun 29 17:34:43 2015 - [warning] shutdown_script is not defined.
Mon Jun 29 17:34:43 2015 - [info] Got exit code 0 (Not master dead).
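1.8. VIP configuration
To prevent split-brain, it is recommended in production to manage the virtual IP with a script rather than with keepalived.
1.8.1. Modify the failover script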
# vi /etc/mha/app1/master_ip_failover
#### Add the variables
my $vip = '192.168.0.10/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down";
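The rest of the modified script is not shown. Judging from the "Enabling the VIP" and "Disabling the VIP" messages in the failover log below, it presumably runs these two commands over SSH on the new and the old master. The sketch below shows the equivalent commands in shell form; the function name vip_cmd is hypothetical, and the function only prints the command so it can be inspected first.

```shell
# Same values as the Perl variables above.
VIP='192.168.0.10/24'
KEY='1'
IFACE='eth1'        # assumption: must match the real NIC name on each host
SSH_USER='root'     # ssh_user seen in the MHA logs

# Hypothetical helper: print the remote command that starts or stops
# the VIP on HOST. It does not open any SSH connection itself.
vip_cmd() {
    action=$1
    host=$2
    case "$action" in
        start) remote="/sbin/ifconfig ${IFACE}:${KEY} ${VIP}" ;;
        stop)  remote="/sbin/ifconfig ${IFACE}:${KEY} down" ;;
        *)     echo "usage: vip_cmd start|stop HOST" >&2; return 1 ;;
    esac
    echo "ssh ${SSH_USER}@${host} \"${remote}\""
}
```

For example, `vip_cmd stop 192.168.0.3` followed by `vip_cmd start 192.168.0.4` prints the two steps of a VIP move from the old master to the new one.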
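#### First, bring up the VIP on the master node. The second command takes the VIP down again and is only needed to remove it. The interface name here (eth0) has to agree with the one used in the script, which is eth1 above.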
# /sbin/ifconfig eth0:1 192.168.0.10/24
# /sbin/ifconfig eth0:1 down
#### Then check the replication environment
# masterha_check_repl --conf=/etc/mha/app1/app1.cnf
# masterha_check_status --conf=/etc/mha/app1/app1.cnf
app1 is stopped(2:NOT_RUNNING).
# touch /etc/mha/app1/manager.log
# nohup masterha_manager --conf=/etc/mha/app1/app1.cnf > /etc/mha/app1/manager.log 2>&1 &
# tail -f /var/log/manager.log
Tue Jun 30 10:21:25 2015 - [info] Got terminate signal. Exit.
Tue Jun 30 10:21:58 2015 - [info] MHA::MasterMonitor version 0.54.
Tue Jun 30 10:21:58 2015 - [info] Dead Servers:
Tue Jun 30 10:21:58 2015 - [info] Alive Servers:
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.4(192.168.0.4:3306)
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.5(192.168.0.5:3306)
Tue Jun 30 10:21:58 2015 - [info] Alive Slaves:
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:21:58 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:21:58 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:21:58 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:21:58 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:21:58 2015 - [info] Current Alive Master: 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:21:58 2015 - [info] Checking slave configurations..
Tue Jun 30 10:21:58 2015 - [info] read_only=1 is not set on slave 192.168.0.4(192.168.0.4:3306).
Tue Jun 30 10:21:58 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.0.4(192.168.0.4:3306).
Tue Jun 30 10:21:58 2015 - [info] read_only=1 is not set on slave 192.168.0.5(192.168.0.5:3306).
Tue Jun 30 10:21:58 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.0.5(192.168.0.5:3306).
Tue Jun 30 10:21:58 2015 - [info] Checking replication filtering settings..
Tue Jun 30 10:21:58 2015 - [info] binlog_do_db= , binlog_ignore_db=
Tue Jun 30 10:21:58 2015 - [info] Replication filtering check ok.
Tue Jun 30 10:21:58 2015 - [info] Starting SSH connection tests..
Tue Jun 30 10:22:00 2015 - [info] All SSH connection tests passed successfully.
Tue Jun 30 10:22:00 2015 - [info] Checking MHA Node version..
Tue Jun 30 10:22:01 2015 - [info] Version check ok.
Tue Jun 30 10:22:01 2015 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jun 30 10:22:01 2015 - [info] HealthCheck: SSH to 192.168.0.3 is reachable.
Tue Jun 30 10:22:01 2015 - [info] Master MHA Node version is 0.54.
Tue Jun 30 10:22:01 2015 - [info] Checking recovery script configurations on the current master..
Tue Jun 30 10:22:01 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000009
Tue Jun 30 10:22:01 2015 - [info] Connecting to root@192.168.0.3(192.168.0.3)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql/data/, up to mysql-bin.000009
Tue Jun 30 10:22:01 2015 - [info] Master setting check done.
Tue Jun 30 10:22:01 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Jun 30 10:22:01 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.0.4 --slave_ip=192.168.0.4 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Tue Jun 30 10:22:01 2015 - [info] Connecting to root@192.168.0.4(192.168.0.4:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to yaolansvr_slave-relay-bin.000006
Temporary relay log file is /data/mysql/data/yaolansvr_slave-relay-bin.000006
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Jun 30 10:22:02 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.0.5 --slave_ip=192.168.0.5 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Tue Jun 30 10:22:02 2015 - [info] Connecting to root@192.168.0.5(192.168.0.5:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to yaolansvr_slave01-relay-bin.000005
Temporary relay log file is /data/mysql/data/yaolansvr_slave01-relay-bin.000005
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Jun 30 10:22:02 2015 - [info] Slaves settings check done.
Tue Jun 30 10:22:02 2015 - [info]
192.168.0.3 (current master)
+--192.168.0.4
+--192.168.0.5
Tue Jun 30 10:22:02 2015 - [info] Checking master_ip_failover_script status:
Tue Jun 30 10:22:02 2015 - [info] /etc/mha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.0.3 --orig_master_ip=192.168.0.3 --orig_master_port=3306
Tue Jun 30 10:22:02 2015 - [info] OK.
Tue Jun 30 10:22:02 2015 - [warning] shutdown_script is not defined.
Tue Jun 30 10:22:02 2015 - [info] Set master ping interval 3 seconds.
Tue Jun 30 10:22:02 2015 - [info] Set secondary check script: masterha_secondary_check -s 192.168.0.3 -s 192.168.0.5
Tue Jun 30 10:22:02 2015 - [info] Starting ping health check on 192.168.0.3(192.168.0.3:3306)..
Tue Jun 30 10:22:02 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
# masterha_check_status --conf=/etc/mha/app1/app1.cnf
app1 (pid:2243) is running(0:PING_OK), master:192.168.0.3
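When the status check is scripted, for instance in a watchdog that restarts the manager, the state can be pulled out of the line printed by masterha_check_status. A minimal sketch, assuming the two output formats shown in this article; the function name mha_state is hypothetical.

```shell
# Hypothetical helper: extract the "code:STATE" pair from one line of
# masterha_check_status output passed as $1.
mha_state() {
    # Match a parenthesised group of the form (digits:UPPERCASE_NAME);
    # "(pid:2243)" is skipped because "pid" is not numeric.
    printf '%s\n' "$1" | sed -n 's/.*(\([0-9][0-9]*:[A-Z_]*\)).*/\1/p'
}
```

For example, `mha_state "$(masterha_check_status --conf=/etc/mha/app1/app1.cnf)"` yields 0:PING_OK while monitoring is healthy and 2:NOT_RUNNING when the manager is stopped.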
# masterha_stop --conf=/etc/mha/app1/app1.cnf
After the failover has been triggered:
# tail -f /var/log/manager.log
Tue Jun 30 10:28:02 2015 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Jun 30 10:28:02 2015 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --binlog_prefix=mysql-bin
Tue Jun 30 10:28:03 2015 - [info] Executing seconary network check script: masterha_secondary_check -s 192.168.0.3 -s 192.168.0.5 --user=root --master_host=192.168.0.3 --master_ip=192.168.0.3 --master_port=3306
Tue Jun 30 10:28:03 2015 - [info] HealthCheck: SSH to 192.168.0.3 is reachable.
Monitoring server 192.168.0.3 is reachable, Master is not reachable from 192.168.0.3. OK.
Monitoring server 192.168.0.5 is reachable, Master is not reachable from 192.168.0.5. OK.
Tue Jun 30 10:28:03 2015 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Tue Jun 30 10:28:05 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 30 10:28:05 2015 - [warning] Connection failed 1 time(s)..
Tue Jun 30 10:28:08 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 30 10:28:08 2015 - [warning] Connection failed 2 time(s)..
Tue Jun 30 10:28:11 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 30 10:28:11 2015 - [warning] Connection failed 3 time(s)..
Tue Jun 30 10:28:11 2015 - [warning] Master is not reachable from health checker!
Tue Jun 30 10:28:11 2015 - [warning] Master 192.168.0.3(192.168.0.3:3306) is not reachable!
Tue Jun 30 10:28:11 2015 - [warning] SSH is reachable.
Tue Jun 30 10:28:11 2015 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1/app1.cnf again, and trying to connect to all servers to check server status..
Tue Jun 30 10:28:11 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 30 10:28:11 2015 - [info] Reading application default configurations from /etc/mha/app1/app1.cnf..
Tue Jun 30 10:28:11 2015 - [info] Reading server configurations from /etc/mha/app1/app1.cnf..
Tue Jun 30 10:28:12 2015 - [info] Dead Servers:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Alive Servers:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.4(192.168.0.4:3306)
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.5(192.168.0.5:3306)
Tue Jun 30 10:28:12 2015 - [info] Alive Slaves:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:12 2015 - [info] Checking slave configurations..
Tue Jun 30 10:28:12 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.0.4(192.168.0.4:3306).
Tue Jun 30 10:28:12 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.0.5(192.168.0.5:3306).
Tue Jun 30 10:28:12 2015 - [info] Checking replication filtering settings..
Tue Jun 30 10:28:12 2015 - [info] Replication filtering check ok.
Tue Jun 30 10:28:12 2015 - [info] Master is down!
Tue Jun 30 10:28:12 2015 - [info] Terminating monitoring script.
Tue Jun 30 10:28:12 2015 - [info] Got exit code 20 (Master dead).
Tue Jun 30 10:28:12 2015 - [info] MHA::MasterFailover version 0.54.
Tue Jun 30 10:28:12 2015 - [info] Starting master failover.
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] * Phase 1: Configuration Check Phase..
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] Dead Servers:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Checking master reachability via mysql(double check)..
Tue Jun 30 10:28:12 2015 - [info] ok.
Tue Jun 30 10:28:12 2015 - [info] Alive Servers:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.4(192.168.0.4:3306)
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.5(192.168.0.5:3306)
Tue Jun 30 10:28:12 2015 - [info] Alive Slaves:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:12 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Jun 30 10:28:12 2015 - [info] Executing master IP deactivatation script:
Tue Jun 30 10:28:12 2015 - [info] /etc/mha/app1/master_ip_failover --orig_master_host=192.168.0.3 --orig_master_ip=192.168.0.3 --orig_master_port=3306 --command=stopssh --ssh_user=root
Disabling the VIP on old master: 192.168.0.3
Tue Jun 30 10:28:13 2015 - [info] done.
Tue Jun 30 10:28:13 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Jun 30 10:28:13 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] * Phase 3: Master Recovery Phase..
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:120
Tue Jun 30 10:28:13 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Jun 30 10:28:13 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:13 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:13 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:13 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:13 2015 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:120
Tue Jun 30 10:28:13 2015 - [info] Oldest slaves:
Tue Jun 30 10:28:13 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:13 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:13 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:13 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] Fetching dead master's binary logs..
Tue Jun 30 10:28:13 2015 - [info] Executing command on the dead master 192.168.0.3(192.168.0.3:3306): save_binary_logs --command=save --start_file=mysql-bin.000009 --start_pos=120 --binlog_dir=/data/mysql/data/ --output_file=/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54
Creating /var/tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000009 pos 120 to mysql-bin.000009 EOF into /var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog ..
Dumping binlog format description event, from position 0 to 120.. ok.
Dumping effective binlog data from /data/mysql/data//mysql-bin.000009 position 120 to tail(143).. ok.
Concat succeeded.
Tue Jun 30 10:28:14 2015 - [info] scp from root@192.168.0.3:/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog to local:/etc/mha/app1/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog succeeded.
Tue Jun 30 10:28:14 2015 - [info] HealthCheck: SSH to 192.168.0.4 is reachable.
Tue Jun 30 10:28:15 2015 - [info] HealthCheck: SSH to 192.168.0.5 is reachable.
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] * Phase 3.3: Determining New Master Phase..
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Jun 30 10:28:15 2015 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Jun 30 10:28:15 2015 - [info] Searching new master from slaves..
Tue Jun 30 10:28:15 2015 - [info] Candidate masters from the configuration file:
Tue Jun 30 10:28:15 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:15 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:15 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:15 2015 - [info] Non-candidate masters:
Tue Jun 30 10:28:15 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:15 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:15 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:15 2015 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Jun 30 10:28:15 2015 - [info] New master is 192.168.0.4(192.168.0.4:3306)
Tue Jun 30 10:28:15 2015 - [info] Starting master failover..
Tue Jun 30 10:28:15 2015 - [info]
From:
192.168.0.3 (current master)
+--192.168.0.4
+--192.168.0.5
To:
192.168.0.4 (new master)
+--192.168.0.5
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 30 10:28:15 2015 - [info] Sending binlog..
Tue Jun 30 10:28:15 2015 - [info] scp from local:/etc/mha/app1/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog to root@192.168.0.4:/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog succeeded.
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Jun 30 10:28:15 2015 - [info] Starting recovery on 192.168.0.4(192.168.0.4:3306)..
Tue Jun 30 10:28:15 2015 - [info] Generating diffs succeeded.
Tue Jun 30 10:28:15 2015 - [info] Waiting until all relay logs are applied.
Tue Jun 30 10:28:15 2015 - [info] done.
Tue Jun 30 10:28:16 2015 - [info] Getting slave status..
Tue Jun 30 10:28:16 2015 - [info] This slave(192.168.0.4)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000009:120). No need to recover from Exec_Master_Log_Pos.
Tue Jun 30 10:28:16 2015 - [info] Connecting to the target slave host 192.168.0.4, running recover script..
Tue Jun 30 10:28:16 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host=192.168.0.4 --slave_ip=192.168.0.4 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog --workdir=/var/tmp --target_version=5.6.20-log --timestamp=20150630102812 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54 --slave_pass=xxx
Tue Jun 30 10:28:16 2015 - [info]
MySQL client version is 5.6.20. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog on 192.168.0.4:3306. This may take long time...
Applying log files succeeded.
Tue Jun 30 10:28:16 2015 - [info] All relay logs were successfully applied.
Tue Jun 30 10:28:16 2015 - [info] Getting new master's binlog name and position..
Tue Jun 30 10:28:16 2015 - [info] mysql-bin.000015:120
Tue Jun 30 10:28:16 2015 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.4', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000015', MASTER_LOG_POS=120, MASTER_USER='repl1', MASTER_PASSWORD='xxx';
Tue Jun 30 10:28:16 2015 - [info] Executing master IP activate script:
Tue Jun 30 10:28:16 2015 - [info] /etc/mha/app1/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.0.3 --orig_master_ip=192.168.0.3 --orig_master_port=3306 --new_master_host=192.168.0.4 --new_master_ip=192.168.0.4 --new_master_port=3306 --new_master_user='monitor' --new_master_password='123456'
Enabling the VIP - 192.168.0.10/24 on the new master - 192.168.0.4
Tue Jun 30 10:28:16 2015 - [info] OK.
Tue Jun 30 10:28:16 2015 - [info] Setting read_only=0 on 192.168.0.4(192.168.0.4:3306)..
Tue Jun 30 10:28:16 2015 - [info] ok.
Tue Jun 30 10:28:16 2015 - [info] ** Finished master recovery successfully.
Tue Jun 30 10:28:16 2015 - [info] * Phase 3: Master Recovery Phase completed.
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] * Phase 4: Slaves Recovery Phase..
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] -- Slave diff file generation on host 192.168.0.5(192.168.0.5:3306) started, pid: 2658. Check tmp log /etc/mha/app1/192.168.0.5_3306_20150630102812.log if it takes time..
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] Log messages from 192.168.0.5 ...
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 30 10:28:16 2015 - [info] End of log messages from 192.168.0.5.
Tue Jun 30 10:28:16 2015 - [info] -- 192.168.0.5(192.168.0.5:3306) has the latest relay log events.
Tue Jun 30 10:28:16 2015 - [info] Generating relay diff files from the latest slave succeeded.
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] -- Slave recovery on host 192.168.0.5(192.168.0.5:3306) started, pid: 2660. Check tmp log /etc/mha/app1/192.168.0.5_3306_20150630102812.log if it takes time..
Tue Jun 30 10:28:18 2015 - [info]
Tue Jun 30 10:28:18 2015 - [info] Log messages from 192.168.0.5 ...
Tue Jun 30 10:28:18 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] Sending binlog..
Tue Jun 30 10:28:17 2015 - [info] scp from local:/etc/mha/app1/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog to root@192.168.0.5:/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog succeeded.
Tue Jun 30 10:28:17 2015 - [info] Starting recovery on 192.168.0.5(192.168.0.5:3306)..
Tue Jun 30 10:28:17 2015 - [info] Generating diffs succeeded.
Tue Jun 30 10:28:17 2015 - [info] Waiting until all relay logs are applied.
Tue Jun 30 10:28:17 2015 - [info] done.
Tue Jun 30 10:28:17 2015 - [info] Getting slave status..
Tue Jun 30 10:28:17 2015 - [info] This slave(192.168.0.5)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000009:120). No need to recover from Exec_Master_Log_Pos.
Tue Jun 30 10:28:17 2015 - [info] Connecting to the target slave host 192.168.0.5, running recover script..
Tue Jun 30 10:28:17 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host=192.168.0.5 --slave_ip=192.168.0.5 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog --workdir=/var/tmp --target_version=5.6.20-log --timestamp=20150630102812 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54 --slave_pass=xxx
Tue Jun 30 10:28:17 2015 - [info]
MySQL client version is 5.6.20. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog on 192.168.0.5:3306. This may take long time...
Applying log files succeeded.
Tue Jun 30 10:28:17 2015 - [info] All relay logs were successfully applied.
Tue Jun 30 10:28:17 2015 - [info] Resetting slave 192.168.0.5(192.168.0.5:3306) and starting replication from the new master 192.168.0.4(192.168.0.4:3306)..
Tue Jun 30 10:28:18 2015 - [info] Executed CHANGE MASTER.
Tue Jun 30 10:28:18 2015 - [info] Slave started.
Tue Jun 30 10:28:18 2015 - [info] End of log messages from 192.168.0.5.
Tue Jun 30 10:28:18 2015 - [info] -- Slave recovery on host 192.168.0.5(192.168.0.5:3306) succeeded.
Tue Jun 30 10:28:18 2015 - [info] All new slave servers recovered successfully.
Tue Jun 30 10:28:18 2015 - [info]
Tue Jun 30 10:28:18 2015 - [info] * Phase 5: New master cleanup phase..
Tue Jun 30 10:28:18 2015 - [info]
Tue Jun 30 10:28:18 2015 - [info] Resetting slave info on the new master..
Tue Jun 30 10:28:18 2015 - [info] 192.168.0.4: Resetting slave info succeeded.
Tue Jun 30 10:28:18 2015 - [info] Master failover to 192.168.0.4(192.168.0.4:3306) completed successfully.
Tue Jun 30 10:28:18 2015 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.0.3 to 192.168.0.4 succeeded
Master 192.168.0.3 is down!
Check MHA Manager logs at yaolansvr_slave:/var/log/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.0.3.
The latest slave 192.168.0.4(192.168.0.4:3306) has all relay logs for recovery.
Selected 192.168.0.4 as a new master.
192.168.0.4: OK: Applying all logs succeeded.
192.168.0.4: OK: Activated master IP address.
192.168.0.5: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.0.5: OK: Applying all logs succeeded. Slave started, replicating from 192.168.0.4.
192.168.0.4: Resetting slave info succeeded.
Master failover to 192.168.0.4(192.168.0.4:3306) completed successfully.
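The timestamps in this failover log give a rough measure of how long the switch took: the first ping error is stamped 10:28:02 and the completion message 10:28:18. A minimal sketch of that arithmetic follows, assuming both stamps fall on the same day. The function name elapsed_seconds is made up for illustration and is not part of MHA.

```bash
# Hypothetical helper: seconds between two HH:MM:SS stamps on the same day.
elapsed_seconds() {
    # awk reads the fields as decimal numbers, so "08" and "09" are not misread as octal.
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        print (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
    }'
}

# First ping error vs. "completed successfully" in the log of this article.
elapsed_seconds 10:28:02 10:28:18
```

For this run the result is 16 seconds, which sits inside the 0-30s window quoted for MHA.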
###### After 192.168.0.4 is promoted to master, check the status:
# masterha_check_status --conf=/etc/mha/app1/app1.cnf
app1 is stopped(2:NOT_RUNNING).
mysql> select @@read_only;
+-------------+
| @@read_only |
+-------------+
| 0 |
+-------------+
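###### Repairing the crashed machine
First run cat /var/log/manager.log|grep "All other slaves should start" to find the CHANGE MASTER command. Start the crashed database and log in; slave status is empty. Use the CHANGE MASTER command to point it at the new master, then start the slave threads.
Then set read_only=1, check the replication environment, start MHA Manager monitoring again, and run # mysql -e "set global relay_log_purge=0"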
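The grep step can be wrapped so that only the SQL statement is printed. This is a sketch under assumptions: the helper name extract_change_master is hypothetical, and it relies on the manager log writing the statement on the same line as the phrase "All other slaves should start". The log usually shows the replication password masked, so the real password has to be substituted before the statement is run.

```bash
# Hypothetical helper: print the last CHANGE MASTER statement found in an MHA manager log.
# Argument: path to the manager log.
extract_change_master() {
    grep 'All other slaves should start' "$1" \
        | tail -n 1 \
        | sed 's/.*\(CHANGE MASTER TO.*\)$/\1/'
}

# Example (not executed here): extract_change_master /var/log/manager.log
```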
After MySQL is shut down on 192.168.0.4, promoting 192.168.0.3 to master fails with:
Tue Jun 30 11:50:37 2015 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln297] Last failover was done at 2015/06/30 10:05:18. Current time is too early to do failover again. If you want to do failover, manually remove /etc/mha/app1/app1.failover.complete and run this script again.
Tue Jun 30 11:50:37 2015 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_manager line 65
and masterha_manager exits immediately.
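The error message itself names the fix: remove the app1.failover.complete marker before starting the manager again. A small sketch of that step follows. The helper name clear_failover_marker is hypothetical, and the directory and application name are the ones used in this article. masterha_manager also accepts --ignore_last_failover, which skips this check without touching the file.

```bash
# Hypothetical helper: remove the marker left by the previous failover, if present.
# Arguments: MHA working directory, application name.
clear_failover_marker() {
    marker="$1/$2.failover.complete"
    if [ -f "$marker" ]; then
        rm -f "$marker" && echo "removed $marker"
    else
        echo "no marker at $marker"
    fi
}

# Example (not executed here): clear_failover_marker /etc/mha/app1 app1
```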
Note:
(1) Whenever a slave is restarted, remember to run mysql -e "set global read_only=1" again.
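MHA consists of two parts, the Manager package and the Node package, described below.
The Manager package mainly includes the following tools:
masterha_check_ssh       checks MHA's SSH configuration
masterha_check_repl      checks MySQL replication status
masterha_manager         starts MHA
masterha_check_status    checks the current MHA running state
masterha_master_monitor  detects whether the master is down
masterha_master_switch   controls failover (automatic or manual)
masterha_conf_host       adds or removes configured server entries
The Node package (these tools are normally triggered by MHA Manager scripts and need no manual operation) mainly includes the following tools:
save_binary_logs         saves and copies the master's binary logs
apply_diff_relay_logs    identifies differential relay log events and applies them to the other slaves
filter_mysqlbinlog       strips unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs         purges relay logs (without blocking the SQL thread)
Note:
(1) To minimise data loss when the master goes down because of hardware failure, it is recommended to configure semi-synchronous replication (available since MySQL 5.5) together with MHA.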
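The article does not show the semi-synchronous setup, so the following is only a sketch of what it usually involves on MySQL 5.5/5.6 on Linux: load the semisync plugin for the node's role and switch it on. The helper name semisync_sql is hypothetical; it only prints the statements so they can be reviewed before being piped into mysql.

```bash
# Hypothetical helper: print the statements that enable semi-synchronous
# replication for the given role (master or slave).
semisync_sql() {
    case "$1" in
        master|slave)
            echo "INSTALL PLUGIN rpl_semi_sync_${1} SONAME 'semisync_${1}.so';"
            echo "SET GLOBAL rpl_semi_sync_${1}_enabled = 1;"
            ;;
        *)
            echo "usage: semisync_sql master|slave" >&2
            return 1
            ;;
    esac
}

semisync_sql master
semisync_sql slave
```

A candidate master would probably need both plugins loaded, since it acts as a slave before failover and as a master afterwards.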
1.1 Environment
Role                           Hostname           IP            server_id  Type
master                         yaolansvr          192.168.0.3   16803      write
candidate master/monitor host  yaolansvr_slave    192.168.0.4   16804      read
slave                          yaolansvr_slave01  192.168.0.5   16805      read
1.2 Install an FTP service on yaolansvr and upload the MHA packages
(1) Disable SELinux, otherwise vsftpd returns a 226 error
# mkdir -p /yangsq/ftp
# useradd -d /yangsq/ftp -s /sbin/nologin uftp
# passwd uftp
# chown -R uftp:uftp /yangsq/ftp
# yum list all|grep vsftpd
# yum -y install vsftpd.x86_64
# cp /etc/vsftpd/vsftpd.conf /etc/vsftpd/vsftpd.conf.bak
In vsftpd.conf, change anonymous_enable=YES to anonymous_enable=NO and set:
local_enable=YES
write_enable=YES
chroot_local_user=YES
# chkconfig vsftpd on
# service vsftpd start
# yum install ftp.x86_64 -y
# sestatus
SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: enforcing
Mode from config file: enforcing
Policy version: 24
Policy from config file: targeted
# setenforce 0
# ftp 192.168.0.3 21
1.3 Install the Perl modules and MHA Node on all database nodes (run on every node in parallel)
yum -y install perl-DBD-MySQL
yum -y install perl-CPAN.x86_64
cd /yangsq/ftp
tar xvf mha4mysql-node-0.54.tar.gz
cd mha4mysql-node-0.54
perl Makefile.PL
make && make install
Installing /usr/local/share/perl5/MHA/BinlogPosFinderXid.pm
Installing /usr/local/share/perl5/MHA/SlaveUtil.pm
Installing /usr/local/share/perl5/MHA/NodeUtil.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinderElp.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFindManager.pm
Installing /usr/local/share/perl5/MHA/BinlogPosFinder.pm
Installing /usr/local/share/perl5/MHA/BinlogManager.pm
Installing /usr/local/share/perl5/MHA/NodeConst.pm
Installing /usr/local/share/perl5/MHA/BinlogHeaderParser.pm
Installing /usr/local/share/man/man1/filter_mysqlbinlog.1
Installing /usr/local/share/man/man1/purge_relay_logs.1
Installing /usr/local/share/man/man1/apply_diff_relay_logs.1
Installing /usr/local/share/man/man1/save_binary_logs.1
Installing /usr/local/bin/filter_mysqlbinlog
Installing /usr/local/bin/apply_diff_relay_logs
Installing /usr/local/bin/save_binary_logs
Installing /usr/local/bin/purge_relay_logs
1.4 Install MHA Manager on yaolansvr_slave
(1) If yum reports "No package perl-Log-Dispatch available.", switch the yum repository:
# mv CentOS-Base.repo CentOS-Base.repo.bak
sftp> put C:\Users\Yaolan\Downloads\CentOS6-Base-163.repo
# yum clean all
# yum makecache
Or: # rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum -y install perl-Config-Tiny
yum install perl-Log-Dispatch -y
yum install perl-Parallel-ForkManager -y
yum install perl-Time-HiRes -y
# tar xvf mha4mysql-manager-0.54.tar.gz
# cd mha4mysql-manager-0.54
# perl Makefile.PL
# make && make install
Installing /usr/local/share/perl5/MHA/ManagerAdminWrapper.pm
Installing /usr/local/share/perl5/MHA/ManagerUtil.pm
Installing /usr/local/share/perl5/MHA/MasterFailover.pm
Installing /usr/local/share/perl5/MHA/MasterMonitor.pm
Installing /usr/local/share/perl5/MHA/ManagerAdmin.pm
Installing /usr/local/share/perl5/MHA/Config.pm
Installing /usr/local/share/perl5/MHA/DBHelper.pm
Installing /usr/local/share/perl5/MHA/HealthCheck.pm
Installing /usr/local/share/perl5/MHA/FileStatus.pm
Installing /usr/local/share/perl5/MHA/MasterRotate.pm
Installing /usr/local/share/perl5/MHA/Server.pm
Installing /usr/local/share/perl5/MHA/ServerManager.pm
Installing /usr/local/share/perl5/MHA/SSHCheck.pm
Installing /usr/local/share/perl5/MHA/ManagerConst.pm
Installing /usr/local/share/man/man1/masterha_check_ssh.1
Installing /usr/local/share/man/man1/masterha_secondary_check.1
Installing /usr/local/share/man/man1/masterha_conf_host.1
Installing /usr/local/share/man/man1/masterha_check_status.1
Installing /usr/local/share/man/man1/masterha_stop.1
Installing /usr/local/share/man/man1/masterha_manager.1
Installing /usr/local/share/man/man1/masterha_master_monitor.1
Installing /usr/local/share/man/man1/masterha_check_repl.1
Installing /usr/local/share/man/man1/masterha_master_switch.1
Installing /usr/local/bin/masterha_manager
Installing /usr/local/bin/masterha_check_ssh
Installing /usr/local/bin/masterha_check_status
Installing /usr/local/bin/masterha_master_monitor
Installing /usr/local/bin/masterha_secondary_check
Installing /usr/local/bin/masterha_conf_host
Installing /usr/local/bin/masterha_check_repl
Installing /usr/local/bin/masterha_stop
Installing /usr/local/bin/masterha_master_switch
1.5 Configure passwordless SSH login between all MySQL servers (when changes to sshd_config did not take effect, a restart fixed it)
Note:
(1) ssh-copy-id: command not found. Fix: yum install openssh-clients -y
(2) ssh-copy-id must also be run against the host itself
(3) After stopping the slave and restarting the MySQL instance, the slave threads start up normally
# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa)
# yum install -y openssh-clients
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.0.3
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.0.4
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.0.5
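Since the same three ssh-copy-id commands have to be run on every node (including against the node itself), a loop keeps the host list in one place. The sketch below is a dry run that only prints the commands. The helper name print_ssh_copy_cmds is hypothetical; the hosts and key path come from this article.

```bash
# Hosts from the environment table in this article.
HOSTS="192.168.0.3 192.168.0.4 192.168.0.5"

# Hypothetical dry-run helper: print one ssh-copy-id command per host
# instead of executing it, so the list can be checked first.
print_ssh_copy_cmds() {
    for h in $HOSTS; do
        echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$h"
    done
}

print_ssh_copy_cmds
```

Piping the output to sh on each node would run the commands for real; each one still prompts for the root password once.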
1.6 Set up master-slave replication
Master node on the yaolansvr server -----------
Note:
(1) Set datadir and server-id; on the candidate master and the slave, only server-id needs to change
(2)
# vi /usr/mysql/etc/my.cnf
[mysqld]
port = 3306
datadir=/data/mysql/data
#slow query settings
slow-query-log-file=/var/log/MysqlQuery.log
long_query_time =5
slow_query_log=1
#server-id
server-id=16803
#binlog settings
log-bin = /data/mysql/data/mysql-bin.log
binlog_cache_size = 8M
binlog_format=mixed
#global
join_buffer_size = 2M
sort_buffer_size = 2M
read_rnd_buffer_size = 2M
read_buffer_size = 2M
max_heap_table_size = 64M
thread_cache_size=12
thread_concurrency = 12
query_cache_type = 1
query_cache_size = 32M
ft_min_word_len = 4
thread_stack = 192K
tmp_table_size = 64M
#MyISAM memory settings
key_buffer_size=1024M
#maximum packet size allowed for replication transfer
max_allowed_packet=64M
#skip DNS resolution
skip-name-resolve
#connection settings
max_connections = 1000
max_connect_errors = 200
#InnoDB settings
innodb_buffer_pool_size = 1G
innodb_additional_mem_pool_size = 16M
innodb_log_buffer_size = 8M
innodb_log_file_size = 512M
innodb_log_files_in_group = 3
innodb_file_per_table=1
innodb_stats_persistent_sample_pages=1000
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 30
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
# service mysqld start
1.6.1 Take a backup on yaolansvr and create the replication user
# mysqldump -A --flush-privileges --lock-all-tables --events --routines --triggers --master-data=2 > /yangsq/ftp/`date +%Y-%m-%d`_all.sql
mysql> grant replication slave on *.* to 'repl1'@'192.168.0.%' identified by '123456';
mysql> flush privileges;
1.6.2 Fetch the full backup from yaolansvr over FTP and restore it
mysql> show slave status;
Empty set (0.00 sec)
mysql> change master to master_host='192.168.0.3',master_user='repl1',master_password='123456',master_port=3306,master_log_file='mysql-bin.000008',master_log_pos=120;
mysql> start slave;
Last_IO_Errno: 1593
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server UUIDs; these UUIDs must be different for replication to work.
Fix: datadir/auto.cnf is identical on the two hosts (even though select uuid() returns different values), so delete auto.cnf on the candidate master and restart the instance.
After restarting the MySQL instance:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.3
Master_User: repl1
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000008
Read_Master_Log_Pos: 408
Relay_Log_File: yaolansvr_slave-relay-bin.000003
Relay_Log_Pos: 571
Relay_Master_Log_File: mysql-bin.000008
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
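Error 1593 can be caught before starting the slave by comparing the server-uuid lines of the two auto.cnf files. This is a sketch only: the helper names read_server_uuid and uuids_differ are hypothetical, and it assumes the master's auto.cnf has been copied next to the local one for comparison.

```bash
# Hypothetical helper: print the server-uuid stored in an auto.cnf file.
read_server_uuid() {
    sed -n 's/^server-uuid=//p' "$1"
}

# Hypothetical helper: succeed (exit 0) only if the two files carry different UUIDs.
uuids_differ() {
    [ "$(read_server_uuid "$1")" != "$(read_server_uuid "$2")" ]
}

# Example (not executed here; master.auto.cnf is an assumed copy of the master's file):
# uuids_differ /data/mysql/data/auto.cnf /tmp/master.auto.cnf || echo "same UUID: delete auto.cnf and restart"
```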
1.6.3 Set read_only=1 on the other slave nodes (do not write it into my.cnf, so that the candidate master can accept writes once it is promoted)
# mysql -e "set global read_only=1"
1.6.4 Note ****************** every database node must have the replication user
(1) If the replication user is not created on the candidate master, the check fails with:
Mon Jun 29 17:28:00 2015 - [info] Alive Slaves:
Mon Jun 29 17:28:00 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:28:00 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:28:00 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 29 17:28:00 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:28:00 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:28:00 2015 - [info] Not candidate for the new Master (no_master is set)
Mon Jun 29 17:28:00 2015 - [info] Current Alive Master: 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:28:00 2015 - [info] Checking slave configurations..
Mon Jun 29 17:28:00 2015 - [info] Checking replication filtering settings..
Mon Jun 29 17:28:00 2015 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jun 29 17:28:00 2015 - [info] Replication filtering check ok.
Mon Jun 29 17:28:00 2015 - [error][/usr/local/share/perl5/MHA/Server.pm, ln382] 192.168.0.4(192.168.0.4:3306): User repl1 does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
(2) If a database node will never become a candidate master and has no_master=1, it does not need the replication user
Create the replication user on the candidate master only; it must be identical to the master's replication user
mysql> grant replication slave on *.* to 'repl1'@'192.168.0.%' identified by '123456';
mysql> flush privileges;
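1.6.5 Note ****************** every database node must have the monitoring user; the monitoring user is mandatory
Create the monitoring user on the master only; the grant is then replicated to the slaves
(1) The other nodes must also have the monitoring user, otherwise the following error is reported: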
Mon Jun 29 18:02:41 2015 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln255] Got MySQL error when connecting 192.168.0.4(192.168.0.4:3306) :1045:Access denied for user 'monitor'@'192.168.0.4' (using password: YES), but this is not mysql crash. Check MySQL server settings.
mysql> grant all privileges on *.* to 'monitor'@'192.168.0.%' identified by '123456';
mysql> flush privileges;
1.7 Configure MHA
(1) master_binlog_dir (for example master_binlog_dir=/home/soft/mysql/3306/binlog) must be the directory that actually holds the binlogs
(2) If a database node will never become a candidate master and has no_master=1, it does not need the replication user
# mkdir -p /etc/mha/app1
# vi /etc/mha/app1/app1.cnf
[server default]
manager_workdir=/etc/mha/app1
manager_log=/var/log/manager.log
master_binlog_dir=/data/mysql/data/
ssh_user=root
user=monitor
password=123456
repl_user=repl1
repl_password=123456
secondary_check_script=masterha_secondary_check -s 192.168.0.3 -s 192.168.0.5
ping_interval=3
#master_ip_failover_script=/etc/mha/app1/master_ip_failover
#shutdown_script=/script/masterha/power_manager
#report_script=/script/masterha/send_report
#master_ip_online_change_script=/etc/mha/master_ip_failover
[server1]
hostname=192.168.0.3
port=3306
#master_binlog_dir=/data/mysql/data
candidate_master=1
[server2]
hostname=192.168.0.4
port=3306
#master_binlog_dir=/data/mysql/data
candidate_master=1
[server3]
hostname=192.168.0.5
port=3306
no_master=1
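Before running the MHA checks it can help to confirm which role each [serverN] block declares, because candidate_master and no_master decide who may be promoted. The sketch below summarises a config file with awk. The helper name list_mha_servers is hypothetical, and it assumes the key=value layout without surrounding spaces that app1.cnf uses in this article.

```bash
# Hypothetical helper: list each [serverN] block of an MHA config
# as "section hostname role", where role is candidate_master, no_master or "-".
list_mha_servers() {
    awk -F= '
        function emit() { if (sec != "") print sec, host, role }
        /^\[server[0-9]+\]/ { emit(); sec = substr($0, 2, length($0) - 2); host = ""; role = "-"; next }
        /^\[/               { emit(); sec = ""; next }
        sec != "" && $1 == "hostname"                    { host = $2 }
        sec != "" && $1 == "candidate_master" && $2 == 1 { role = "candidate_master" }
        sec != "" && $1 == "no_master"        && $2 == 1 { role = "no_master" }
        END { emit() }
    ' "$1"
}

# Example (not executed here): list_mha_servers /etc/mha/app1/app1.cnf
```

Commented-out keys such as #master_binlog_dir are ignored because their key name starts with #.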
1.7.2 Disable automatic relay log purging on all database nodes
# mysql -e "set global relay_log_purge=0"
1.7.3 Make mysqlbinlog available on the PATH of all database nodes
# ln -sv /usr/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
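Whether the symlink is needed can be checked first. The sketch below reports which client tools are missing from PATH. The helper name missing_tools is hypothetical, and the tool list is an assumption based on the mysqlbinlog and mysql client calls that show up in the MHA logs of this article.

```bash
# Hypothetical helper: print every named command that is not found on PATH.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || echo "$t"
    done
}

# Tools MHA Node appears to call on a database host; empty output means all are found.
missing_tools mysqlbinlog mysql
```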
1.7.4 Check the SSH configuration
Note:
(1) # cp /yangsq/ftp/mha4mysql-manager-0.54/samples/scripts/master_ip_failover /etc/mha/app1/master_ip_failover, or comment out master_ip_failover_script=/etc/mha/app1/master_ip_failover
Otherwise the following error is reported: Mon Jun 29 17:05:26 2015 - [info] /etc/mha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.0.3 --orig_master_ip=192.168.0.3 --orig_master_port=3306
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /etc/mha/app1/master_ip_failover line 93.
(2) If the MySQL binlog cannot be found (master_binlog_dir is unset or wrong), the error is:
Mon Jun 29 17:52:11 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000008
Mon Jun 29 17:52:11 2015 - [info] Connecting to root@192.168.0.3(192.168.0.3)..
Failed to save binary log: Binlog not found from /var/lib/mysql,/var/log/mysql! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
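In the failing run above MHA searched /var/lib/mysql,/var/log/mysql because master_binlog_dir was not set, while this setup keeps its binlogs in /data/mysql/data. A quick check on the master can confirm a directory before it is written into app1.cnf. The sketch below assumes binlog files are named PREFIX.NNNNNN; the helper name has_binlogs is hypothetical.

```bash
# Hypothetical helper: succeed if DIR contains at least one file named PREFIX.<digits>.
# Arguments: directory, binlog prefix (mysql-bin in this article).
has_binlogs() {
    ls "$1"/"$2".[0-9]* >/dev/null 2>&1
}

# Paths from this article; adjust for other layouts.
if has_binlogs /data/mysql/data mysql-bin; then
    echo "binlogs found in /data/mysql/data"
else
    echo "no binlogs in /data/mysql/data"
fi
```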
# masterha_check_ssh --conf=/etc/mha/app1/app1.cnf
Mon Jun 29 17:33:39 2015 - [info] Reading server configurations from /etc/mha/app1/app1.cnf..
Mon Jun 29 17:33:39 2015 - [info] MHA::MasterMonitor version 0.54.
Mon Jun 29 17:33:39 2015 - [info] Dead Servers:
Mon Jun 29 17:33:39 2015 - [info] Alive Servers:
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.4(192.168.0.4:3306)
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.5(192.168.0.5:3306)
Mon Jun 29 17:33:39 2015 - [info] Alive Slaves:
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:33:39 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:33:39 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jun 29 17:33:39 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Mon Jun 29 17:33:39 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:33:39 2015 - [info] Not candidate for the new Master (no_master is set)
Mon Jun 29 17:33:39 2015 - [info] Current Alive Master: 192.168.0.3(192.168.0.3:3306)
Mon Jun 29 17:33:39 2015 - [info] Checking slave configurations..
Mon Jun 29 17:33:39 2015 - [info] Checking replication filtering settings..
Mon Jun 29 17:33:39 2015 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jun 29 17:33:39 2015 - [info] Replication filtering check ok.
Mon Jun 29 17:33:39 2015 - [info] Starting SSH connection tests..
Mon Jun 29 17:34:21 2015 - [info] All SSH connection tests passed successfully.
Mon Jun 29 17:34:21 2015 - [info] Checking MHA Node version..
Mon Jun 29 17:34:31 2015 - [info] Version check ok.
Mon Jun 29 17:34:31 2015 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jun 29 17:34:32 2015 - [info] HealthCheck: SSH to 192.168.0.3 is reachable.
Mon Jun 29 17:34:32 2015 - [info] Master MHA Node version is 0.54.
Mon Jun 29 17:34:32 2015 - [info] Checking recovery script configurations on the current master..
Mon Jun 29 17:34:32 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000008
Mon Jun 29 17:34:32 2015 - [info] Connecting to root@192.168.0.3(192.168.0.3)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql/data, up to mysql-bin.000008
Mon Jun 29 17:34:32 2015 - [info] Master setting check done.
Mon Jun 29 17:34:32 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Jun 29 17:34:32 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.0.4 --slave_ip=192.168.0.4 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Mon Jun 29 17:34:32 2015 - [info] Connecting to root@192.168.0.4(192.168.0.4:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to yaolansvr_slave-relay-bin.000003
Temporary relay log file is /data/mysql/data/yaolansvr_slave-relay-bin.000003
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Jun 29 17:34:33 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.0.5 --slave_ip=192.168.0.5 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Mon Jun 29 17:34:33 2015 - [info] Connecting to root@192.168.0.5(192.168.0.5:22)..
reverse mapping checking getaddrinfo for bogon [192.168.0.5] failed - POSSIBLE BREAK-IN ATTEMPT!
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to yaolansvr_slave01-relay-bin.000002
Temporary relay log file is /data/mysql/data/yaolansvr_slave01-relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Jun 29 17:34:43 2015 - [info] Slaves settings check done.
Mon Jun 29 17:34:43 2015 - [info]
192.168.0.3 (current master)
+--192.168.0.4
+--192.168.0.5
Mon Jun 29 17:34:43 2015 - [info] Checking replication health on 192.168.0.4..
Mon Jun 29 17:34:43 2015 - [info] ok.
Mon Jun 29 17:34:43 2015 - [info] Checking replication health on 192.168.0.5..
Mon Jun 29 17:34:43 2015 - [info] ok.
Mon Jun 29 17:34:43 2015 - [warning] master_ip_failover_script is not defined.
Mon Jun 29 17:34:43 2015 - [warning] shutdown_script is not defined.
Mon Jun 29 17:34:43 2015 - [info] Got exit code 0 (Not master dead).
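1.8 VIP configuration
To prevent split-brain, it is recommended in production to manage the virtual IP with a script rather than with keepalived.
1.8.1 Modify the failover script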
# vi /etc/mha/app1/master_ip_failover
#### Add variables
my $vip = '192.168.0.10/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down";
#### First, the VIP has to be brought up on the master node
# /sbin/ifconfig eth0:1 192.168.0.10/24
# /sbin/ifconfig eth0:1 down
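The Perl variables above reference eth1 while the manual commands use eth0, so the interface name is worth passing in as a parameter and setting to whichever one the hosts really have. The sketch below builds the same two commands in shell. The helper name vip_cmd is hypothetical; it only prints the command so it can be checked before it is run locally or over ssh.

```bash
# Hypothetical helper mirroring $ssh_start_vip / $ssh_stop_vip from the Perl script.
# Arguments: action (start|stop), interface, alias key, VIP with prefix length.
vip_cmd() {
    case "$1" in
        start) echo "/sbin/ifconfig $2:$3 $4" ;;
        stop)  echo "/sbin/ifconfig $2:$3 down" ;;
        *)     echo "usage: vip_cmd start|stop IFACE KEY VIP" >&2; return 1 ;;
    esac
}

vip_cmd start eth0 1 192.168.0.10/24
vip_cmd stop  eth0 1 192.168.0.10/24
```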
#### Then check the state of the replication environment
# masterha_check_repl --conf=/etc/mha/app1/app1.cnf
# masterha_check_status --conf=/etc/mha/app1/app1.cnf
app1 is stopped(2:NOT_RUNNING).
# touch /etc/mha/app1/manager.log
# nohup masterha_manager --conf=/etc/mha/app1/app1.cnf > /etc/mha/app1/manager.log 2>&1 &
# tail -f /var/log/manager.log
Tue Jun 30 10:21:25 2015 - [info] Got terminate signal. Exit.
Tue Jun 30 10:21:58 2015 - [info] MHA::MasterMonitor version 0.54.
Tue Jun 30 10:21:58 2015 - [info] Dead Servers:
Tue Jun 30 10:21:58 2015 - [info] Alive Servers:
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.4(192.168.0.4:3306)
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.5(192.168.0.5:3306)
Tue Jun 30 10:21:58 2015 - [info] Alive Slaves:
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:21:58 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:21:58 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:21:58 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:21:58 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:21:58 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:21:58 2015 - [info] Current Alive Master: 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:21:58 2015 - [info] Checking slave configurations..
Tue Jun 30 10:21:58 2015 - [info] read_only=1 is not set on slave 192.168.0.4(192.168.0.4:3306).
Tue Jun 30 10:21:58 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.0.4(192.168.0.4:3306).
Tue Jun 30 10:21:58 2015 - [info] read_only=1 is not set on slave 192.168.0.5(192.168.0.5:3306).
Tue Jun 30 10:21:58 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.0.5(192.168.0.5:3306).
Tue Jun 30 10:21:58 2015 - [info] Checking replication filtering settings..
Tue Jun 30 10:21:58 2015 - [info] binlog_do_db= , binlog_ignore_db=
Tue Jun 30 10:21:58 2015 - [info] Replication filtering check ok.
Tue Jun 30 10:21:58 2015 - [info] Starting SSH connection tests..
Tue Jun 30 10:22:00 2015 - [info] All SSH connection tests passed successfully.
Tue Jun 30 10:22:00 2015 - [info] Checking MHA Node version..
Tue Jun 30 10:22:01 2015 - [info] Version check ok.
Tue Jun 30 10:22:01 2015 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jun 30 10:22:01 2015 - [info] HealthCheck: SSH to 192.168.0.3 is reachable.
Tue Jun 30 10:22:01 2015 - [info] Master MHA Node version is 0.54.
Tue Jun 30 10:22:01 2015 - [info] Checking recovery script configurations on the current master..
Tue Jun 30 10:22:01 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --start_file=mysql-bin.000009
Tue Jun 30 10:22:01 2015 - [info] Connecting to root@192.168.0.3(192.168.0.3)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql/data/, up to mysql-bin.000009
Tue Jun 30 10:22:01 2015 - [info] Master setting check done.
Tue Jun 30 10:22:01 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Jun 30 10:22:01 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.0.4 --slave_ip=192.168.0.4 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Tue Jun 30 10:22:01 2015 - [info] Connecting to root@192.168.0.4(192.168.0.4:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to yaolansvr_slave-relay-bin.000006
Temporary relay log file is /data/mysql/data/yaolansvr_slave-relay-bin.000006
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Jun 30 10:22:02 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='monitor' --slave_host=192.168.0.5 --slave_ip=192.168.0.5 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-log --manager_version=0.54 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Tue Jun 30 10:22:02 2015 - [info] Connecting to root@192.168.0.5(192.168.0.5:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to yaolansvr_slave01-relay-bin.000005
Temporary relay log file is /data/mysql/data/yaolansvr_slave01-relay-bin.000005
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Jun 30 10:22:02 2015 - [info] Slaves settings check done.
Tue Jun 30 10:22:02 2015 - [info]
192.168.0.3 (current master)
+--192.168.0.4
+--192.168.0.5
Tue Jun 30 10:22:02 2015 - [info] Checking master_ip_failover_script status:
Tue Jun 30 10:22:02 2015 - [info] /etc/mha/app1/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.0.3 --orig_master_ip=192.168.0.3 --orig_master_port=3306
Tue Jun 30 10:22:02 2015 - [info] OK.
Tue Jun 30 10:22:02 2015 - [warning] shutdown_script is not defined.
Tue Jun 30 10:22:02 2015 - [info] Set master ping interval 3 seconds.
Tue Jun 30 10:22:02 2015 - [info] Set secondary check script: masterha_secondary_check -s 192.168.0.3 -s 192.168.0.5
Tue Jun 30 10:22:02 2015 - [info] Starting ping health check on 192.168.0.3(192.168.0.3:3306)..
Tue Jun 30 10:22:02 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
# masterha_check_status --conf=/etc/mha/app1/app1.cnf
app1 (pid:2243) is running(0:PING_OK), master:192.168.0.3
# masterha_stop --conf=/etc/mha/app1/app1.cnf
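The two status lines shown in this article, "is running(0:PING_OK), master:..." and "is stopped(2:NOT_RUNNING).", are easy to reduce to a state name and master address for use in a cron check. The sketch below does that with sed. The helper name parse_mha_status is hypothetical, and the parsing is based only on those two sample lines.

```bash
# Hypothetical helper: read one masterha_check_status line on stdin
# and print "STATE" or "STATE master-address".
parse_mha_status() (
    line=$(cat)
    # State name is the token after "N:" inside the last parentheses, e.g. (0:PING_OK).
    state=$(printf '%s\n' "$line" | sed -n 's/.*([0-9]*:\([A-Z_]*\)).*/\1/p')
    # Master address is whatever follows "master:", if present.
    master=$(printf '%s\n' "$line" | sed -n 's/.*master:\(.*\)$/\1/p')
    echo "$state${master:+ $master}"
)

echo 'app1 (pid:2243) is running(0:PING_OK), master:192.168.0.3' | parse_mha_status
echo 'app1 is stopped(2:NOT_RUNNING).' | parse_mha_status
```

In practice the input would come from masterha_check_status --conf=/etc/mha/app1/app1.cnf piped into the function.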
After the failover:
# tail -f /var/log/manager.log
Tue Jun 30 10:28:02 2015 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Jun 30 10:28:02 2015 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data/ --output_file=/var/tmp/save_binary_logs_test --manager_version=0.54 --binlog_prefix=mysql-bin
Tue Jun 30 10:28:03 2015 - [info] Executing seconary network check script: masterha_secondary_check -s 192.168.0.3 -s 192.168.0.5 --user=root --master_host=192.168.0.3 --master_ip=192.168.0.3 --master_port=3306
Tue Jun 30 10:28:03 2015 - [info] HealthCheck: SSH to 192.168.0.3 is reachable.
Monitoring server 192.168.0.3 is reachable, Master is not reachable from 192.168.0.3. OK.
Monitoring server 192.168.0.5 is reachable, Master is not reachable from 192.168.0.5. OK.
Tue Jun 30 10:28:03 2015 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Tue Jun 30 10:28:05 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 30 10:28:05 2015 - [warning] Connection failed 1 time(s)..
Tue Jun 30 10:28:08 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 30 10:28:08 2015 - [warning] Connection failed 2 time(s)..
Tue Jun 30 10:28:11 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Jun 30 10:28:11 2015 - [warning] Connection failed 3 time(s)..
Tue Jun 30 10:28:11 2015 - [warning] Master is not reachable from health checker!
Tue Jun 30 10:28:11 2015 - [warning] Master 192.168.0.3(192.168.0.3:3306) is not reachable!
Tue Jun 30 10:28:11 2015 - [warning] SSH is reachable.
Tue Jun 30 10:28:11 2015 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1/app1.cnf again, and trying to connect to all servers to check server status..
Tue Jun 30 10:28:11 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jun 30 10:28:11 2015 - [info] Reading application default configurations from /etc/mha/app1/app1.cnf..
Tue Jun 30 10:28:11 2015 - [info] Reading server configurations from /etc/mha/app1/app1.cnf..
Tue Jun 30 10:28:12 2015 - [info] Dead Servers:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Alive Servers:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.4(192.168.0.4:3306)
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.5(192.168.0.5:3306)
Tue Jun 30 10:28:12 2015 - [info] Alive Slaves:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:12 2015 - [info] Checking slave configurations..
Tue Jun 30 10:28:12 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.0.4(192.168.0.4:3306).
Tue Jun 30 10:28:12 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.0.5(192.168.0.5:3306).
Tue Jun 30 10:28:12 2015 - [info] Checking replication filtering settings..
Tue Jun 30 10:28:12 2015 - [info] Replication filtering check ok.
Tue Jun 30 10:28:12 2015 - [info] Master is down!
Tue Jun 30 10:28:12 2015 - [info] Terminating monitoring script.
Tue Jun 30 10:28:12 2015 - [info] Got exit code 20 (Master dead).
Tue Jun 30 10:28:12 2015 - [info] MHA::MasterFailover version 0.54.
Tue Jun 30 10:28:12 2015 - [info] Starting master failover.
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] * Phase 1: Configuration Check Phase..
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] Dead Servers:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Checking master reachability via mysql(double check)..
Tue Jun 30 10:28:12 2015 - [info] ok.
Tue Jun 30 10:28:12 2015 - [info] Alive Servers:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.4(192.168.0.4:3306)
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.5(192.168.0.5:3306)
Tue Jun 30 10:28:12 2015 - [info] Alive Slaves:
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:12 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:12 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:12 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:12 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Jun 30 10:28:12 2015 - [info]
Tue Jun 30 10:28:12 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Jun 30 10:28:12 2015 - [info] Executing master IP deactivatation script:
Tue Jun 30 10:28:12 2015 - [info] /etc/mha/app1/master_ip_failover --orig_master_host=192.168.0.3 --orig_master_ip=192.168.0.3 --orig_master_port=3306 --command=stopssh --ssh_user=root
Disabling the VIP on old master: 192.168.0.3
Tue Jun 30 10:28:13 2015 - [info] done.
Tue Jun 30 10:28:13 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Jun 30 10:28:13 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] * Phase 3: Master Recovery Phase..
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] The latest binary log file/position on all slaves is mysql-bin.000009:120
Tue Jun 30 10:28:13 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Jun 30 10:28:13 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:13 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:13 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:13 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:13 2015 - [info] The oldest binary log file/position on all slaves is mysql-bin.000009:120
Tue Jun 30 10:28:13 2015 - [info] Oldest slaves:
Tue Jun 30 10:28:13 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:13 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:13 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:13 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:13 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Jun 30 10:28:13 2015 - [info]
Tue Jun 30 10:28:13 2015 - [info] Fetching dead master's binary logs..
Tue Jun 30 10:28:13 2015 - [info] Executing command on the dead master 192.168.0.3(192.168.0.3:3306): save_binary_logs --command=save --start_file=mysql-bin.000009 --start_pos=120 --binlog_dir=/data/mysql/data/ --output_file=/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54
Creating /var/tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000009 pos 120 to mysql-bin.000009 EOF into /var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog ..
Dumping binlog format description event, from position 0 to 120.. ok.
Dumping effective binlog data from /data/mysql/data//mysql-bin.000009 position 120 to tail(143).. ok.
Concat succeeded.
Tue Jun 30 10:28:14 2015 - [info] scp from root@192.168.0.3:/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog to local:/etc/mha/app1/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog succeeded.
Tue Jun 30 10:28:14 2015 - [info] HealthCheck: SSH to 192.168.0.4 is reachable.
Tue Jun 30 10:28:15 2015 - [info] HealthCheck: SSH to 192.168.0.5 is reachable.
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] * Phase 3.3: Determining New Master Phase..
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Jun 30 10:28:15 2015 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Jun 30 10:28:15 2015 - [info] Searching new master from slaves..
Tue Jun 30 10:28:15 2015 - [info] Candidate masters from the configuration file:
Tue Jun 30 10:28:15 2015 - [info] 192.168.0.4(192.168.0.4:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:15 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:15 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jun 30 10:28:15 2015 - [info] Non-candidate masters:
Tue Jun 30 10:28:15 2015 - [info] 192.168.0.5(192.168.0.5:3306) Version=5.6.20-log (oldest major version between slaves) log-bin:enabled
Tue Jun 30 10:28:15 2015 - [info] Replicating from 192.168.0.3(192.168.0.3:3306)
Tue Jun 30 10:28:15 2015 - [info] Not candidate for the new Master (no_master is set)
Tue Jun 30 10:28:15 2015 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Jun 30 10:28:15 2015 - [info] New master is 192.168.0.4(192.168.0.4:3306)
Tue Jun 30 10:28:15 2015 - [info] Starting master failover..
Tue Jun 30 10:28:15 2015 - [info]
From:
192.168.0.3 (current master)
+--192.168.0.4
+--192.168.0.5
To:
192.168.0.4 (new master)
+--192.168.0.5
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 30 10:28:15 2015 - [info] Sending binlog..
Tue Jun 30 10:28:15 2015 - [info] scp from local:/etc/mha/app1/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog to root@192.168.0.4:/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog succeeded.
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Jun 30 10:28:15 2015 - [info]
Tue Jun 30 10:28:15 2015 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Jun 30 10:28:15 2015 - [info] Starting recovery on 192.168.0.4(192.168.0.4:3306)..
Tue Jun 30 10:28:15 2015 - [info] Generating diffs succeeded.
Tue Jun 30 10:28:15 2015 - [info] Waiting until all relay logs are applied.
Tue Jun 30 10:28:15 2015 - [info] done.
Tue Jun 30 10:28:16 2015 - [info] Getting slave status..
Tue Jun 30 10:28:16 2015 - [info] This slave(192.168.0.4)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000009:120). No need to recover from Exec_Master_Log_Pos.
Tue Jun 30 10:28:16 2015 - [info] Connecting to the target slave host 192.168.0.4, running recover script..
Tue Jun 30 10:28:16 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host=192.168.0.4 --slave_ip=192.168.0.4 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog --workdir=/var/tmp --target_version=5.6.20-log --timestamp=20150630102812 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54 --slave_pass=xxx
Tue Jun 30 10:28:16 2015 - [info]
MySQL client version is 5.6.20. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog on 192.168.0.4:3306. This may take long time...
Applying log files succeeded.
Tue Jun 30 10:28:16 2015 - [info] All relay logs were successfully applied.
Tue Jun 30 10:28:16 2015 - [info] Getting new master's binlog name and position..
Tue Jun 30 10:28:16 2015 - [info] mysql-bin.000015:120
Tue Jun 30 10:28:16 2015 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.4', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000015', MASTER_LOG_POS=120, MASTER_USER='repl1', MASTER_PASSWORD='xxx';
Tue Jun 30 10:28:16 2015 - [info] Executing master IP activate script:
Tue Jun 30 10:28:16 2015 - [info] /etc/mha/app1/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.0.3 --orig_master_ip=192.168.0.3 --orig_master_port=3306 --new_master_host=192.168.0.4 --new_master_ip=192.168.0.4 --new_master_port=3306 --new_master_user='monitor' --new_master_password='123456'
Enabling the VIP - 192.168.0.10/24 on the new master - 192.168.0.4
Tue Jun 30 10:28:16 2015 - [info] OK.
Tue Jun 30 10:28:16 2015 - [info] Setting read_only=0 on 192.168.0.4(192.168.0.4:3306)..
Tue Jun 30 10:28:16 2015 - [info] ok.
Tue Jun 30 10:28:16 2015 - [info] ** Finished master recovery successfully.
Tue Jun 30 10:28:16 2015 - [info] * Phase 3: Master Recovery Phase completed.
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] * Phase 4: Slaves Recovery Phase..
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] -- Slave diff file generation on host 192.168.0.5(192.168.0.5:3306) started, pid: 2658. Check tmp log /etc/mha/app1/192.168.0.5_3306_20150630102812.log if it takes time..
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] Log messages from 192.168.0.5 ...
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jun 30 10:28:16 2015 - [info] End of log messages from 192.168.0.5.
Tue Jun 30 10:28:16 2015 - [info] -- 192.168.0.5(192.168.0.5:3306) has the latest relay log events.
Tue Jun 30 10:28:16 2015 - [info] Generating relay diff files from the latest slave succeeded.
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Jun 30 10:28:16 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] -- Slave recovery on host 192.168.0.5(192.168.0.5:3306) started, pid: 2660. Check tmp log /etc/mha/app1/192.168.0.5_3306_20150630102812.log if it takes time..
Tue Jun 30 10:28:18 2015 - [info]
Tue Jun 30 10:28:18 2015 - [info] Log messages from 192.168.0.5 ...
Tue Jun 30 10:28:18 2015 - [info]
Tue Jun 30 10:28:16 2015 - [info] Sending binlog..
Tue Jun 30 10:28:17 2015 - [info] scp from local:/etc/mha/app1/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog to root@192.168.0.5:/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog succeeded.
Tue Jun 30 10:28:17 2015 - [info] Starting recovery on 192.168.0.5(192.168.0.5:3306)..
Tue Jun 30 10:28:17 2015 - [info] Generating diffs succeeded.
Tue Jun 30 10:28:17 2015 - [info] Waiting until all relay logs are applied.
Tue Jun 30 10:28:17 2015 - [info] done.
Tue Jun 30 10:28:17 2015 - [info] Getting slave status..
Tue Jun 30 10:28:17 2015 - [info] This slave(192.168.0.5)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000009:120). No need to recover from Exec_Master_Log_Pos.
Tue Jun 30 10:28:17 2015 - [info] Connecting to the target slave host 192.168.0.5, running recover script..
Tue Jun 30 10:28:17 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host=192.168.0.5 --slave_ip=192.168.0.5 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog --workdir=/var/tmp --target_version=5.6.20-log --timestamp=20150630102812 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.54 --slave_pass=xxx
Tue Jun 30 10:28:17 2015 - [info]
MySQL client version is 5.6.20. Using --binary-mode.
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.0.3_3306_20150630102812.binlog on 192.168.0.5:3306. This may take long time...
Applying log files succeeded.
Tue Jun 30 10:28:17 2015 - [info] All relay logs were successfully applied.
Tue Jun 30 10:28:17 2015 - [info] Resetting slave 192.168.0.5(192.168.0.5:3306) and starting replication from the new master 192.168.0.4(192.168.0.4:3306)..
Tue Jun 30 10:28:18 2015 - [info] Executed CHANGE MASTER.
Tue Jun 30 10:28:18 2015 - [info] Slave started.
Tue Jun 30 10:28:18 2015 - [info] End of log messages from 192.168.0.5.
Tue Jun 30 10:28:18 2015 - [info] -- Slave recovery on host 192.168.0.5(192.168.0.5:3306) succeeded.
Tue Jun 30 10:28:18 2015 - [info] All new slave servers recovered successfully.
Tue Jun 30 10:28:18 2015 - [info]
Tue Jun 30 10:28:18 2015 - [info] * Phase 5: New master cleanup phase..
Tue Jun 30 10:28:18 2015 - [info]
Tue Jun 30 10:28:18 2015 - [info] Resetting slave info on the new master..
Tue Jun 30 10:28:18 2015 - [info] 192.168.0.4: Resetting slave info succeeded.
Tue Jun 30 10:28:18 2015 - [info] Master failover to 192.168.0.4(192.168.0.4:3306) completed successfully.
Tue Jun 30 10:28:18 2015 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.0.3 to 192.168.0.4 succeeded
Master 192.168.0.3 is down!
Check MHA Manager logs at yaolansvr_slave:/var/log/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.0.3.
The latest slave 192.168.0.4(192.168.0.4:3306) has all relay logs for recovery.
Selected 192.168.0.4 as a new master.
192.168.0.4: OK: Applying all logs succeeded.
192.168.0.4: OK: Activated master IP address.
192.168.0.5: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.0.5: OK: Applying all logs succeeded. Slave started, replicating from 192.168.0.4.
192.168.0.4: Resetting slave info succeeded.
Master failover to 192.168.0.4(192.168.0.4:3306) completed successfully.
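The log above is long, so it can help to reduce it to the sequence of phases. The following is only a sketch: list_failover_phases is a hypothetical helper name, and the pattern assumes the "* Phase N: ..." line format seen in this MHA 0.54 log.

```shell
#!/bin/sh
# Hypothetical helper: print the phase start lines from an MHA manager log.
# $1 = path to the manager log (this walkthrough uses /var/log/manager.log).
list_failover_phases() {
    # Match "* Phase 1: ..." and "* Phase 3.1: ..." lines, drop the
    # "... Phase completed." lines, then strip the timestamp prefix.
    grep -E '\* Phase [0-9]+(\.[0-9]+)?: ' "$1" \
        | grep -v 'completed\.$' \
        | sed -e 's/^.*\* Phase /Phase /'
}
```

For example, list_failover_phases /var/log/manager.log on the manager host prints one line per phase. Note that this log labels two different steps as Phase 3.3, so the labels are not unique.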
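######After 192.168.0.4 is promoted to master, check the status (masterha_manager exits once a failover completes, so NOT_RUNNING is expected here):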
# masterha_check_status --conf=/etc/mha/app1/app1.cnf
app1 is stopped(2:NOT_RUNNING).
mysql> select @@read_only;
+-------------+
| @@read_only |
+-------------+
| 0 |
+-------------+
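######Repair the failed machine
First, run cat /var/log/manager.log|grep "All other slaves should start" on the manager host to find the CHANGE MASTER statement (the log masks the password as 'xxx', so substitute the real replication password). Start the crashed database and log in; show slave status returns nothing at this point, so run the CHANGE MASTER statement to point it at the new master, then start the slave threads.
Then set read_only=1 and run # mysql -e "set global relay_log_purge=0" on the repaired node. Finally, check the replication environment and restart the mha manager monitoring.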
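The repair steps above can be sketched as two small shell functions. This is a sketch under assumptions, not a definitive procedure: extract_change_master and rejoin_as_slave are hypothetical helper names, and mysql -e is assumed to connect without extra credentials, as in the other commands in this article.

```shell
#!/bin/sh
# Run on the manager host: print the last CHANGE MASTER statement that
# MHA recorded in its log. $1 = path to the manager log.
extract_change_master() {
    grep 'All other slaves should start' "$1" \
        | tail -n 1 \
        | sed -e 's/^.*Statement should be: //'
}

# Run on the repaired node after mysqld is started again.
# $1 = the CHANGE MASTER statement with the real replication password
#      filled in (the log only contains the masked value 'xxx').
rejoin_as_slave() {
    mysql -e "$1" || return 1               # point the node at the new master
    mysql -e "START SLAVE" || return 1      # start the IO and SQL threads
    mysql -e "SET GLOBAL read_only=1"       # slaves stay read-only
    mysql -e "SET GLOBAL relay_log_purge=0" # clears the MHA warning seen in the log
}
```

After that, back on the manager host, masterha_check_repl --conf=/etc/mha/app1/app1.cnf should pass before masterha_manager is started again with the same --conf file.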
After mysql was shut down on 192.168.0.4, promoting 192.168.0.3 to master failed with the following error:
Tue Jun 30 11:50:37 2015 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln297] Last failover was done at 2015/06/30 10:05:18. Current time is too early to do failover again. If you want to do failover, manually remove /etc/mha/app1/app1.failover.complete and run this script again.
Tue Jun 30 11:50:37 2015 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_manager line 65
and masterha_manager exits immediately.
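As the error message says, the manager refuses a second failover while the marker file from the previous one still exists and is recent (by default MHA blocks a new failover for 8 hours after the last one). A minimal sketch for clearing the marker before restarting the manager follows; clear_failover_marker is a hypothetical helper name, and the directory and application name are the ones used in this setup.

```shell
#!/bin/sh
# Remove the marker file left behind by the previous failover, if present.
# $1 = manager working directory, $2 = application name.
clear_failover_marker() {
    marker="$1/$2.failover.complete"
    if [ -f "$marker" ]; then
        rm -f "$marker" && echo "removed $marker"
    else
        echo "no marker at $marker"
    fi
}
```

Here that would be clear_failover_marker /etc/mha/app1 app1, run on the manager host before starting masterha_manager again. In a test environment where repeated failovers are expected, starting masterha_manager with --ignore_last_failover skips this check instead.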
Note:
(1) Whenever a slave is restarted, remember to run mysql -e "set global read_only=1" again, because the setting is not written to my.cnf.