欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

MYSQL高可用架构之MHA的搭建

程序员文章站 2024-03-20 20:57:16
...

#############################相关概念##############################

MySQL MHA架构介绍:

MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于 Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。在MySQL故障切换过程中,MHA能做到在 0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。

该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)。MHA Manager可以单独部署在一*立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。MHA Node运行在每台MySQL服务器上,MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

在MHA自动故障切换过程中,MHA试图从宕机(down机:指电脑不能正常工作)的主服务器上保存二进制日志,最大程度的保证数据的不丢失,但这并不总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失了最新的数据。使用MySQL5.5的半同步复制,可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性。

目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库,因为至少需要三台服务器,出于机器成本的考虑,淘宝也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。

MHA集群架构图:

MYSQL高可用架构之MHA的搭建

MHA工作原理总结为以下几条:

(1)从宕机崩溃的master保存二进制日志事件(binlog events);

(2)识别含有最新更新的slave;

(3)应用差异的中继日志(relay log) 到其他slave;

(4)应用从master保存的二进制日志事件(binlog events);

(5)提升一个slave为新master;

(6)使用其他的slave连接新的master进行复制。

MHA软件由两部分组成,Manager工具包和Node工具包,具体的说明如下。

Manager工具包主要包括以下几个工具:

masterha_check_ssh              检查MHA的SSH配置状况
masterha_check_repl             检查MySQL复制状况
masterha_manger                 启动MHA
masterha_check_status           检测当前MHA运行状态
masterha_master_monitor         检测master是否宕机
masterha_master_switch          控制故障转移(自动或者手动)
masterha_conf_host              添加或删除配置的server信息

Node工具包(这些工具通常由MHA Manager的脚本触发,无需人为操作)主要包括以下几个工具:

save_binary_logs                保存和复制master的二进制日志
apply_diff_relay_logs           识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog              去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs                清除中继日志(不会阻塞SQL线程)

注意:

为了尽可能的减少主库硬件损坏宕机造成的数据丢失,因此在配置MHA的同时建议配置成MySQL 5.7的半同步复制。

参考文档:
MHA原理:https://code.google.com/p/mysql-master-ha/wiki/HowMHAWorks
MHA原理PPT:http://www.slideshare.net/matsunobu/automated-master-failover
Linux配置代理方法:http://blog.csdn.net/bojie5744/article/details/42148719

#############################MHA搭建###################################

配置环境:

版本:redhat6.5
master:server1(172.25.9.1)
Candicate slave:server2(172.25.9.2)
slave:server3(172.25.9.3)
manager:server4(172.25.9.4)

角色                    ip地址          主机名          server_id                  类型
Master                  172.25.9.1    server1            1                      写入
Candicate master        172.25.9.2    server2            2                      读
Slave                   172.25.9.3    server3            3                      读
Monitor host            172.25.9.4    server4            4                      监控复制组

其中master对外提供写服务,备选master(实际的slave,主机名server03)提供读服务,slave也提供相关的读服务,一旦master宕机,将会把备选master提升为新的master,slave指向新的master。

一.搭建主从复制(一主两从)

    点击查看详细过程(主从复制在这里不再赘述)

二.安装MHA软件

(1)三个节点(server1,server2,server3)安装

mha4mysql-node-0.56-0.el6.noarch.rpm

aaa@qq.com mha]# rpm -qa | grep mha4   ##安装的mha node软件
mha4mysql-node-0.56-0.el6.noarch
[aaa@qq.com mha]# rpm -qa | grep perl   ##需要的依赖包
perl-DBD-MySQL-4.013-3.el6.x86_64
perl-Pod-Escapes-1.04-136.el6.x86_64
perl-Pod-Simple-3.13-136.el6.x86_64
perl-Module-Pluggable-3.90-136.el6.x86_64
perl-DBI-1.609-4.el6.x86_64
perl-Error-0.17015-4.el6.noarch
perl-Git-1.7.1-3.el6_4.1.noarch
perl-libs-5.10.1-136.el6.x86_64
perl-version-0.77-136.el6.x86_64
perl-5.10.1-136.el6.x86_64

(2)管理节点安装manager

mha4mysql-manager-0.56-0.el6.noarch

MYSQL高可用架构之MHA的搭建

(3)免密配置

方法一(运用rsync(两边都要装此命令)命令):

aaa@qq.com ~]# ssh-****** 
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
11:bd:98:f5:22:27:33:a0:df:46:85:de:cd:7d:1c:b4 aaa@qq.com
The key's randomart image is:
+--[ RSA 2048]----+
|        .o    .. |
|      . ..+    ..|
|     . o.* = . E.|
|    .   X.= + . o|
|     . oS* .   . |
|      . o        |
|       .         |
|                 |
|                 |
+-----------------+
[aaa@qq.com ~]# cd /root/.ssh/
[aaa@qq.com .ssh]# yum install -y rsync
[aaa@qq.com .ssh]# ssh-copy-id server4
The authenticity of host 'server4 (172.25.254.4)' can't be established.
RSA key fingerprint is 57:9d:a3:b0:00:cb:7e:c0:8a:a5:75:55:de:53:19:7b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'server4,172.25.254.4' (RSA) to the list of known hosts.
aaa@qq.com's password: 
Now try logging into the machine, with "ssh 'server4'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[aaa@qq.com .ssh]# rsync -p * server1:/root/.ssh/
aaa@qq.com's password: 
[aaa@qq.com .ssh]# rsync -p * server2:/root/.ssh/
aaa@qq.com's password: 
[aaa@qq.com .ssh]# rsync -p * server3:/root/.ssh/
aaa@qq.com's password: 
[aaa@qq.com .ssh]# ssh server1
[aaa@qq.com ~]# logout
Connection to server1 closed.
[aaa@qq.com .ssh]# ssh server2
Last login: Wed Jul 11 23:34:42 2018 from server1
[aaa@qq.com ~]# logout
Connection to server2 closed.
[aaa@qq.com .ssh]# ssh server3
Last login: Wed Jul 11 23:35:15 2018 from server4
[aaa@qq.com ~]# logout
Connection to server3 closed.

方法二:

四台机子都需要配置比较麻烦

aaa@qq.com ~]# ssh-****** -t rsa ##Enter 即可,选择默认方式
[aaa@qq.com ~]# ssh-copy-id -i .ssh/id_rsa.pub aaa@qq.com
[aaa@qq.com ~]# ssh-copy-id -i .ssh/id_rsa.pub aaa@qq.com
[aaa@qq.com ~]# ssh-copy-id -i .ssh/id_rsa.pub aaa@qq.com

三.MHA配置

manager节点配置:

(1)新建目录,并创建文件

aaa@qq.com masterha]# pwd
/etc/masterha    ##此目录为mkdir目录
[aaa@qq.com masterha]# vim app.cnf

编写内容:

[server default]
manager_log=/etc/masterha/mha.log
manager_workdir=/etc/masterha/
master_binlog_dir=/var/lib/mysql
#master_ip_online_change_script=/etc/masterha/master_ip_online_change
aaa@qq.com
ping_interval=1
remote_workdir=/tmp
repl_password=redhat
repl_user=wuyanzu
ssh_user=root
user=root

[server1]
hostname=172.25.9.1
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=172.25.9.2
port=3306

[server3]
hostname=172.25.9.3
port=3306

slave节点(server2和server3)配置:

server2和server3配置relay log的清除方式和slave配置只读,但不要写入配置文件,因为master机down掉后可能随时会升级成master。

aaa@qq.com mysql]# mysql -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.

mysql> set global relay_log_purge=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global read_only=on;
Query OK, 0 rows affected (0.00 sec)

manager节点检测ssh配置:

aaa@qq.com ~]# masterha_check_ssh --conf=/etc/masterha/app.cnf

Wed Aug 09 20:55:27 2018 - [info] All SSH connection tests passed successfully.

manager节点检测repl环境:

aaa@qq.com ~]# masterha_check_repl --conf=/etc/masterha/app.cnf 
Wed Aug 09 23:56:50 2018 - [warning] Global configuration file
MySQL Replication Health is NOT OK!


 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 172.25.9.2(172.25.9.2:3306) :1045:Access denied for user 'root'@'server4' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297
Thu Aug 09 00:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln301] Got MySQL error when connecting 172.25.9.1(172.25.254.1:3306) :1045:Access denied for user 'root'@'server4' (using password: YES), but this is not a MySQL crash. Check MySQL server settings.
 at /usr/share/perl5/vendor_perl/MHA/ServerManager.pm line 297
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln309] Got fatal error, stopping operations
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Thu Aug 09 10:00:50 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Aug 09 10:00:50 2018 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

我们会发现这时候报错了,那么如何解决呢?

解决方法:master机(server1)给与监控权限,然后再manager机进行测试就会发现repl检测成功。

aaa@qq.com ~]# mysql -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.7.17-log MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

mysql> grant all on *.* to aaa@qq.com'172.25.9.%' identified by 'aaa@qq.com';
Query OK, 0 rows affected, 1 warning (0.13 sec)
aaa@qq.com ~]# masterha_check_repl --conf=/etc/masterha/app.cnf 
Thu Aug 09 23:04:51 2018 - [info] Checking replication health on 172.25.254.2..
Thu Aug 09 23:04:51 2018 - [info]  ok.
Thu Aug 09 23:04:51 2018 - [info] Checking replication health on 172.25.254.3..
Thu Aug 09 23:04:51 2018 - [info]  ok.
Thu Aug 09 23:04:51 2018 - [warning] master_ip_failover_script is not defined.
Thu Aug 09 23:04:51 2018 - [warning] shutdown_script is not defined.
Thu Aug 09 23:04:51 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

四.测试

manager机(server4)开启监控:

aaa@qq.com ~]# nohup masterha_manager --conf=/etc/masterha/app.cnf &
[1] 1201
[aaa@qq.com ~]# nohup: ignoring input and appending output to `nohup.out'

将master机(server1)的mysql down掉:

1527 pts/0    S      0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysq
 1772 pts/0    Sl     0:02 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/m
 1969 pts/0    R+     0:00 ps ax
[aaa@qq.com ~]# kill -9 1527
[aaa@qq.com ~]# kill -9 1772
或者
[aaa@qq.com ~]# killall -9 mysqld_safe ##这样就会直接跳过去
[aaa@qq.com ~]# killall -9 mysqld

查看manager机(server4)会自动生成日志等文件:

aaa@qq.com masterha]# ls
app.cnf  app.failover.complete  mha.log
[aaa@qq.com masterha]# cat mha.log
----- Failover Report -----

app: MySQL Master failover 172.25.9.1(172.25.9.1:3306) to 172.25.9.2(172.25.9.2:3306) succeeded

Master 172.25.9.1(172.25.9.1:3306) is down!

Check MHA Manager logs at server4:/etc/masterha/mha.log for details.

Started automated(non-interactive) failover.
Selected 172.25.9.2(172.25.9.2:3306) as a new master.
172.25.9.2(172.25.9.2:3306): OK: Applying all logs succeeded.
172.25.9.3(172.25.9.3:3306o_master=1): OK: Slave started, replicating from 172.25.9.2(172.25.9.2:3306)
172.25.254.2(172.25.254.2:3306): Resetting slave info succeeded.
Master failover to 172.25.254.2(172.25.254.2:3306) completed successfully.

日志文件中提示master已经由server2接管,此时我们在server2和server3分别查看,可知:

在server2上面查看,会有master的状态,没有slave状态,server3上面查看,master和slave状态都有。

这时我们将serve1的mysql开启,并将他手动设置为slave,指向新的master。

[aaa@qq.com ~]# mysql -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.17-log MySQL Community Server (GPL)

mysql> show slave status;
Empty set (0.00 sec)

mysql> change master to master_host='172.25.9.2',master_user='root',master_password='aaa@qq.com',master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.34 sec)

mysql> start slave;
Query OK, 0 rows affected (0.03 sec)

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.25.9.2
                  Master_User: root
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 710
               Relay_Log_File: server1-relay-bin.000002
                Relay_Log_Pos: 923
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

 

 

相关标签: mysql高可用 MHA