keepalived+双主实践
工作不怎么忙,搞点儿开发吧差点儿事,就想着弄点儿架构的事儿。正好前段时间看过关于keepalived+双主实现高可用的文章,也恰好身边的朋友所在的公司也部分用这个架构。没什么事儿就搞搞吧,正好对比下mmm、mha、keepalived+双主三种架构的优劣和db维护的体验感。简单讲讲自己的用户体验感,就搭建难易程度讲mmm的安装包封装好的,修改的配置文件较mha少一些,比keepalived+双主要稍微麻烦点儿。本着省事,维护起来省事还是觉得mmm真的便利,黑盒操作适合我这种懒人加小白类型。
某位大佬讲过,如果你的公司还在用mmm和mha,那么你可以考虑跳槽了。我觉得这句话很有道理,其实并不是让我们真的去跳槽,毕竟每个人工作的目的,环境不一样,有些架构上的事儿我们决定不了。没法随性而行,但不能停止探索的脚步,本过程从搭建调研/搭建过程/搭建测试/搭建总结四个方面讲述我对双主+keepalived的理解和用户体验感。
一、搭建调研
传统的高可用架构如mha、mmm存在一些不成熟的问题,如脑裂。引入keepalived和双主复制模式,实现高可用架构,但keepalived本身是在机器宕机时才会实现漂移功能,我们的目标是要mysql实例宕机后要实现故障切换,还需要辅助的脚本来帮助keepalived来实现更灵活的漂移。
keepalived简介
keepalived是集群管理中保证集群高可用的一个软件解决方案,其功能类似于heartbeat,用来防止单点故障,这里的作用我理解其实就是保证vip的顺利漂移。虚拟路由冗余协议,可以认为是实现路由器高可用的协议,即将n台提供相同功能的路由器组成一个路由器组,这个组里面有一个master和多个backup,master上面有一个对外提供服务的vip,master会发组播(组播地址为224.0.0.18),当backup收不到vrrp包时就认为master宕掉了,这时就需要根据vrrp的优先级来选举一个backup当master,这样的话就可以保证路由器的高可用了。
keepalived配置说明
keepalived只有一个配置文件keepalived.conf,里面主要包括以下几个配置区域,分别是global_defs、vrrp_instance和virtual_server。
- global_defs:主要是配置故障发生时的通知对象以及机器标识;
- vrrp_instance:用来定义对外提供服务的vip区域及其相关属性;
- virtual_server:虚拟服务器定义。
二、搭建过程
搭建环境(服务器配置忽略)
master1:172.16.3.190/22 3309 vip:172.16.3.123/22
1 #master1配置keepalived 2 yum install keepalived.x86_64 3 [root@172-16-3-190 we_ops_admin]# cat /etc/keepalived/keepalived.conf 4 ! configuration file for keepalived 5 6 global_defs { 7 router_id lvs_master1 8 } 9 10 vrrp_instance vi_1 { 11 state backup 12 interface eth0 13 virtual_router_id 172 14 priority 100 15 advert_int 1 16 nopreempt 17 authentication { 18 auth_type pass 19 auth_pass 1111 20 } 21 virtual_ipaddress { 22 172.16.3.123/22 23 } 24 } 25 26 virtual_server 172.16.3.123 3309 { 27 delay_loop 6 28 lb_algo rr 29 lb_kind nat 30 nat_mask 255.255.255.0 31 persistence_timeout 50 32 protocol tcp 33 34 real_server 172.16.3.190 3309 { 35 weight 3 36 notify_down /opt/shells/keepalived_mysql.sh 37 tcp_check { 38 connect_timeout 3 39 nb_get_retry 3 40 delay_before_retry 3 41 connect_port 3309 42 } 43 } 44 } 45 46 #master2上安装keepalived 47 yum install keepalived.x86_64 48 [root@172-16-3-189 we_ops_admin]# cat /etc/keepalived/keepalived.conf 49 ! configuration file for keepalived 50 51 global_defs { 52 router_id lvs_master2 53 } 54 55 vrrp_instance vi_1 { 56 state backup 57 interface eth0 58 virtual_router_id 172 59 priority 50 60 advert_int 1 61 # nopreempt 62 authentication { 63 auth_type pass 64 auth_pass 1111 65 } 66 virtual_ipaddress { 67 172.16.3.123/22 68 } 69 } 70 71 virtual_server 172.16.3.123 3309 { 72 delay_loop 6 73 lb_algo rr 74 lb_kind nat 75 nat_mask 255.255.255.0 76 persistence_timeout 50 77 protocol tcp 78 79 real_server 172.16.3.189 3309 { 80 weight 3 81 notify_down /opt/shells/keepalived_mysql.sh 82 tcp_check { 83 connect_timeout 3 84 nb_get_retry 3 85 delay_before_retry 3 86 connect_port 3309 87 } 88 } 89 }
上述配置中我们可以保证keepalived服务对vip:172.16.3.123/22的控制权,默认是keepalived服务关闭,那么会触发vip的漂移。正常运行的服务不会发生异常停止的现象,如果系统发生宕机会触发所有的服务停止,这里系统宕机是触发vip漂移的导火索。只是这里我们想让keepalived服务于mysql复制集,那么这里的导火索自然而然是mysql服务的状态。如果服务状态不可用,那么我们希望这个应用vip可以漂移到复制集的另一台机器上;如果服务状态可用,我们希望vip不要漂移。要想实现这个目的,我们还需要一个服务脚本来帮助我们去帮助keepalived发现mysql服务宕机后的动作,脚本如下配置。
1 [root@172-16-3-190 we_ops_admin]# cat /opt/shells/keepalived_mysql.sh 2 #!/bin/bash 3 pkill keepalived 4 /sbin/ifdown eth0 && /sbin/ifup eth0 5 #授予可执行权限 6 [root@172-16-3-190 we_ops_admin]# ls -lh /opt/shells/keepalived_mysql.sh 7 -rwxr-xr-x 1 root root 66 sep 27 19:29 /opt/shells/keepalived_mysql.sh
通过步骤1·2的配置,启动mysql服务,启动keepalived服务,这里的master1和master2基本就可以实现高可用,保证了master1服务不可用时,master2还能继续提供数据库的支持。
三、搭建测试
1、master1的mysql服务宕机,vip会从master1上摘除漂移落盘到master2上,且master1上的keepalived服务也会停止。应用连接vip,master2继续为整个集群提供数据库支持。
1 #停止master1上的mysql服务 2 [root@172-16-3-190 we_ops_admin]# /etc/init.d/mysql_3309 stop 3 shutting down mysql (percona server).. success! 4 5 #keepalived服务也停止了,且vip已经被从master1上摘除 6 [root@172-16-3-190 we_ops_admin]# /etc/init.d/keepalived status 7 keepalived dead but subsys locked 8 [root@172-16-3-190 we_ops_admin]# ip add 9 1: lo: <loopback,up,lower_up> mtu 65536 qdisc noqueue state unknown 10 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 11 inet 127.0.0.1/8 scope host lo 12 inet6 ::1/128 scope host 13 valid_lft forever preferred_lft forever 14 2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc htb state up qlen 1000 15 link/ether 52:54:00:f4:ec:b2 brd ff:ff:ff:ff:ff:ff 16 inet 172.16.3.190/22 brd 172.16.3.255 scope global eth0 17 inet6 fe80::5054:ff:fef4:ecb2/64 scope link 18 valid_lft forever preferred_lft forever 19 #vip漂移到master2上 20 [root@172-16-3-189 we_ops_admin]# ip add 21 1: lo: <loopback,up,lower_up> mtu 65536 qdisc noqueue state unknown 22 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 23 inet 127.0.0.1/8 scope host lo 24 inet6 ::1/128 scope host 25 valid_lft forever preferred_lft forever 26 2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc htb state up qlen 1000 27 link/ether 52:54:00:2d:96:5c brd ff:ff:ff:ff:ff:ff 28 inet 172.16.3.189/22 brd 172.16.3.255 scope global eth0 29 inet 172.16.3.123/22 scope global secondary eth0 30 inet6 fe80::5054:ff:fe2d:965c/64 scope link 31 valid_lft forever preferred_lft forever
2、master1重新加入集群,vip不会重新漂移回来,造成二次波动或者脑裂现象
1 #重启master1上的mysql服务 2 [root@172-16-3-190 we_ops_admin]# /etc/init.d/mysql_3309 start 3 starting mysql (percona server)............... success! 4 #重启master1上的keepalived服务 5 [root@172-16-3-190 we_ops_admin]# /etc/init.d/keepalived start 6 starting keepalived: [ ok ] 7 #vip还是在master2上,且master1上并没有vip,因为master1上设置非抢占模式,即使优先级更高 8 [root@172-16-3-190 we_ops_admin]# ip add #master1 9 2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc htb state up qlen 1000 10 link/ether 52:54:00:f4:ec:b2 brd ff:ff:ff:ff:ff:ff 11 inet 172.16.3.190/22 brd 172.16.3.255 scope global eth0 12 inet6 fe80::5054:ff:fef4:ecb2/64 scope link 13 valid_lft forever preferred_lft forever 14 15 [root@172-16-3-189 we_ops_admin]# ip add master2 16 2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc htb state up qlen 1000 17 link/ether 52:54:00:2d:96:5c brd ff:ff:ff:ff:ff:ff 18 inet 172.16.3.189/22 brd 172.16.3.255 scope global eth0 19 inet 172.16.3.123/22 scope global secondary eth0 20 inet6 fe80::5054:ff:fe2d:965c/64 scope link 21 valid_lft forever preferred_lft forever
1 #关闭master2实例 2 [root@172-16-3-189 we_ops_admin]# /etc/init.d/mysql_3309 stop 3 shutting down mysql (percona server).. success! 4 [root@172-16-3-189 we_ops_admin]# /etc/init.d/keepalived status 5 keepalived dead but subsys locked 6 #vip已经从master2上飘走了 7 [root@172-16-3-189 we_ops_admin]# ip add 8 2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc htb state up qlen 1000 9 link/ether 52:54:00:2d:96:5c brd ff:ff:ff:ff:ff:ff 10 inet 172.16.3.189/22 brd 172.16.3.255 scope global eth0 11 inet6 fe80::5054:ff:fe2d:965c/64 scope link 12 valid_lft forever preferred_lft forever 13 14 #vip已经落盘到master1上 15 [root@172-16-3-190 we_ops_admin]# ip add 16 2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc htb state up qlen 1000 17 link/ether 52:54:00:f4:ec:b2 brd ff:ff:ff:ff:ff:ff 18 inet 172.16.3.190/22 brd 172.16.3.255 scope global eth0 19 inet 172.16.3.123/22 scope global secondary eth0 20 inet6 fe80::5054:ff:fef4:ecb2/64 scope link 21 valid_lft forever preferred_lft forever 22 #server-id可以证明连接到master1实例 23 [root@172-16-3-190 we_ops_admin]# /opt/app/mysql_3309/bin/mysql -urepl -prepl --socket=/opt/app/mysql_3309/tmp/mysql.sock --port=3309 --host=172.16.3.123 24 warning: using a password on the command line interface can be insecure. 25 welcome to the mysql monitor. commands end with ; or \g. 26 your mysql connection id is 33 27 server version: 5.6.20-68.0-log percona server (gpl), release 68.0, revision 656 28 29 copyright (c) 2009-2014 percona llc and/or its affiliates 30 copyright (c) 2000, 2014, oracle and/or its affiliates. all rights reserved. 31 32 oracle is a registered trademark of oracle corporation and/or its 33 affiliates. other names may be trademarks of their respective 34 owners. 35 36 type 'help;' or '\h' for help. type '\c' to clear the current input statement. 37 38 mysql> show global variables like '%server_id%'; 39 +----------------+---------+ 40 | variable_name | value | 41 +----------------+---------+ 42 | server_id | 1903309 | 43 | server_id_bits | 32 | 44 +----------------+---------+ 45 2 rows in set (0.01 sec)
[root@172-16-3-189 we_ops_admin]#/etc/init.d/mysql_3309 stop #master1上停止实例
shutting down mysql (percona server).. success!
[root@172-16-3-189 we_ops_admin]#/etc/init.d/keepalived status
keepalived dead but subsys locked
[root@172-16-3-189 we_ops_admin]#ip add #vip居然还在master2上
2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc htb state up qlen 1000
link/ether 52:54:00:2d:96:5c brd ff:ff:ff:ff:ff:ff
inet 172.16.3.189/22 brd 172.16.3.255 scope global eth0
inet 172.16.3.123/22 scope global secondary eth0
inet6 fe80::5054:ff:fe2d:965c/64 scope link
valid_lft forever preferred_lft forever
2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc htb state up qlen 1000
link/ether 52:54:00:f4:ec:b2 brd ff:ff:ff:ff:ff:ff
inet 172.16.3.190/22 brd 172.16.3.255 scope global eth0
inet 172.16.3.123/22 scope global secondary eth0
inet6 fe80::5054:ff:fef4:ecb2/64 scope link
valid_lft forever preferred_lft forever
#master2上有vip,但应用没有连接到master2上且表的行数不增长
mysql> select max(id) from test_keepalived;
+---------+
| max(id) |
+---------+
| 168 |
+---------+
1 row in set (0.00 sec)
+---------+
| max(id) |
+---------+
| 168 |
+---------+
1 row in set (0.00 sec)
mysql> select max(id) from test_keepalived;
+---------+
| max(id) |
+---------+
| 387 |
+---------+
1 row in set (0.00 sec)
+---------+
| max(id) |
+---------+
| 388 |
+---------+
1 row in set (0.00 sec)
2、master2同步被中断的问题。(vip在maste2上时,因为master2上已经写入了数据但没来得及同步到master1上;master2实例停止后,vip也漂移到master1,应用连接master1进行写入,但因为表设计为主键自增长,会出现id为25已写入master2而没有同步到master1,应用连接master1写入到数据库同步到master2时报主键重复)
1 mysql> show slave status \g; 2 *************************** 1. row *************************** 3 slave_io_state: waiting for master to send event 4 master_host: 172.16.3.190 5 master_user: repl 6 master_port: 3309 7 connect_retry: 30 8 master_log_file: binlog.000036 9 read_master_log_pos: 103620 10 relay_log_file: relay_bin.000038 11 relay_log_pos: 280 12 relay_master_log_file: binlog.000036 13 slave_io_running: yes 14 slave_sql_running: no 15 replicate_do_db: 16 replicate_ignore_db: 17 replicate_do_table: 18 replicate_ignore_table: 19 replicate_wild_do_table: 20 replicate_wild_ignore_table: 21 last_errno: 1062 22 last_error: error 'duplicate entry '25' for key 'primary'' on query. default database: 'practice'. query: 'insert into test_keepalived values(null,1,4)' 23 skip_counter: 0 24 exec_master_log_pos: 120 25 relay_log_space: 104434 26 until_condition: none 27 until_log_file: 28 until_log_pos: 0 29 master_ssl_allowed: no 30 master_ssl_ca_file: 31 master_ssl_ca_path: 32 master_ssl_cert: 33 master_ssl_cipher: 34 master_ssl_key: 35 seconds_behind_master: null 36 master_ssl_verify_server_cert: no 37 last_io_errno: 0 38 last_io_error: 39 last_sql_errno: 1062 40 last_sql_error: error 'duplicate entry '25' for key 'primary'' on query. default database: 'practice'. query: 'insert into test_keepalived values(null,1,4)' 41 replicate_ignore_server_ids: 42 master_server_id: 1903309 43 master_uuid: 1b589d80-f450-11e7-9150-525400f4ecb2 44 master_info_file: /opt/app/mysql_3309/logs/master.info 45 sql_delay: 0 46 sql_remaining_delay: null 47 slave_sql_running_state: 48 master_retry_count: 86400 49 master_bind: 50 last_io_error_timestamp: 51 last_sql_error_timestamp: 180929 17:43:30 52 master_ssl_crl: 53 master_ssl_crlpath: 54 retrieved_gtid_set: 55 executed_gtid_set: 56 auto_position: 0 57 1 row in set (0.00 sec)
keepalived+双主架构总结