Redis Cluster Node Operations
In the previous article we set up a Redis Cluster and went through its configuration. This article focuses on node operations while the cluster is running.
The cluster we have already built consists of three masters (6379, 6380, 6381), each with one slave (6382, 6383, 6384 respectively).
When the initial 6-node cluster is no longer enough we add nodes, and when there are too many we remove some. Let's see how Redis Cluster supports changing the number of nodes.
The current slot assignment is: 6379 holds slots 0-5460, 6380 holds 5461-10922, and 6381 holds 10923-16383.
1. Adding test data
Before adding any nodes, let's SET some test data so we can check whether it is redistributed after the topology changes.
The data set is key0 through key99999, 100,000 keys in total.
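The article does not show how the keys were loaded; here is a minimal sketch of one way to do it, assuming redis-cli is on the PATH and the cluster is reachable at 192.168.80.129:6379 (slow but unambiguous, since -c lets the client follow MOVED redirections so each key lands on the right master):

# hypothetical loader: 100,000 string keys, key0 ... key99999
for i in $(seq 0 99999); do
  redis-cli -c -h 192.168.80.129 -p 6379 set "key$i" "value$i" > /dev/null
done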
Then run the DBSIZE command on each master to see how the data is distributed:
192.168.80.129:6379> dbsize
(integer) 33320
192.168.80.129:6380> dbsize
(integer) 33390
192.168.80.129:6381> dbsize
(integer) 33290
The three counts add up to 100,000; if they don't, something is wrong with the cluster :)
So the data is spread fairly evenly across the three masters.
2. Adding nodes
1) Adding a master node
First, let's look at the cluster info. From the redis-cli prompt (using node 6382 as an example), run:
192.168.80.129:6382> CLUSTER INFO
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:7
cluster_my_epoch:7
cluster_stats_messages_sent:10288
cluster_stats_messages_received:10286
A node can be added with either of the following commands:
./redis-trib.rb add-node <new-node-ip>:<new-node-port> <existing-node-ip>:<existing-node-port>
or (Redis 5.0+)
./redis-cli --cluster add-node <new-node-ip>:<new-node-port> <existing-node-ip>:<existing-node-port>
The first argument is clear enough, but the second is a little confusing: which node does the "existing node" refer to, a master or a slave?
This time we plan to add two nodes, redis-6385 and redis-6386, so let's start by trying the following command:
[admin@localhost bin]$ ./redis-trib.rb add-node 192.168.80.129:6385 192.168.80.129:6382
>>> Adding node 192.168.80.129:6385 to cluster 192.168.80.129:6382
>>> Performing Cluster Check (using node 192.168.80.129:6382)
M: ef8de3f336e23da1e703719621ac878cb0ac2e40 192.168.80.129:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 0779cac46b6e8b0908ca16feb2bb28f916348eff 192.168.80.129:6383
   slots: (0 slots) slave
   replicates 7812b87e4c22ad604869a4350b32911eb9ef5865
M: 98017cd8a46aee30e6cc3222fa1657118f1eeec2 192.168.80.129:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 5cee1d472a3d956ae56332b8a30b05520b8893ea 192.168.80.129:6382
   slots: (0 slots) slave
   replicates ef8de3f336e23da1e703719621ac878cb0ac2e40
S: 79ed8fc747c0c02ee8b7318d83f96d6fa7d5ffa5 192.168.80.129:6384
   slots: (0 slots) slave
   replicates 98017cd8a46aee30e6cc3222fa1657118f1eeec2
M: 7812b87e4c22ad604869a4350b32911eb9ef5865 192.168.80.129:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.80.129:6385 to make it join the cluster.
[OK] New node added correctly.

The node was added successfully, which also answers the question about the second argument: it can be any node already in the cluster, since any member can provide the cluster topology.
At this point, however, the newly added 6385 node is not yet counted as a working part of the cluster: CLUSTER INFO still reports cluster_size:3.
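The difference is visible directly in CLUSTER INFO: cluster_known_nodes counts every node that has joined via gossip, while cluster_size only counts masters that serve at least one hash slot. A quick check against this setup:

# the new master is known to the cluster but owns no slots yet,
# so cluster_size stays at 3 while cluster_known_nodes grows to 7
redis-cli -h 192.168.80.129 -p 6382 cluster info | grep -E 'cluster_known_nodes|cluster_size'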
As a side note, redis-trib.rb offers the following subcommands (their redis-cli --cluster equivalents are sketched after this list):
create: create a cluster from host1:port1 ... hostN:portN (the --replicas option sets how many slaves per master)
Example: ./redis-trib.rb create --replicas 1 192.168.80.129:6379 192.168.80.129:6380 192.168.80.129:6381 192.168.80.129:6382 192.168.80.129:6383 192.168.80.129:6384
call: run a Redis command on all nodes in the cluster
Example: ./redis-trib.rb call 192.168.80.129:6379 cluster info
add-node: add a node to the cluster; the first argument is the new node's IP and port, the second is the IP and port of any node already in the cluster
Example: ./redis-trib.rb add-node 192.168.80.129:6385 192.168.80.129:6379
del-node: remove a node from the cluster
Example: ./redis-trib.rb del-node 192.168.80.129:6386 13ed528b7f45bfe03d6728d9dd3bc34a38d6cf75
reshard: move hash slots between nodes
Example: ./redis-trib.rb reshard 192.168.80.129:6379
check: check the cluster state
Example: ./redis-trib.rb check 192.168.80.129:6379
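For reference, Redis 5.0 and later fold the same functionality into redis-cli itself. The rough equivalents of the subcommands above are sketched below; double-check the flag names with `redis-cli --cluster help` on your version:

./redis-cli --cluster create 192.168.80.129:6379 192.168.80.129:6380 192.168.80.129:6381 \
            192.168.80.129:6382 192.168.80.129:6383 192.168.80.129:6384 --cluster-replicas 1
./redis-cli --cluster call     192.168.80.129:6379 cluster info
./redis-cli --cluster add-node 192.168.80.129:6385 192.168.80.129:6379
./redis-cli --cluster del-node 192.168.80.129:6386 13ed528b7f45bfe03d6728d9dd3bc34a38d6cf75
./redis-cli --cluster reshard  192.168.80.129:6379
./redis-cli --cluster check    192.168.80.129:6379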
2) Adding a slave node
The master node 6385 has been added, as the node listing above confirms. Next, let's add a slave for 6385 with the following command:
[admin@localhost bin]$ ./redis-trib.rb add-node --slave --master-id 3a1d7eadcc99f296ca76ef7a687184fec9dee782 192.168.80.129:6386 192.168.80.129:6379
>>> Adding node 192.168.80.129:6386 to cluster 192.168.80.129:6379
>>> Performing Cluster Check (using node 192.168.80.129:6379)
M: ef8de3f336e23da1e703719621ac878cb0ac2e40 192.168.80.129:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 0779cac46b6e8b0908ca16feb2bb28f916348eff 192.168.80.129:6383
   slots: (0 slots) slave
   replicates 7812b87e4c22ad604869a4350b32911eb9ef5865
M: 98017cd8a46aee30e6cc3222fa1657118f1eeec2 192.168.80.129:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 3a1d7eadcc99f296ca76ef7a687184fec9dee782 192.168.80.129:6385
   slots: (0 slots) master
   0 additional replica(s)
S: 5cee1d472a3d956ae56332b8a30b05520b8893ea 192.168.80.129:6382
   slots: (0 slots) slave
   replicates ef8de3f336e23da1e703719621ac878cb0ac2e40
S: 79ed8fc747c0c02ee8b7318d83f96d6fa7d5ffa5 192.168.80.129:6384
   slots: (0 slots) slave
   replicates 98017cd8a46aee30e6cc3222fa1657118f1eeec2
M: 7812b87e4c22ad604869a4350b32911eb9ef5865 192.168.80.129:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.80.129:6386 to make it join the cluster.
Waiting for the cluster to join.
>>> Configure node as replica of 192.168.80.129:6385.
[OK] New node added correctly.
As for where node 6385's ID comes from, I won't spell it out; it already appears in the check output above, and one quick way to look it up is shown below.
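For example, ask any node for its view of the cluster and pick out the 6385 line; the node ID is the first field:

# CLUSTER NODES lists <node-id> <ip:port> <flags> ... for every known node
redis-cli -h 192.168.80.129 -p 6379 cluster nodes | grep :6385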
At this point we notice, perhaps surprisingly, that cluster_size is still 3: although both new nodes have joined the cluster, they are not yet doing any real work.
3. Reassigning slots
The new master cannot serve requests yet because we have not assigned it any hash slots.
Note the current slot counts: redis-6379 and redis-6381 each hold 5461 slots, while redis-6380 holds 5462.
The cluster now has 4 master nodes, and 16384 / 4 = 4096, so the slots can be split evenly. That means splitting 1365 slots off from nodes 6379 and 6381, and 1366 slots off from node 6380, as shown below:
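A quick shell-arithmetic check of where those numbers come from:

echo $((16384 / 4))     # 4096  -> even share per master with 4 masters
echo $((5461 - 4096))   # 1365  -> surplus on a 5461-slot master (6379, 6381)
echo $((5462 - 4096))   # 1366  -> surplus on 6380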
Next, run the reshard command and follow its prompts:
[admin@localhost bin]$ ./redis-trib.rb reshard 192.168.80.129:6379
>>> Performing Cluster Check (using node 192.168.80.129:6379)
M: ef8de3f336e23da1e703719621ac878cb0ac2e40 192.168.80.129:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 0779cac46b6e8b0908ca16feb2bb28f916348eff 192.168.80.129:6383
   slots: (0 slots) slave
   replicates 7812b87e4c22ad604869a4350b32911eb9ef5865
M: 98017cd8a46aee30e6cc3222fa1657118f1eeec2 192.168.80.129:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 13ed528b7f45bfe03d6728d9dd3bc34a38d6cf75 192.168.80.129:6386
   slots: (0 slots) slave
   replicates 3a1d7eadcc99f296ca76ef7a687184fec9dee782
M: 3a1d7eadcc99f296ca76ef7a687184fec9dee782 192.168.80.129:6385
   slots: (0 slots) master
   1 additional replica(s)
S: 5cee1d472a3d956ae56332b8a30b05520b8893ea 192.168.80.129:6382
   slots: (0 slots) slave
   replicates ef8de3f336e23da1e703719621ac878cb0ac2e40
S: 79ed8fc747c0c02ee8b7318d83f96d6fa7d5ffa5 192.168.80.129:6384
   slots: (0 slots) slave
   replicates 98017cd8a46aee30e6cc3222fa1657118f1eeec2
M: 7812b87e4c22ad604869a4350b32911eb9ef5865 192.168.80.129:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1365   (the number of slots to move)
What is the receiving node ID? 3a1d7eadcc99f296ca76ef7a687184fec9dee782   (the receiving node, here node 6385)
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:ef8de3f336e23da1e703719621ac878cb0ac2e40   (a source node, here node 6379)
Source node #2:98017cd8a46aee30e6cc3222fa1657118f1eeec2   (a source node, here node 6381)
Source node #3:done   (finish entering sources and build the plan)
... (output truncated) ...
Moving slot 11593 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11594 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11595 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11596 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11597 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11598 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11599 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11600 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11601 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11602 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11603 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Moving slot 11604 from 98017cd8a46aee30e6cc3222fa1657118f1eeec2
Do you want to proceed with the proposed reshard plan (yes/no)? yes   (type yes to confirm)
... (output truncated) ...
Moving slot 11598 from 192.168.80.129:6381 to 192.168.80.129:6385: .....
Moving slot 11599 from 192.168.80.129:6381 to 192.168.80.129:6385: .........
Moving slot 11600 from 192.168.80.129:6381 to 192.168.80.129:6385: .........
Moving slot 11601 from 192.168.80.129:6381 to 192.168.80.129:6385: ....
Moving slot 11602 from 192.168.80.129:6381 to 192.168.80.129:6385: ........
Moving slot 11603 from 192.168.80.129:6381 to 192.168.80.129:6385: ...
Moving slot 11604 from 192.168.80.129:6381 to 192.168.80.129:6385: ........

At this point the slots from nodes 6379 and 6381 have been resharded to 6385.
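The remaining move (1366 slots from 6380 to 6385) works the same way. As an aside, the reshard can also be driven non-interactively; a sketch, assuming this redis-trib.rb version supports the --from/--to/--slots/--yes options and using the node IDs shown above:

# take 1366 slots from 6380 (7812b87e...) and hand them to 6385 (3a1d7ead...)
./redis-trib.rb reshard 192.168.80.129:6379 \
    --from 7812b87e4c22ad604869a4350b32911eb9ef5865 \
    --to   3a1d7eadcc99f296ca76ef7a687184fec9dee782 \
    --slots 1366 --yes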
I won't walk through the interactive reshard for node 6380 again. Once all the slot moves are complete, we can use redis-trib.rb to check how the data is distributed:
[admin@localhost bin]$ ./redis-trib.rb call 192.168.80.129:6379 dbsize
>>> Calling DBSIZE
192.168.80.129:6379: 29181
192.168.80.129:6380: 25088
192.168.80.129:6381: 29130
192.168.80.129:6385: 16601
29181+25088+29130+16601=100000!
So resharding does not lose any data; the keys simply move along with their slots.
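That behavior follows from how Redis Cluster stores data: every key hashes to one of the 16384 slots, and migrating a slot carries its keys with it. A quick way to see the mapping (the printed slot number depends on CRC16 of the key and is not reproduced here):

# which slot does key0 hash to?
redis-cli -h 192.168.80.129 -p 6379 cluster keyslot key0
# -c follows the MOVED redirection to whichever master currently owns that slot
redis-cli -c -h 192.168.80.129 -p 6379 get key0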
Now let's look at the cluster info again:
[admin@localhost bin]$ ./redis-cli -c -p 6379 -h 192.168.80.129
192.168.80.129:6379> cluster info
cluster_state:fail
cluster_slots_assigned:16384
cluster_slots_ok:13653
cluster_slots_pfail:0
cluster_slots_fail:2731
cluster_known_nodes:8
cluster_size:4
cluster_current_epoch:8
cluster_my_epoch:1
cluster_stats_messages_sent:98899
cluster_stats_messages_received:98453
cluster_size is now 4, which confirms that the new master node has been added successfully.
4. Removing nodes
1) Removing a slave node
Removing a slave node is straightforward because there is no data to worry about; just run the following command:
[admin@localhost bin]$ ./redis-trib.rb del-node 192.168.80.129:6386 13ed528b7f45bfe03d6728d9dd3bc34a38d6cf75
>>> Removing node 13ed528b7f45bfe03d6728d9dd3bc34a38d6cf75 from cluster 192.168.80.129:6386
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
Node 6386 has been removed from the cluster without any trouble.
Note: removing a node also terminates its redis-server process, as shown below:
The process list no longer contains the 6386 instance.
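If you are not watching a process monitor, a plain ps check works just as well (assuming the instances were started with redis-server and their port is visible on the command line):

# after del-node, no process should be listening for 6386 any more
ps -ef | grep redis-server | grep -v grep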
2) Removing a master node
Before deleting a master we need to think about what happens to its data. Will Redis migrate it for us automatically?
Let's first try deleting the master exactly the way we deleted the slave:
[admin@localhost bin]$ ./redis-trib.rb del-node 192.168.80.129:6385 3a1d7eadcc99f296ca76ef7a687184fec9dee782
>>> Removing node 3a1d7eadcc99f296ca76ef7a687184fec9dee782 from cluster 192.168.80.129:6385
[ERR] Node 192.168.80.129:6385 is not empty! Reshard data away and try again.
The deletion fails: redis-trib tells us that node 6385 is not empty and that its data must be resharded away first.
So let's reshard node 6385:
[admin@localhost bin]$ ./redis-trib.rb reshard 192.168.80.129:6385
>>> Performing Cluster Check (using node 192.168.80.129:6385)
M: 3a1d7eadcc99f296ca76ef7a687184fec9dee782 192.168.80.129:6385
   slots:0-682,5461-6826,10923-11604 (2731 slots) master
   0 additional replica(s)
S: 0779cac46b6e8b0908ca16feb2bb28f916348eff 192.168.80.129:6383
   slots: (0 slots) slave
   replicates 7812b87e4c22ad604869a4350b32911eb9ef5865
M: 98017cd8a46aee30e6cc3222fa1657118f1eeec2 192.168.80.129:6381
   slots:11605-16383 (4779 slots) master
   1 additional replica(s)
S: 79ed8fc747c0c02ee8b7318d83f96d6fa7d5ffa5 192.168.80.129:6384
   slots: (0 slots) slave
   replicates 98017cd8a46aee30e6cc3222fa1657118f1eeec2
M: 7812b87e4c22ad604869a4350b32911eb9ef5865 192.168.80.129:6380
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 5cee1d472a3d956ae56332b8a30b05520b8893ea 192.168.80.129:6382
   slots: (0 slots) slave
   replicates ef8de3f336e23da1e703719621ac878cb0ac2e40
M: ef8de3f336e23da1e703719621ac878cb0ac2e40 192.168.80.129:6379
   slots:683-5460 (4778 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 2731   (the number of slots to move; the count listed for 6385 above)
What is the receiving node ID? 7812b87e4c22ad604869a4350b32911eb9ef5865   (the receiving node; here node 6380)
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:3a1d7eadcc99f296ca76ef7a687184fec9dee782   (the source node, i.e. node 6385)
Source node #2:done   (finish entering sources)
... (output truncated) ...
Moving slot 11596 from 3a1d7eadcc99f296ca76ef7a687184fec9dee782
Moving slot 11597 from 3a1d7eadcc99f296ca76ef7a687184fec9dee782
Moving slot 11598 from 3a1d7eadcc99f296ca76ef7a687184fec9dee782
Moving slot 11599 from 3a1d7eadcc99f296ca76ef7a687184fec9dee782
Moving slot 11600 from 3a1d7eadcc99f296ca76ef7a687184fec9dee782
Moving slot 11601 from 3a1d7eadcc99f296ca76ef7a687184fec9dee782
Moving slot 11602 from 3a1d7eadcc99f296ca76ef7a687184fec9dee782
Moving slot 11603 from 3a1d7eadcc99f296ca76ef7a687184fec9dee782
Moving slot 11604 from 3a1d7eadcc99f296ca76ef7a687184fec9dee782
Do you want to proceed with the proposed reshard plan (yes/no)? yes
... (output truncated) ...
Moving slot 11602 from 192.168.80.129:6385 to 192.168.80.129:6380: ........
Moving slot 11603 from 192.168.80.129:6385 to 192.168.80.129:6380: ...
Moving slot 11604 from 192.168.80.129:6385 to 192.168.80.129:6380: ........
The reshard succeeds. Let's verify by running the CLUSTER SLOTS command:
192.168.80.129:6379> CLUSTER SLOTS
1) 1) (integer) 11605
   2) (integer) 16383
   3) 1) "192.168.80.129"
      2) (integer) 6381
      3) "98017cd8a46aee30e6cc3222fa1657118f1eeec2"
   4) 1) "192.168.80.129"
      2) (integer) 6384
      3) "79ed8fc747c0c02ee8b7318d83f96d6fa7d5ffa5"
2) 1) (integer) 683
   2) (integer) 5460
   3) 1) "192.168.80.129"
      2) (integer) 6379
      3) "ef8de3f336e23da1e703719621ac878cb0ac2e40"
   4) 1) "192.168.80.129"
      2) (integer) 6382
      3) "5cee1d472a3d956ae56332b8a30b05520b8893ea"
3) 1) (integer) 0
   2) (integer) 682
   3) 1) "192.168.80.129"
      2) (integer) 6380
      3) "7812b87e4c22ad604869a4350b32911eb9ef5865"
   4) 1) "192.168.80.129"
      2) (integer) 6383
      3) "0779cac46b6e8b0908ca16feb2bb28f916348eff"
4) 1) (integer) 5461
   2) (integer) 11604
   3) 1) "192.168.80.129"
      2) (integer) 6380
      3) "7812b87e4c22ad604869a4350b32911eb9ef5865"
   4) 1) "192.168.80.129"
      2) (integer) 6383
      3) "0779cac46b6e8b0908ca16feb2bb28f916348eff"
Sure enough, node 6385 no longer appears in the slot map; its slots have all moved to node 6380.
Finally, run the delete command again to remove node 6385:
[admin@localhost bin]$ ./redis-trib.rb del-node 192.168.80.129:6385 3a1d7eadcc99f296ca76ef7a687184fec9dee782
>>> Removing node 3a1d7eadcc99f296ca76ef7a687184fec9dee782 from cluster 192.168.80.129:6385
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
The deletion succeeds, and the 6385 process is shut down as well.
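As a final sanity check, the check subcommand listed earlier can be pointed at any surviving node; it should report the remaining three masters covering all 16384 slots:

./redis-trib.rb check 192.168.80.129:6379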
In the next article we will look at how nodes in a Redis cluster communicate internally and how master elections work.