K8S kube-proxy Forwarding Analysis

程序员文章站 2022-04-25 15:16:52

Environment

Node IP: 192.168.0.11
Service configuration: Nginx service with 3 replicas
Service CLUSTER-IP: 10.254.198.92
Service cluster port: 80
Service NodePort: 32110

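How is traffic to a Service handled?

Step 1: Direct traffic into the KUBE-SERVICES chain

A service created in k8s is exposed through a NodePort or a ClusterIP, while the actual work is done by the backing pods (for example 172.17.0.2 and 172.17.0.4). kube-proxy forwards traffic between the exposed address and those pods. In iptables mode this is done by the KUBE-SERVICES chain in the nat table, which is described in detail later. First, consider how traffic addressed to a Service gets into the KUBE-SERVICES chain.

When the node itself accesses the service through the NodePort or the ClusterIP, the packet passes through these main iptables tables and chains: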

NAT OUTPUT
FILTER OUTPUT
NAT POSTROUTING

When an external client accesses the service through the NodePort, the packet passes through these main iptables tables and chains:

NAT PREROUTING
FILTER FORWARD
NAT POSTROUTING

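Analysis:
Traffic of these two kinds passes through the NAT OUTPUT chain and the NAT PREROUTING chain respectively, so these are the two places where it can be intercepted and sent to the KUBE-SERVICES chain.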
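
The two cases can be summarized as a small lookup. This is only an illustrative Python sketch; the names `CHAINS_BY_ORIGIN` and `entry_hook` are made up here and are not part of kube-proxy.

```python
# Main (table, chain) pairs traversed, as listed above.
CHAINS_BY_ORIGIN = {
    "local": [("nat", "OUTPUT"), ("filter", "OUTPUT"), ("nat", "POSTROUTING")],
    "external": [("nat", "PREROUTING"), ("filter", "FORWARD"), ("nat", "POSTROUTING")],
}


def entry_hook(origin):
    """Return the first hook a packet hits.

    This is where a jump to KUBE-SERVICES has to be placed so that
    Service traffic of that origin is intercepted.
    """
    return CHAINS_BY_ORIGIN[origin][0]


print(entry_hook("local"))     # ('nat', 'OUTPUT')
print(entry_hook("external"))  # ('nat', 'PREROUTING')
```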

Verification:
NAT OUTPUT chain configuration:

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
LOG        all  --  0.0.0.0/0            0.0.0.0/0            LOG flags 0 level 4 prefix "** NAT OUTPUT **"
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
...

NAT PREROUTING chain configuration:

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
LOG        all  --  0.0.0.0/0            0.0.0.0/0            LOG flags 0 level 4 prefix "** NAT PREROUTING **"
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
...

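Step 2: The KUBE-SERVICES chain forwards the traffic

(1) Traffic to the ClusterIP (10.254.198.92:80) and traffic to the NodePort are handled separately. The following two rules match ClusterIP traffic and NodePort traffic respectively.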

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
...
KUBE-SVC-I64SNEMOLCWHJHS3  tcp  --  0.0.0.0/0            10.254.198.92        /* default/nginx-service-nodeport: cluster IP */ tcp dpt:80
KUBE-NODEPORTS  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

(2) ClusterIP traffic is processed further and finally distributed to the backend pods.

Chain KUBE-SVC-I64SNEMOLCWHJHS3 (2 references)
target     prot opt source               destination         
KUBE-SEP-MMWJ6M2J72TU3J64  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ statistic mode random probability 0.33332999982
KUBE-SEP-GRLEVIWNO4P37GSQ  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ statistic mode random probability 0.50000000000
KUBE-SEP-74XRUOWV76LDS3ID  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */

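Analysis: there are 3 backend pods, and the rules above distribute traffic among them with the statistic match in random mode. The rules are evaluated in order, so each probability applies only to the traffic that earlier rules did not match: the first rule takes about 1/3, the second takes 1/2 of the remaining 2/3, and the third takes whatever is left. Although the listed probabilities differ, each pod therefore receives about 1/3 of the traffic. Next, look at the rules in the sub-chain of one of the pods.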
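
Because the KUBE-SVC rules are tried in order, the share of traffic each endpoint actually receives is the rule's probability multiplied by the fraction of traffic that reached that rule. A minimal Python sketch of that arithmetic, using the probabilities from the listing above (the function name `effective_shares` is hypothetical):

```python
def effective_shares(rule_probabilities):
    """Return the share of traffic each rule receives when rules are tried in order."""
    shares = []
    remaining = 1.0  # fraction of traffic not yet matched by an earlier rule
    for p in rule_probabilities:
        shares.append(remaining * p)
        remaining *= 1.0 - p
    return shares


# Probabilities from the KUBE-SVC-I64SNEMOLCWHJHS3 listing. The last rule has
# no statistic match, so it always matches (probability 1.0).
shares = effective_shares([0.33332999982, 0.50000000000, 1.0])
for share in shares:
    print(round(share, 4))  # prints 0.3333 three times
```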

Chain KUBE-SEP-MMWJ6M2J72TU3J64 (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  all  --  172.17.0.2           0.0.0.0/0            /* default/nginx-service-nodeport: */
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ tcp to:172.17.0.2:80

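Analysis: the DNAT rule shows that the traffic is forwarded to the pod at 172.17.0.2:80. The chains for the other two pods are configured in the same way.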
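
The effect of the DNAT rule on a packet can be pictured as rewriting only the destination address and port. The sketch below is a simplified Python model, not how netfilter is implemented; `Packet` and `dnat` are hypothetical names, and the addresses are the ones from this environment.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    dport: int


def dnat(packet, to_ip, to_port):
    """Rewrite only the destination, as a DNAT target does; the source is untouched."""
    return packet._replace(dst=to_ip, dport=to_port)


# A request from the node to the ClusterIP, before and after KUBE-SEP-MMWJ6M2J72TU3J64.
before = Packet(src="192.168.0.11", dst="10.254.198.92", dport=80)
after = dnat(before, "172.17.0.2", 80)
print(before)
print(after)
```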

(3) NodePort traffic is processed further and finally distributed to the backend pods.

Chain KUBE-NODEPORTS (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ tcp dpt:32110
KUBE-SVC-I64SNEMOLCWHJHS3  tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ tcp dpt:32110

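Analysis:
The first rule (KUBE-MARK-MASQ) marks the packet (MARK or 0x4000); after it returns, evaluation continues with the second rule.
The second rule jumps to KUBE-SVC-I64SNEMOLCWHJHS3, the same chain that ClusterIP traffic goes through, which distributes the traffic to the backend pods: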

Chain KUBE-SVC-I64SNEMOLCWHJHS3 (2 references)
target     prot opt source               destination         
KUBE-SEP-MMWJ6M2J72TU3J64  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ statistic mode random probability 0.33332999982
KUBE-SEP-GRLEVIWNO4P37GSQ  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ statistic mode random probability 0.50000000000
KUBE-SEP-74XRUOWV76LDS3ID  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */
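
The marking done by KUBE-MARK-MASQ in the KUBE-NODEPORTS chain is a bitwise OR on the packet mark. The Python sketch below illustrates that operation. The check in `needs_masquerade` reflects how such a mark is typically consumed later (masquerading marked packets in POSTROUTING); that rule is not shown in this article, so treat it as an assumption. All function names are hypothetical.

```python
MASQ_MARK = 0x4000  # bit set by KUBE-MARK-MASQ in this environment


def mark_masq(mark):
    """OR the masquerade bit into the packet mark, leaving other bits unchanged."""
    return mark | MASQ_MARK


def needs_masquerade(mark):
    """Check whether the masquerade bit is set (a 0x4000/0x4000 style match)."""
    return (mark & MASQ_MARK) == MASQ_MARK


print(hex(mark_masq(0x0)))                 # 0x4000
print(hex(mark_masq(0x1)))                 # 0x4001, the existing bit is preserved
print(needs_masquerade(mark_masq(0x1)))    # True
print(needs_masquerade(0x1))               # False
```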

Overall flow diagram

[Figure: K8S kube-proxy forwarding analysis]

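Verification

The traffic path and the changes to the packet can be verified from the iptables log. Logging can be added to a specific chain with a command such as: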

# iptables -I KUBE-NODEPORTS -s 192.168.0.0/16 -j LOG --log-prefix "** NAT KUBE-NODEPORTS **" -t nat

Access via NodePort from the node itself

curl 192.168.0.11:32110

Jan 21 17:03:39 hjdevelop kernel: ** NAT OUTPUT **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SERVICES **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-NODEPORTS **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-MARK-MASQ  **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SVC **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SEP-MMWJ6M2J72TU3IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:03:39 hjdevelop kernel: ** Filter OUTPUT **IN= OUT=lo SRC=192.168.0.11 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=80 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:03:39 hjdevelop kernel: ** NAT POSTROUTING **IN= OUT=docker0 SRC=192.168.0.11 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=80 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 

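The iptables log confirms the order:
OUTPUT --> KUBE-SERVICES --> KUBE-NODEPORTS --> KUBE-MARK-MASQ --> KUBE-SVC --> KUBE-SEP-MMWJ6M2J72TU3J64 (this chain rewrites the destination IP to 172.17.0.2) --> Filter OUTPUT --> NAT POSTROUTING
The chain name appears in the log as KUBE-SEP-MMWJ6M2J72TU3 because a LOG prefix is limited to 29 characters; the IN that follows it is the start of the IN= field, not part of the chain name.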
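
The same order can be read out of the log mechanically. The Python sketch below parses lines abbreviated from the log above (fields such as LEN, TTL and ID are omitted for brevity) and reports the first entry whose destination differs from the previous one. The regular expression and the function names are assumptions made for this illustration.

```python
import re

# Lines abbreviated from the kernel log above.
SAMPLE_LOG = """\
kernel: ** NAT OUTPUT **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 PROTO=TCP SPT=48614 DPT=32110
kernel: ** NAT KUBE-SERVICES **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 PROTO=TCP SPT=48614 DPT=32110
kernel: ** NAT KUBE-NODEPORTS **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 PROTO=TCP SPT=48614 DPT=32110
kernel: ** NAT KUBE-MARK-MASQ  **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 PROTO=TCP SPT=48614 DPT=32110
kernel: ** NAT KUBE-SVC **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 PROTO=TCP SPT=48614 DPT=32110
kernel: ** NAT KUBE-SEP-MMWJ6M2J72TU3IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 PROTO=TCP SPT=48614 DPT=32110
kernel: ** Filter OUTPUT **IN= OUT=lo SRC=192.168.0.11 DST=172.17.0.2 PROTO=TCP SPT=48614 DPT=80
kernel: ** NAT POSTROUTING **IN= OUT=docker0 SRC=192.168.0.11 DST=172.17.0.2 PROTO=TCP SPT=48614 DPT=80
"""

# The prefix ends where the IN= field begins; a closing "**" is optional
# because the KUBE-SEP prefix has none.
LOG_RE = re.compile(
    r"\*\* (?P<prefix>.+?)\s*(?:\*\*)?IN=.*?DST=(?P<dst>\S+).*?DPT=(?P<dpt>\d+)"
)


def trace(log_text):
    """Return (prefix, destination IP, destination port) for each log line."""
    steps = []
    for line in log_text.splitlines():
        match = LOG_RE.search(line)
        if match:
            steps.append((match.group("prefix"), match.group("dst"), int(match.group("dpt"))))
    return steps


def first_rewritten(steps):
    """Return the first step whose destination differs from the previous step."""
    for previous, current in zip(steps, steps[1:]):
        if previous[1:] != current[1:]:
            return current
    return None


steps = trace(SAMPLE_LOG)
for prefix, dst, dpt in steps:
    print(f"{prefix:28} -> {dst}:{dpt}")
print(first_rewritten(steps))
```

The rewritten destination first shows up in the Filter OUTPUT entry rather than in the KUBE-SEP entry, because the LOG rule in the KUBE-SEP chain fires before the DNAT rule of that chain is applied.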

Access via ClusterIP

Jan 21 17:08:36 hjdevelop kernel: ** NAT OUTPUT **IN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** NAT KUBE-SERVICES **IN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** NAT KUBE-SVC **IN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** NAT KUBE-SEP-74XRUOWV76LDSIN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** Filter OUTPUT **IN= OUT=eth0 SRC=192.168.0.11 DST=172.17.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** NAT POSTROUTING **IN= OUT=docker0 SRC=192.168.0.11 DST=172.17.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 

The iptables log confirms the order:
OUTPUT --> KUBE-SERVICES --> KUBE-SVC --> KUBE-SEP-74XRUOWV76LDS3ID (logged as KUBE-SEP-74XRUOWV76LDS; this chain rewrites the destination IP to 172.17.0.4) --> Filter OUTPUT --> NAT POSTROUTING

Access via NodePort from an external network

Jan 21 17:12:34 hjdevelop kernel: ** NAT PREROUTING **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-SERVICES **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-NODEPORTS **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-MARK-MASQ  **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-SVC **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-SEP-74XRUOWV76LDSIN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:12:34 hjdevelop kernel: ** FILTER FORWARD **IN=eth0 OUT=docker0 MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=172.17.0.4 LEN=52 TOS=0x00 PREC=0x00 TTL=115 ID=21602 DF PROTO=TCP SPT=60079 DPT=80 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0x4000

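The iptables log confirms the order:
PREROUTING --> KUBE-SERVICES --> KUBE-NODEPORTS --> KUBE-MARK-MASQ --> KUBE-SVC --> KUBE-SEP-74XRUOWV76LDS3ID (logged as KUBE-SEP-74XRUOWV76LDS; this chain rewrites the destination IP to 172.17.0.4) --> Filter FORWARD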
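
The KUBE-SEP chain names in the logs are cut short because a LOG prefix can hold at most 29 characters. The Python sketch below shows how a shortened name can be mapped back to the full chain name from the listings. How the original prefixes were shortened is not stated in the article; `fit_prefix` simply cuts a "** NAT <chain> **" style prefix at 29 characters, which reproduces the prefix seen in the log. Both function names are hypothetical.

```python
LOG_PREFIX_MAX = 29  # maximum length of an iptables --log-prefix string

# KUBE-SEP chains taken from the KUBE-SVC-I64SNEMOLCWHJHS3 listing.
KNOWN_CHAINS = [
    "KUBE-SEP-MMWJ6M2J72TU3J64",
    "KUBE-SEP-GRLEVIWNO4P37GSQ",
    "KUBE-SEP-74XRUOWV76LDS3ID",
]


def fit_prefix(chain):
    """Build a '** NAT <chain> **' style prefix cut to the 29-character limit."""
    return f"** NAT {chain} **"[:LOG_PREFIX_MAX]


def resolve(shortened_chain):
    """Return the known chains whose names start with the shortened name."""
    return [chain for chain in KNOWN_CHAINS if chain.startswith(shortened_chain)]


print(fit_prefix("KUBE-SEP-74XRUOWV76LDS3ID"))  # ** NAT KUBE-SEP-74XRUOWV76LDS
print(resolve("KUBE-SEP-74XRUOWV76LDS"))        # ['KUBE-SEP-74XRUOWV76LDS3ID']
```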
