kubernetes: with dual NICs, coredns, dashboard and metrics-server cannot reach kube-apiserver

程序员文章站 2022-03-07 13:24:42

Host network environment:

Host      Public IP       Private IP   Gateway
master    192.168.5.120   10.2.2.120   192.168.5.1
node1     192.168.5.121   10.2.2.121   192.168.5.1
node2     192.168.5.122   10.2.2.122   192.168.5.1

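k8s version: v1.13.3

Installation method:

See: https://github.com/gjmzj/kubeasz/releases/tag/1.0.0rc1

For security, all services are bound to the private network segment (10.2.2.0/24).

So the hosts file is configured as follows:

# cat hosts
# Cluster deploy node: usually the node that runs the ansible scripts
# Variable NTP_ENABLED (=yes/no) controls whether chrony time synchronization is installed in the cluster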
[deploy]
10.2.2.120 NTP_ENABLED=no

# For the etcd cluster, provide NODE_NAME as shown below; note that an etcd cluster must have an odd number of nodes (1, 3, 5, 7...)
[etcd]
10.2.2.120 NODE_NAME=etcd1

[kube-master]
10.2.2.120

[kube-node]
10.2.2.121
10.2.2.122

......

After installation, check the pod status and logs of coredns, dashboard and metrics-server:

# kubectl get po -o wide --all-namespaces=true
NAME                                    READY   STATUS             RESTARTS   AGE     IP           NODE         NOMINATED NODE   READINESS GATES
coredns-dc8bbbcf9-4rsfl                 0/1     CrashLoopBackOff   18         55m     172.20.1.5   10.2.2.121   <none>           <none>
coredns-dc8bbbcf9-7rz2p                 0/1     CrashLoopBackOff   18         55m     172.20.2.4   10.2.2.122   <none>           <none>
kubernetes-dashboard-6685cb584f-nvc8p   0/1     CrashLoopBackOff   20         55m     172.20.2.5   10.2.2.122   <none>           <none>
metrics-server-79558444c6-gtt4t         0/1     CrashLoopBackOff   6          9m27s   172.20.1.6   10.2.2.121   <none>           <none>

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.5.1     0.0.0.0         UG    100    0        0 enp0s3
10.2.2.0        0.0.0.0         255.255.255.0   U     101    0        0 enp0s8
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.20.1.0      172.20.1.0      255.255.255.0   UG    0      0        0 flannel.1
172.20.2.0      172.20.2.0      255.255.255.0   UG    0      0        0 flannel.1
192.168.5.0     0.0.0.0         255.255.255.0   U     100    0        0 enp0s3

# kubectl logs metrics-server-79558444c6-56qmd -n kube-system
panic: Get https://10.68.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.68.0.1:443: connect: connection refused

goroutine 1 [running]:
main.main()
	/go/src/github.com/kubernetes-incubator/metrics-server/cmd/metrics-server/metrics-server.go:39 +0x13b

# kubectl logs kubernetes-dashboard-6685cb584f-nvc8p -n kube-system
2019/03/11 13:19:06 Starting overwatch
2019/03/11 13:19:06 Using in-cluster config to connect to apiserver
2019/03/11 13:19:06 Using service account token for csrf signing
2019/03/11 13:19:06 No request provided. Skipping authorization
2019/03/11 13:19:06 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service account's configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.68.0.1:443/version: dial tcp 10.68.0.1:443: getsockopt: connection refused
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ

# kubectl logs coredns-dc8bbbcf9-7rz2p -n kube-system

E0311 13:46:20.106731       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.68.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.68.0.1:443: connect: connection refused

Check the iptables rules:

# iptables-save |grep KUBE-SEP-VPBSGNC2TAY6H4RC

-A KUBE-SERVICES -d 10.68.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
:KUBE-SEP-VPBSGNC2TAY6H4RC - [0:0]
-A KUBE-SEP-VPBSGNC2TAY6H4RC -s 192.168.5.120/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-VPBSGNC2TAY6H4RC -p tcp -m tcp -j DNAT --to-destination 192.168.5.120:6443
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-VPBSGNC2TAY6H4RC

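Why does this rule have --to-destination 192.168.5.120:6443, which is the public IP? My first guess was that the generated iptables rule is wrong, and that this is why kube-apiserver cannot be reached.

kube-apiserver itself is bound to the private IP, 10.2.2.120:6443: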

# netstat -anp |grep LISTEN |grep 6443
tcp        0      0 10.2.2.120:6443         0.0.0.0:*               LISTEN      10996/kube-apiserve
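
The mismatch between the DNAT target and the listening address can also be checked mechanically. The following is a minimal sketch, not part of the original troubleshooting; the helper names dnat_targets and dnat_matches_listen are made up for illustration, and the chain name in the usage comment is the one from the iptables output above.

```shell
# Hypothetical helper: print every DNAT target found in iptables-save
# style text read from stdin, one "ip:port" per line.
dnat_targets() {
    grep -o -- '--to-destination [0-9.]*:[0-9]*' | awk '{ print $2 }'
}

# Hypothetical helper: succeed only if at least one DNAT target is found
# on stdin and every target equals the expected address given as $1.
dnat_matches_listen() {
    expected="$1"
    targets=$(dnat_targets)
    [ -n "$targets" ] || return 1
    for target in $targets; do
        [ "$target" = "$expected" ] || return 1
    done
    return 0
}

# Usage on the master (as root), comparing against the address from netstat:
#   iptables-save | grep KUBE-SEP-VPBSGNC2TAY6H4RC | dnat_matches_listen 10.2.2.120:6443
```

A non-zero exit status here would mean the service rule forwards to an address kube-apiserver is not listening on.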

Next, check the endpoint of the kubernetes svc:

# kubectl get svc kubernetes
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1    <none>        443/TCP   19h
# kubectl get ep kubernetes
NAME         ENDPOINTS            AGE
kubernetes   192.168.5.120:6443   19h

The endpoint output shows that the endpoint address itself is wrong, so the iptables rule generated from it is wrong as well.

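This seemed odd: why doesn't the endpoint use the private IP? According to some references, when kube-apiserver starts it reads /proc/net/route to find the system's default gateway; if the system has no default gateway configured, startup fails. Once the default gateway is found, kube-apiserver takes the IP of the NIC on that network, together with the default port (6443), and assigns them to the endpoint (kubernetes).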
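
To see which interface such a lookup would land on, the default route can be read straight from /proc/net/route. The sketch below only mirrors the lookup described above; it is not kube-apiserver's actual code. The helper names hex_to_ip and default_route are made up for illustration, and the byte order handling assumes a little-endian host such as x86.

```shell
# Convert the hex address format of /proc/net/route on a little-endian
# host (e.g. 0105A8C0) to dotted-decimal notation (192.168.5.1).
hex_to_ip() {
    # Split "0105A8C0" into the four words "01 05 A8 C0".
    set -- $(printf '%s\n' "$1" | sed 's/../& /g')
    # The bytes are stored back to front, so print them in reverse order.
    printf '%d.%d.%d.%d\n' "0x$4" "0x$3" "0x$2" "0x$1"
}

# Print "<iface> <gateway>" for the first default route (destination
# 00000000) in a file laid out like /proc/net/route; fail if there is none.
default_route() {
    route_line=$(awk 'NR > 1 && $2 == "00000000" { print $1, $3; exit }' \
        "${1:-/proc/net/route}")
    [ -n "$route_line" ] || return 1
    echo "${route_line%% *} $(hex_to_ip "${route_line##* }")"
}

# Usage on the master:
#   default_route
```

If this reports the public-side NIC and gateway, that would be consistent with the endpoint address seen above.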

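My master host had the public-side gateway (192.168.5.1) set as its default gateway, so the endpoint address became 192.168.5.120:6443.

To fix this, it is enough to set the master's default gateway to the private gateway (10.2.2.1), as follows:

Host      Public IP       Private IP   Gateway
master    192.168.5.120   10.2.2.120   10.2.2.1
node1     192.168.5.121   10.2.2.121   192.168.5.1
node2     192.168.5.122   10.2.2.122   192.168.5.1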

Then restart kube-apiserver and check the endpoint again; it is now correct.
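
The gateway change and restart could look roughly like the sketch below. The gateways are the ones from the article and enp0s8 is the NIC that carries 10.2.2.0/24 in the routing table shown earlier, but the systemd unit name kube-apiserver and the helper name switch_default_gw are assumptions. The helper only prints the commands unless APPLY=1 is set, and a route changed with ip does not survive a reboot or a network service restart, so the gateway would also need to be changed in the host's network configuration.

```shell
# Hypothetical helper: print (or, with APPLY=1, run as root) the commands
# that move the default route to the private gateway, restart
# kube-apiserver and show the resulting endpoint.
# Arguments: new gateway, outgoing device, old gateway.
switch_default_gw() {
    new_gw="${1:-10.2.2.1}"
    dev="${2:-enp0s8}"          # assumed: NIC on the private segment
    old_gw="${3:-192.168.5.1}"
    for cmd in \
        "ip route del default via $old_gw" \
        "ip route add default via $new_gw dev $dev" \
        "systemctl restart kube-apiserver" \
        "kubectl get ep kubernetes"
    do
        if [ "${APPLY:-0}" = "1" ]; then
            $cmd || return 1    # stop at the first failing command
        else
            echo "$cmd"         # dry run: only show what would be done
        fi
    done
}

# Dry run:  switch_default_gw
# Apply:    APPLY=1 switch_default_gw 10.2.2.1 enp0s8 192.168.5.1
```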

# kubectl get ep kubernetes
NAME         ENDPOINTS         AGE
kubernetes   10.2.2.120:6443   19h

The iptables rules are now correct as well:

:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
-A KUBE-SERVICES -d 10.68.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-26L3TDXW4RODGS5U
-A KUBE-SEP-26L3TDXW4RODGS5U -p tcp -m tcp -j DNAT --to-destination 10.2.2.120:6443

coredns, dashboard and metrics-server can now reach kube-apiserver:

# kubectl get po -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-dc8bbbcf9-4rsfl                 1/1     Running   165        2d
coredns-dc8bbbcf9-7rz2p                 1/1     Running   164        2d
kube-flannel-ds-amd64-pf4n9             1/1     Running   3          24h
kube-flannel-ds-amd64-r6l5q             1/1     Running   3          25h
kube-flannel-ds-amd64-ztgsm             1/1     Running   3          25h
kubernetes-dashboard-6685cb584f-8g8zh   1/1     Running   57         28h
metrics-server-79558444c6-l8qvh         1/1     Running   88         28h

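Other issues:

1. Changing the master's default gateway to the private gateway means the master can no longer reach the public network; you will have to find a way to connect 10.2.2.1 to the public network.

2. If flannel has problems, you can add the -iface=enp0s8 parameter in flannel.service to pin the NIC. If flannel is installed as a container, you need to add the following to the yml file: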

      - args:
        - --iface=enp0s8
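
Regarding the first issue above, one alternative that was not tested in this setup: kube-apiserver has an --advertise-address flag for the IP it advertises to members of the cluster. Assuming the flag behaves as documented on this version, setting it to 10.2.2.120 might yield the correct endpoint without touching the default gateway. The sketch below only checks whether the flag is already present in the unit file; the default path and the helper name advertise_address are assumptions.

```shell
# Hypothetical helper: print the value of --advertise-address found in a
# kube-apiserver unit file, or fail if the flag is absent.
# The default path is an assumption about where the unit file lives.
advertise_address() {
    unit="${1:-/etc/systemd/system/kube-apiserver.service}"
    flag=$(grep -o -- '--advertise-address=[^ \\]*' "$unit" 2>/dev/null | head -n 1)
    [ -n "$flag" ] || return 1
    echo "${flag#--advertise-address=}"
}

# Usage on the master:
#   advertise_address || echo "flag not set; the address is picked automatically"
```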

References:

https://github.com/kubernetes/kubernetes/issues/57534

https://github.com/gjmzj/kubeasz/issues/479

 
