Fixing the coredns CrashLoopBackOff error when deploying k8s


Problem description

For a project a while back I needed to set up a cluster with k8s. Being a complete beginner, I simply followed the setup steps I found online (the deployment followed a tutorial found online).
While checking the status of the pods in the cluster, I noticed that coredns had not started successfully: it stayed in the CrashLoopBackOff state, stuck in an endless cycle of crashing and restarting.

[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME                                 READY   STATUS             RESTARTS   AGE
coredns-bccdc95cf-9wd9n              0/1     CrashLoopBackOff   19         19h
coredns-bccdc95cf-qsf9f              0/1     CrashLoopBackOff   19         19h
etcd-k8s-master                      1/1     Running            3          19h
kube-apiserver-k8s-master            1/1     Running            3          19h
kube-controller-manager-k8s-master   1/1     Running            11         19h
kube-flannel-ds-amd64-sgqsm          1/1     Running            1          16h
kube-flannel-ds-amd64-swqhf          1/1     Running            1          16h
kube-flannel-ds-amd64-tnbmc          1/1     Running            1          16h
kube-proxy-259l8                     1/1     Running            0          16h
kube-proxy-qcnpt                     1/1     Running            0          16h
kube-proxy-rp7qx                     1/1     Running            3          19h
kube-scheduler-k8s-master            1/1     Running            11         19h

Troubleshooting

First, check the coredns pod's logs:

[root@k8s-master a1zMC2]# kubectl logs -f coredns-bccdc95cf-9wd9n -n kube-system
E0512 01:59:03.825489       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0512 01:59:03.825489       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-bccdc95cf-9wd9n.unknownuser.log.ERROR.20210512-015903.1: no such file or directory
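For context, 10.96.0.1:443 in the error message is the cluster IP of the built-in kubernetes Service, i.e. the API server's in-cluster address. A quick sanity check I find useful (assuming curl is installed on the node) is to confirm that mapping and try to reach the address directly; "no route to host" from curl reproduces exactly what coredns is reporting, while any HTTP response at all would show the route is fine.

# 10.96.0.1 is the ClusterIP of the default "kubernetes" Service (the API server)
kubectl get svc kubernetes -n default
# Test reachability from the node itself; "no route to host" here matches the coredns log
curl -k https://10.96.0.1:443/healthz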

Next, view the pod's details with kubectl describe pod coredns-bccdc95cf-9wd9n -n kube-system:

Events:
  Type     Reason            Age                  From                 Message
  ----     ------            ----                 ----                 -------
  Warning  FailedScheduling  16h (x697 over 17h)  default-scheduler    0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Warning  Unhealthy         15h (x5 over 15h)    kubelet, k8s-master  Readiness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy         15h (x5 over 15h)    kubelet, k8s-master  Liveness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
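A side note on the FailedScheduling warning: it usually just covers the window before the flannel CNI came up, since a not-yet-ready node carries a NoSchedule taint that coredns does not tolerate. To see which taints a node currently has:

kubectl describe node k8s-master | grep -i taints
# or, for all nodes at once:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'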

This looked like a connectivity problem towards the host, so I ran cat /etc/resolv.conf to check the configuration file and found that the nameserver entry was not the master host's address.
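For reference, kubeadm's stock Corefile forwards unresolved names to /etc/resolv.conf, and the coredns pods inherit that file from the node, so a wrong or unreachable nameserver on the host hits them directly. A minimal sketch of the check and the change (the IP below is only a placeholder; use the actual address in your environment):

cat /etc/resolv.conf
vi /etc/resolv.conf
# make the nameserver line point at the master node, e.g.:
# nameserver 192.168.1.100    <- placeholder, replace with your master's IP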

Just to give it a try, I changed it to the master node's IP address, then flushed the iptables rules and restarted docker and kubelet:

[root@k8s-master a1zMC2]# systemctl stop kubelet
[root@k8s-master a1zMC2]# systemctl stop docker
[root@k8s-master a1zMC2]# iptables --flush
[root@k8s-master a1zMC2]# iptables -t nat --flush
[root@k8s-master a1zMC2]# systemctl start kubelet
[root@k8s-master a1zMC2]# systemctl start docker
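Had coredns not recovered on its own after the restart, deleting the two pods would have forced the Deployment to recreate them (k8s-app=kube-dns is the label kubeadm gives the coredns pods):

kubectl -n kube-system delete pod -l k8s-app=kube-dns
# the coredns Deployment immediately schedules fresh replacement pods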

Checking the status again, all the pods were now running normally!

[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
coredns-bccdc95cf-9wd9n              1/1     Running   21         20h
coredns-bccdc95cf-qsf9f              1/1     Running   21         20h
etcd-k8s-master                      1/1     Running   4          19h
kube-apiserver-k8s-master            1/1     Running   4          19h
kube-controller-manager-k8s-master   1/1     Running   12         19h
kube-flannel-ds-amd64-sgqsm          1/1     Running   1          17h
kube-flannel-ds-amd64-swqhf          1/1     Running   1          17h
kube-flannel-ds-amd64-tnbmc          1/1     Running   2          17h
kube-proxy-259l8                     1/1     Running   0          17h
kube-proxy-qcnpt                     1/1     Running   0          17h
kube-proxy-rp7qx                     1/1     Running   4          20h
kube-scheduler-k8s-master            1/1     Running   12         19h
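As a final check that cluster DNS really works end to end, a throwaway pod can resolve an in-cluster name through coredns (busybox:1.28 is used here because some newer busybox builds ship a broken nslookup):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
# a successful lookup returns the kubernetes Service address (10.96.0.1) via the
# cluster DNS service, which defaults to 10.96.0.10 on kubeadm clusters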

I have never formally studied cloud computing, so there may well be mistakes in this post; corrections are welcome in the comments.