
grpc: the connection is unavailable

Posted on 程序员文章站, 2022-03-12 12:26:49

Error record

The pod cannot be created

[root@host ~]# kubectl get po --all-namespaces -owide |grep -v "Running"
liangmingb                test-1858107638-nrw6d                           0/1       ContainerCreating   0          1h        <none>            slave-143
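To check whether every stuck pod sits on the same node (here slave-143), the listing above can be filtered on its STATUS and NODE columns. This is a minimal bash sketch. The function name stuck_pods_on_node is hypothetical, and the sketch assumes the 8-column -owide layout shown above (NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE). Extra columns that newer kubectl versions append after NODE do not matter, but a RESTARTS value such as "2 (5m ago)" would shift the fields.

```shell
# stuck_pods_on_node NODE (hypothetical helper)
# Reads `kubectl get po --all-namespaces -owide` output on stdin and prints
# "namespace/name" for every pod in ContainerCreating on NODE.
# Assumes column 4 is STATUS and column 8 is NODE (the layout shown above);
# the header line is skipped naturally because its column 4 is "STATUS".
stuck_pods_on_node() {
  awk -v node="$1" '$4 == "ContainerCreating" && $8 == node { print $1 "/" $2 }'
}
```

Usage: kubectl get po --all-namespaces -owide | stuck_pods_on_node slave-143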


[root@host ~]# kubectl describe po test-1858107638-nrw6d -nliangmingb
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  1h        5s      1762    kubelet, slave-143          Warning     FailedSync  Error syncing pod
  1h        4s      1762    kubelet, slave-143          Normal      SandboxChanged  Pod sandbox changed, it will be killed and re-created

The kubelet log shows the following errors:

Dec 11 20:07:09 slave-143 kubelet[23490]: WARNING:1211 20:07:09.376601   23490 cni.go:258] CNI failed to retrieve network namespace path: Error: No such container: 9adaef08d85b827e78600cc0a170df27617481442463962f3f30495ce878cc1f
Dec 11 20:07:09 slave-143 kubelet[23490]: ERROR:1211 20:07:09.622003   23490 docker_sandbox.go:239] Failed to stop sandbox "9adaef08d85b827e78600cc0a170df27617481442463962f3f30495ce878cc1f": Error response from daemon: {"message":"No such container: 9adaef08d85b827e78600cc0a170df27617481442463962f3f30495ce878cc1f"}
Dec 11 20:07:09 slave-143 kubelet[23490]: ERROR
Dec 11 20:07:09 slave-143 kubelet[23490]: ERROR
Dec 11 20:07:09 slave-143 kubelet[23490]: ERROR
Dec 11 20:07:09 slave-143 kubelet[23490]: ERROR:1211 20:07:09.878784   23490 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = failed to start sandbox container for pod "test-1858107638-nrw6d": Error response from daemon: {"message":"grpc: the connection is unavailable"}
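On a systemd host the kubelet log is normally read with journalctl -u kubelet. To see how often sandbox creation is failing with this error, you can count the matching lines. This is a small sketch. The function name count_grpc_errors is hypothetical, and the one-hour window in the usage line is arbitrary.

```shell
# count_grpc_errors (hypothetical helper)
# Reads log lines on stdin and prints how many contain the error string.
# grep -c prints 0 but exits with status 1 when nothing matches, so
# `|| true` keeps the function from failing a script run under `set -e`.
count_grpc_errors() {
  grep -c 'grpc: the connection is unavailable' || true
}
```

Usage: journalctl -u kubelet --since "1 hour ago" | count_grpc_errors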

Impact of this error

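# If kubelet logs this error, the node is affected as follows:
1. Existing pods on the node can no longer be stopped or deleted.
  Note: the existing pods keep working, though; their IP addresses are still pingable from other nodes.
2. Pods newly scheduled onto the node cannot be created either.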

[root@host ~]# docker ps |grep filebeat
07646329caa3        reg.enncloud.cn/enncloud/…:8869c3fcd0eadfe6202407b06eec8e672f37de3d093031bc01c03e5736e842d9                       "./run.sh"               34 hours ago        Up 34 hours                             k8s_filebeat_filebeat-hvrz9_kube-system_79e95933-fc20-11e8-885a-5254c2cdf2fd_0
9ab9c478ea3d        reg.enncloud.cn/enncloud/pause-amd64:3.0                                                                                        "/pause"                 34 hours ago        Up 34 hours                             k8s_POD_filebeat-hvrz9_kube-system_79e95933-fc20-11e8-885a-5254c2cdf2fd_0

[root@host ~]# docker stop 07646329caa3
Error response from daemon: Cannot stop container 07646329caa3: Cannot kill container 07646329caa3e77b64f76707df5f69242f753c678a9c344dc83ff25cdd0cdb2f: rpc error: code = 14 desc = grpc: the connection is unavailable

[root@host ~]# docker rm -f 07646329caa3
Error response from daemon: Could not kill running container 07646329caa3e77b64f76707df5f69242f753c678a9c344dc83ff25cdd0cdb2f, cannot remove - Cannot kill container 07646329caa3e77b64f76707df5f69242f753c678a9c344dc83ff25cdd0cdb2f: rpc error: code = 14 desc = grpc: the connection is unavailable
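Both errors carry gRPC status code 14, which is UNAVAILABLE in the gRPC status-code table. The daemon still answers the CLI (docker ps works), but its gRPC call to containerd fails. A cleanup script can tell this case apart from an ordinary failure by inspecting the error text. The same text also appears wrapped inside the kubelet's "code = 2" error above.

This is a minimal sketch. is_containerd_unreachable and try_stop are hypothetical helpers. Matching on message text is an assumption, because Docker does not guarantee the wording.

```shell
# is_containerd_unreachable MSG (hypothetical helper)
# Returns 0 if MSG looks like the dockerd -> containerd gRPC failure shown
# above (status code 14 / UNAVAILABLE, or the literal message), 1 otherwise.
is_containerd_unreachable() {
  case "$1" in
    *"code = 14"*|*"grpc: the connection is unavailable"*) return 0 ;;
    *) return 1 ;;
  esac
}

# try_stop ID (hypothetical helper)
# Stops a container. On failure, reports whether the cause is the containerd
# connection problem or an ordinary error, and returns 1.
try_stop() {
  local err
  if err=$(docker stop "$1" 2>&1); then
    return 0
  fi
  if is_containerd_unreachable "$err"; then
    echo "containerd unreachable from dockerd; a docker restart is likely needed" >&2
  else
    echo "docker stop failed: $err" >&2
  fi
  return 1
}
```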


# In other words, evicting pods with kubectl drain does not work either, because the pods cannot be deleted at all.

Solution

systemctl restart docker
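Because kubectl drain cannot evict pods in this state, one cautious ordering is to wrap the restart in a few extra steps:

1. Cordon the node, so no new pods are scheduled onto it during the restart.
2. Restart Docker.
3. Wait for the daemon to answer again.
4. Uncordon the node.

This sequence, the node-name argument and the DRY_RUN switch are assumptions for illustration, not the author's procedure. The sketch also assumes it runs on the affected node and that kubectl there has a working kubeconfig. By default Docker shuts down running containers when the daemon stops, unless live-restore is enabled, so the restart also interrupts the pods on that node.

```shell
# run CMD... : executes CMD, or only prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# wait_for_docker: polls `docker info` for up to ~60s.
# Note: this only shows dockerd is answering. As seen above, `docker ps` can
# succeed while containerd is unreachable, so re-check the stuck pods too.
wait_for_docker() {
  local i
  for i in $(seq 1 30); do
    docker info >/dev/null 2>&1 && return 0
    sleep 2
  done
  return 1
}

# restart_docker_on_node NODE (hypothetical helper)
restart_docker_on_node() {
  local node="$1"
  run kubectl cordon "$node" || return 1      # keep new pods off this node
  run systemctl restart docker || return 1    # the fix used in this post
  if [ "${DRY_RUN:-0}" != "1" ]; then
    wait_for_docker || { echo "docker did not come back; $node left cordoned" >&2; return 1; }
  fi
  run kubectl uncordon "$node"                # allow scheduling again
}
```

Usage: DRY_RUN=1 restart_docker_on_node slave-143 prints the three commands without running them. Leaving the node cordoned when Docker does not come back keeps new pods from being sent to a node that still cannot create sandboxes.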

Related reading (InfoQ article on Docker, containerd, and runc):

https://www.infoq.cn/article/2017%2F02%2FDocker-Containerd-RunC