Troubleshooting notes: errors when building a k8s cluster with docker + rancher
CentOS 7
docker v1.20.x
rancher v2.3.5
Intranet environment with no access to the external internet
P.S.: Nexus3 is used as the Docker image registry proxy; installing and configuring the Nexus3 proxy is covered separately.
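Pointing Docker at a Nexus3 proxy is normally done in /etc/docker/daemon.json through the `registry-mirrors` and `insecure-registries` keys. The sketch below assumes a hypothetical Nexus endpoint `http://nexus.example.local:8082` served over plain HTTP; substitute your own. Keep in mind that `registry-mirrors` only applies to Docker Hub images (which includes the rancher/* images in these notes), and that this helper overwrites any existing daemon.json.

```shell
# write_daemon_json MIRROR_URL [OUTFILE]
# Writes a minimal Docker daemon config that uses MIRROR_URL as a Docker Hub
# mirror and allows plain-HTTP access to its host:port.
# WARNING: overwrites OUTFILE (default: /etc/docker/daemon.json).
write_daemon_json() {
  local mirror_url="$1"
  local outfile="${2:-/etc/docker/daemon.json}"
  # Strip the scheme and any path to get host:port for insecure-registries.
  local host_port="${mirror_url#*://}"
  host_port="${host_port%%/*}"
  mkdir -p "$(dirname "$outfile")"
  cat > "$outfile" <<EOF
{
  "registry-mirrors": ["${mirror_url}"],
  "insecure-registries": ["${host_port}"]
}
EOF
}
```

Run it as root, e.g. `write_daemon_json http://nexus.example.local:8082`, then `systemctl restart docker` so the daemon picks up the change.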
Problem: etcd cannot be created
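With no internet access, Docker images frequently fail to pull. After Rancher started normally, I logged into the web UI and began creating a k8s cluster, but got an error: etcd could not be created.
Checking the Rancher container's log showed the following: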
2022/02/14 13:34:14 [WARNING] Failed to create Docker container [etcd] on host [192.168.1.1]: Error response from daemon: No such image: rancher/coreos-etcd:v3.4.3-rancher1
2022/02/14 13:34:14 [ERROR] cluster [c-rc4nk] provisioning: [etcd] Failed to bring up Etcd Plane: Failed to create [etcd] container on host [192.168.1.1]: Failed to create Docker container [etcd] on host [192.168.1.1]: <nil>
2022/02/14 13:34:14 [INFO] kontainerdriver rancherkubernetesengine stopped
2022/02/14 13:34:14 [ERROR] ClusterController c-rc4nk [cluster-provisioner-controller] failed with : [etcd] Failed to bring up Etcd Plane: Failed to create [etcd] container on host [192.168.1.1]: Failed to create Docker container [etcd] on host [192.168.1.1]: <nil>
2022/02/14 13:37:44 [ERROR] Error parsing max age Error parsing auth refresh max age: time: invalid duration s
The image rancher/coreos-etcd:v3.4.3-rancher1 could not be pulled, so I had no choice but to pull it from somewhere else and push it into Nexus.
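Pulling an image on a machine with internet access and pushing it into Nexus can be sketched as below. Assumptions: the registry address `nexus.example.local:8083` is hypothetical, it points at a hosted (writable) Docker repository in Nexus, since proxy repositories do not accept pushes, and you have already run `docker login` against it. Whether the cluster nodes then find the image under its plain name depends on your Nexus layout, for example a group repository that includes this hosted repo and serves as the mirror.

```shell
# nexus_ref IMAGE REGISTRY: print the name IMAGE gets inside REGISTRY,
# e.g. nexus_ref rancher/pause:3.1 nexus.example.local:8083
#   -> nexus.example.local:8083/rancher/pause:3.1
nexus_ref() {
  printf '%s/%s\n' "$2" "$1"
}

# mirror_to_nexus REGISTRY IMAGE...
# Run on a machine that can reach both Docker Hub and Nexus, after
# `docker login REGISTRY`. REGISTRY is a hypothetical hosted-repo address.
# Stops at the first image that fails to pull, tag, or push.
mirror_to_nexus() {
  local registry="$1"
  shift
  local image target
  for image in "$@"; do
    target="$(nexus_ref "$image" "$registry")"
    docker pull "$image" &&
      docker tag "$image" "$target" &&
      docker push "$target" || return 1
  done
}
```

For the image above: `mirror_to_nexus nexus.example.local:8083 rancher/coreos-etcd:v3.4.3-rancher1`.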
[etcd] Failed to bring up Etcd Plane
etcd failing to start is a classic problem with plenty of tutorials online: wipe everything clean and restart the Docker service.
[etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [192.168.154.231] failed to report healthy. Check etcd container logs on each host for more information
The cleanup commands are:
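docker stop $(docker ps -aq)
# Warning: this deletes all stopped containers (i.e. all of them after the stop above), plus unused networks
docker system prune -f
# Warning: this deletes all volumes
docker volume rm $(docker volume ls -q)
# Warning: this deletes all images (-f is needed for images tagged under several names)
docker image rm -f $(docker image ls -q)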
rm -rf /etc/ceph \
/etc/cni \
/etc/kubernetes \
/opt/cni \
/opt/rke \
/run/secrets/kubernetes.io \
/run/calico \
/run/flannel \
/var/lib/calico \
/var/lib/etcd \
/var/lib/cni \
/var/lib/kubelet \
/var/lib/rancher/rke/log \
/var/log/containers \
/var/log/pods
# Restart Docker once the cleanup is done
systemctl restart docker
After restarting again and again, I thought victory was finally near, but there was still a nasty surprise in store:
Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
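I searched everywhere for a damn fix for this one, for ages, and every attempt failed.
Eventually I found a fairly reliable approach.
Check the kubelet container logs: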
docker logs kubelet
E0215 14:34:53.376690 25851 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:34:54.249077 25851 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
E0215 14:34:55.238592 25851 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
I0215 14:34:55.238747 25851 provider.go:98] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
I0215 14:34:55.788537 25851 kube_docker_client.go:345] Stop pulling image "rancher/pause:3.1": "7675586df687: Downloading "
E0215 14:34:55.788569 25851 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused
E0215 14:34:55.788592 25851 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused
E0215 14:34:55.788601 25851 kuberuntime_manager.go:729] createPodSandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused
E0215 14:34:55.788624 25851 pod_workers.go:191] Error syncing pod 36f21b3d-16b8-4ca6-9b62-8f96be849d6c ("rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)"), skipping: failed to "CreatePodSandbox" for "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" with CreatePodSandboxError: "CreatePodSandbox for pod \"rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)\" failed: rpc error: code = Unknown desc = failed pulling image \"rancher/pause:3.1\": Get \"https://registry-1.docker.io/v2/\": dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:52359->[::1]:53: read: connection refused"
E0215 14:34:58.384759 25851 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:34:58.917882 25851 container.go:412] Failed to create summary reader for "/docker/24b652688425c7ce066f2b48c152e3517cefc5b6e6ddd6c678d0f3690ce85343": none of the resources are being tracked.
W0215 14:34:59.249222 25851 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
W0215 14:35:01.876641 25851 container.go:412] Failed to create summary reader for "/docker/6c119f3554c2826383b27ada71887a6a7b91549440a2fc908a991fb6e99cca83": none of the resources are being tracked.
E0215 14:35:03.389468 25851 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:35:03.878133 25851 container.go:412] Failed to create summary reader for "/docker/f1caf907cd832aa331d4421c745891abfd0ab0b4aa250be963797ddb49ede164": none of the resources are being tracked.
W0215 14:35:04.249318 25851 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
W0215 14:35:06.514104 25851 container.go:412] Failed to create summary reader for "/docker/0a2415cd7536b2c252698e4c59ff5d2fb0b8f5888d11c4ccfb189082c850178e": none of the resources are being tracked.
E0215 14:35:08.395996 25851 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I0215 14:35:09.073937 25851 container_manager_linux.go:469] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
W0215 14:35:09.213734 25851 container.go:412] Failed to create summary reader for "/docker/7b9a4127f215057a403ff4747f5ee82e99bf58fe3db4da468573f987087cc73c": none of the resources are being tracked.
W0215 14:35:09.249559 25851 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
I0215 14:35:11.008180 25851 kuberuntime_manager.go:424] No sandbox for pod "rke-network-plugin-deploy-job-lpm7j_kube-system(36f21b3d-16b8-4ca6-9b62-8f96be849d6c)" can be found. Need to start a new one
W0215 14:35:11.676849 25851 container.go:412] Failed to create summary reader for "/docker/49aa66c13dd22d740981bfd6cea750b674c5bcf6dc2b1486095723bf661439c0": none of the resources are being tracked.
E0215 14:35:13.401485 25851 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
W0215 14:35:14.249673 25851 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
W0215 14:35:14.609182 25851 container.go:412] Failed to create summary reader for "/docker/0c05b25ff9b0f72ccbd8be4dd7767aac3c56cb279521d91a3ac4f703473227aa": none of the resources are being tracked.
I concluded the key error was:
Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
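The kubelet log is noisy. A small filter that keeps only the lines that turned out to matter in this session can help; the pattern list is my own choice, taken from the messages quoted in these notes. `docker logs` replays the container's stderr on stderr, hence the `2>&1` in the usage line.

```shell
# key_kubelet_errors: read kubelet log lines on stdin and print only those
# matching the errors that mattered here (pattern list is an assumption).
key_kubelet_errors() {
  # grep exits 1 when nothing matches; "|| true" keeps that from looking like a failure
  grep -E 'cni config uninitialized|failed to find plugin|failed pulling image' || true
}
```

Usage: `docker logs kubelet 2>&1 | key_kubelet_errors | tail -n 20`.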
So I tried the following:
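Add a CNI config file. I took the cniVersion value below from the newest flannel-cni version I found in Rancher's official Docker repository (strictly speaking, cniVersion is the version of the CNI spec the config follows). JSON does not allow comments, so that note stays out of the file.
mkdir -p /etc/cni/net.d
vi /etc/cni/net.d/10-flannel.conflist
{
"name": "cbr0",
"cniVersion": "0.3.0",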
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
After restarting, the Rancher web UI still reported Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
However, checking docker logs kubelet again turned up something new:
E0215 15:15:52.939226 25851 pod_workers.go:191] Error syncing pod ea4ac90a-352a-4edc-8a4b-347f5ce5405c ("rke-network-plugin-deploy-job-jdt6t_kube-system(ea4ac90a-352a-4edc-8a4b-347f5ce5405c)"), skipping: failed to "CreatePodSandbox" for "rke-network-plugin-deploy-job-jdt6t_kube-system(ea4ac90a-352a-4edc-8a4b-347f5ce5405c)" with CreatePodSandboxError: "CreatePodSandbox for pod \"rke-network-plugin-deploy-job-jdt6t_kube-system(ea4ac90a-352a-4edc-8a4b-347f5ce5405c)\" failed: rpc error: code = Unknown desc = failed pulling image \"rancher/pause:3.1\": Get \"https://registry-1.docker.io/v2/\": dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:46815->[::1]:53: read: connection refused"
W0215 15:15:54.359391 25851 cni.go:202] Error validating CNI config list {
"name": "cbr0",
"cniVersion": "0.3.0",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
: [failed to find plugin "flannel" in path [/opt/cni/bin] failed to find plugin "portmap" in path [/opt/cni/bin]]
W0215 15:15:54.359428 25851 cni.go:237] Unable to update cni config: no valid networks found in /etc/cni/net.d
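Probably because the earlier cleanup commands (which delete /opt/cni) had emptied it, /opt/cni/bin contained no binaries at all, and nothing had recreated them. So I copied the files under /opt/cni/bin from another container of the same image into /opt/cni/bin on the host.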
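Before restarting, it is worth checking that every plugin `type` in the conflist has a matching executable in /opt/cni/bin, since that is exactly what kubelet complained about. A minimal sketch using grep and sed instead of jq, assuming the simple layout of the 10-flannel.conflist above; the function name is hypothetical.

```shell
# missing_cni_plugins CONFLIST [BIN_DIR]
# Prints each plugin "type" named in CONFLIST that has no executable file
# in BIN_DIR (default /opt/cni/bin). Prints nothing if all are present.
missing_cni_plugins() {
  local conflist="$1"
  local bin_dir="${2:-/opt/cni/bin}"
  local plugin
  # Extract every "type": "<name>" pair, then keep just <name>.
  grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$conflist" \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | while read -r plugin; do
        [ -x "$bin_dir/$plugin" ] || echo "$plugin"
      done
}
```

Usage: `missing_cni_plugins /etc/cni/net.d/10-flannel.conflist`. If binaries are missing, `docker cp <container>:/opt/cni/bin/. /opt/cni/bin/` copies a directory's contents out of a container, where `<container>` is a placeholder for whichever container holds them.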
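After another restart, the UI still showed the error. Checking docker logs kubelet once more turned up something new, and I finally pinned it down to this error: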
E0215 15:47:53.338808 6762 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed pulling image "rancher/pause:3.1": Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:33648->[::1]:53: read: connection refused
The image rancher/pause:3.1 could not be pulled. Once that image-pull problem was solved, the cluster finally came up normally.
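The failing lookup went to [::1]:53, which usually means the node's /etc/resolv.conf lists no nameserver, so Docker cannot even resolve registry-1.docker.io when it tries Docker Hub. Besides putting the image into Nexus as with etcd, a common way to get images onto air-gapped nodes is to carry them over as tar archives with `docker save` and `docker load`. A sketch; the archive naming scheme and function names are my own.

```shell
# archive_name IMAGE: tar file name for IMAGE, with '/' and ':' replaced by '_'
# (e.g. rancher/pause:3.1 -> rancher_pause_3.1.tar).
archive_name() {
  printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# save_image IMAGE: run on a machine with internet access.
save_image() {
  docker pull "$1" && docker save -o "$(archive_name "$1")" "$1"
}

# load_image IMAGE: run on each cluster node after copying the tar file over
# (e.g. with scp) into the current directory.
load_image() {
  docker load -i "$(archive_name "$1")"
}
```

For this case: `save_image rancher/pause:3.1` on the connected machine, then `load_image rancher/pause:3.1` on every node.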
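Be sure to keep the server environment clean; data left over from earlier attempts will affect the cluster:
- the kubelet container mounts the /etc/cni and /opt/cni directories
- etcd mounts the /var/lib/etcd directory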
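A quick pre-flight check before (re)adding a node: list which of the directories mentioned above still exist and are non-empty. A minimal sketch; the function name is hypothetical.

```shell
# leftover_dirs DIR...: print each DIR that exists and is not empty.
leftover_dirs() {
  local d
  for d in "$@"; do
    # ls -A lists hidden entries too, but not . and ..
    if [ -d "$d" ] && [ -n "$(ls -A "$d" 2>/dev/null)" ]; then
      echo "$d"
    fi
  done
}
```

Usage: `leftover_dirs /etc/cni /opt/cni /var/lib/etcd`; any path it prints still holds data from a previous attempt.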
Finally, a few references:
https://www.jianshu.com/p/b6a5a233f117
https://www.bookstack.cn/read/rancher-v2.x/db0dcb78c29a3817.md
https://kubernetes.io/docs/tasks/tools/
https://docs.rancher.cn/