Troubleshooting: Kubernetes Unable to Delete Pod Instances
The following content comes from this blog post: https://blog.csdn.net/qq_19674905/article/details/80887461
The cluster was originally a small two-machine Kubernetes setup, but today only one of the machines was powered on, leaving a single node; this is what made the pod instances impossible to delete.
First, check the current running state of the pods:
- [root@k8s ~]# kubectl get pods
- NAME READY STATUS RESTARTS AGE
- nginx-controller-lv8md 1/1 Unknown 0 16h
- nginx-controller-sb3fx 1/1 Unknown 2 16h
- nginx2-1216651254-4b2dw 0/1 ImagePullBackOff 0 8m
- nginx2-1216651254-dbtms 0/1 ImagePullBackOff 0 8m
- nginx2-1216651254-fhb4r 0/1 ImagePullBackOff 0 8m
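As an aside, the three nginx2 pods are stuck in ImagePullBackOff; the Events section of `kubectl describe` usually names the image pull that is failing. A quick way to look, using one of the pod names above (output omitted, since these pods are deleted shortly anyway):
- [root@k8s ~]# kubectl describe pod nginx2-1216651254-4b2dw | grep -A 10 Events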
Check the ReplicationControllers (rc for short):
- [root@k8s ~]# kubectl get rc
- No resources found.
Check the Services:
- [root@k8s ~]# kubectl get svc
- NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
- kubernetes 10.254.0.1 <none> 443/TCP 2d
Since there is no rc above, and no service apart from the built-in `kubernetes` one, try deleting all of the pods directly:
- [root@k8s ~]# kubectl delete pods --all
- pod "nginx-controller-lv8md" deleted
- pod "nginx-controller-sb3fx" deleted
- pod "nginx2-1216651254-4b2dw" deleted
- pod "nginx2-1216651254-dbtms" deleted
- pod "nginx2-1216651254-fhb4r" deleted
But they still would not go away. Check the Deployments next:
- [root@k8s ~]# kubectl get deployment
- NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
- nginx2 3 3 3 0 16h
- [root@k8s ~]# kubectl delete deployment nginx2
- deployment "nginx2" deleted
Why did those three nginx2 pod instances have no rc or service? Because they were created with `kubectl run`, which creates a Deployment to manage them rather than a ReplicationController; that is also why deleting the Deployment above finally removed them. A reconstruction of the command is sketched below.
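A hedged reconstruction (the exact flags are assumptions; the image is taken from the registry used elsewhere in this post and the replica count from the three pods listed above); the original command was presumably something like:
- [root@k8s ~]# kubectl run nginx2 --image=reg.docker.lc/share/nginx:latest --replicas=3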
But how do we delete the remaining two instances?
- [root@k8s ~]# kubectl get pods
- NAME READY STATUS RESTARTS AGE
- nginx-controller-lv8md 1/1 Unknown 0 20h
- nginx-controller-sb3fx 1/1 Unknown 2 20h
Take a look at the details of one of these instances:
- [root@k8s ~]# kubectl describe pod nginx-controller-lv8md
- Name: nginx-controller-lv8md
- Namespace: default
- Node: k8s-node/10.0.10.11 # This is the key point: these two pods were scheduled onto k8s-node, and that node is currently down.
- Start Time: Tue, 13 Jun 2017 02:01:45 +0800
- Labels: app=nginx
- Status: Terminating (expires Mon, 12 Jun 2017 21:46:21 +0800)
- Termination Grace Period: 30s
- Reason: NodeLost
- Message: Node k8s-node which was running pod nginx-controller-lv8md is unresponsive
- IP: 172.21.42.3
- Controllers: ReplicationController/nginx-controller
- Containers:
- nginx:
- Container ID: docker://03fa59f9efc06e43ed8c9acc7d4c7533983d5733223dbb2efa5f65928d965b5b
- Image: reg.docker.lc/share/nginx:latest
- Image ID: docker-pullable://reg.docker.lc/share/[email protected]:e5c82328a509aeb7c18c1d7fb36633dc638fcf433f651bdcda59c1cc04d3ee55
- Port: 80/TCP
- State: Running
- Started: Tue, 13 Jun 2017 02:01:47 +0800
- Ready: True
- Restart Count: 0
- Volume Mounts: <none>
- Environment Variables: <none>
- Conditions:
- Type Status
- Initialized True
- Ready False
- PodScheduled True
- No volumes.
- QoS Class: BestEffort
- Tolerations: <none>
- No events.
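Tip: instead of describing each pod one by one, `kubectl get pods -o wide` adds NODE and IP columns, which makes it immediately obvious that both Unknown pods were scheduled on k8s-node:
- [root@k8s ~]# kubectl get pods -o wide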
The rc and service behind these two pods were deleted long ago; they stay in this Unknown state because the host running them cannot respond and report their status back. Since that host is down anyway, remove the node outright:
- [root@k8s ~]# kubectl delete node k8s-node
- node "k8s-node" deleted
- [root@k8s ~]# kubectl get node
- NAME STATUS AGE
- k8s Ready 2d
Since the node no longer exists, there is no state information left for those pods:
- [root@k8s ~]# kubectl get pods
- No resources found.
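For the record, deleting the node is a blunt instrument. A pod stuck in Unknown/Terminating on an unreachable node can usually also be force-deleted individually, leaving the node object alone (pod name taken from the output above; on newer kubectl versions --force must accompany --grace-period=0):
- [root@k8s ~]# kubectl delete pod nginx-controller-lv8md --grace-period=0 --force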
This approach did remove the broken instances, but at the cost of deleting the node. So if the node recovers now, will it rejoin the cluster automatically?
The logs on the master soon picked up the k8s-node node:
- Jun 13 15:06:13 k8s kube-controller-manager[34050]: E0613 15:06:13.917313 34050 actual_state_of_world.go:475] Failed to set statusUpdateNeeded to needed true because nodeName="k8s-node" does not exist
- Jun 13 15:06:14 k8s kube-controller-manager[34050]: I0613 15:06:14.618864 34050 event.go:217] Event(api.ObjectReference{Kind:"Node", Namespace:"", Name:"k8s-node", UID:"c9864434-5006-11e7-ab16-000c29e9277a", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node k8s-node event: Registered Node k8s-node in NodeController
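These entries come from the kube-controller-manager logs on the master; on a systemd-managed installation like this one they can be followed live with (assuming the unit is named kube-controller-manager):
- [root@k8s ~]# journalctl -u kube-controller-manager -f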
The node came back automatically: the kubelet's configuration file already points it at the master, so as soon as the node is online it registers itself with the cluster.
- [root@k8s ~]# kubectl get nodes
- NAME STATUS AGE
- k8s Ready 2d
- k8s-node Ready 1m
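For reference, registration works because the kubelet on the node already knows the master's address. Assuming the old CentOS/RPM packaging that matches this era of Kubernetes (the exact path and variable name are assumptions about this particular install), that setting lives in /etc/kubernetes/kubelet:
- [root@k8s-node ~]# grep API_SERVER /etc/kubernetes/kubelet # expect something like KUBELET_API_SERVER="--api-servers=http://<master>:8080"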
Create an Nginx instance:
- [root@k8s ~]# kubectl create -f Nginx.yaml
- replicationcontroller "nginx-controller" created
- service "nginx-service" created
Kubernetes has scheduled this instance onto the freshly started k8s-node node:
- [root@k8s ~]# kubectl get pods
- NAME READY STATUS RESTARTS AGE
- nginx-controller-zsx2q 1/1 Running 0 6s
- [root@k8s ~]# kubectl describe pod nginx-controller-zsx2q |grep "Node"
- Node: k8s-node/10.0.10.11
Scale dynamically, adding one more replica:
- [root@k8s ~]# kubectl scale replicationcontroller --replicas=2 nginx-controller
- replicationcontroller "nginx-controller" scaled
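To verify the scale operation took effect, the rc's DESIRED and CURRENT counts can be checked (output omitted):
- [root@k8s ~]# kubectl get rc nginx-controller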
Check the cluster's current state:
- [root@k8s ~]# kubectl describe svc nginx-service
- Name: nginx-service
- Namespace: default
- Labels: <none>
- Selector: app=nginx
- Type: ClusterIP
- IP: 10.254.132.82
- External IPs: 10.0.10.10
- Port: <unset> 8000/TCP
- Endpoints: 172.21.42.2:80,172.21.93.2:80
- Session Affinity: None
- No events.
Note that the two endpoints above sit in different IP ranges (172.21.42.x and 172.21.93.x), one on each node. What happens if the k8s-node node is deleted at this point?
- [root@k8s ~]# kubectl delete node k8s-node
- node "k8s-node" deleted
One node has indeed been deleted:
- [root@k8s ~]# kubectl get nodes
- NAME STATUS AGE
- k8s Ready 2d
The pod count has not dropped; there are still two:
- [root@k8s ~]# kubectl get pods
- NAME READY STATUS RESTARTS AGE
- nginx-controller-43qpx 1/1 Running 0 14m
- nginx-controller-zsx2q 1/1 Running 0 19m
Checking the assignments in the cluster again, both pod IPs are now in the 172.21.93.x range, which shows they are running on the same node:
- [root@k8s ~]# kubectl describe svc nginx-service
- Name: nginx-service
- Namespace: default
- Labels: <none>
- Selector: app=nginx
- Type: ClusterIP
- IP: 10.254.132.82
- External IPs: 10.0.10.10
- Port: <unset> 8000/TCP
- Endpoints: 172.21.93.2:80,172.21.93.3:80
- Session Affinity: None
- No events.
To rejoin the k8s-node node to the cluster, restart the kubelet service on k8s-node:
- [root@k8s-node ~]# systemctl restart kubelet
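Once the kubelet restarts, the node registers itself again (auto-registration is the kubelet default), which can be confirmed from the master:
- [root@k8s ~]# kubectl get nodes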