Setting up a k8s cluster with kubeadm


0. Environment

Components:

  • kubernetes: v1.16.8
  • docker: 18.09.9
  • calico: v3.14.1

Nodes:

  • k8s01:10.13.84.186(master)
  • k8s02:10.13.84.187(node)
  • k8s03:10.13.84.188(node)

Note: unless stated otherwise, commands are run on all nodes as root.

1. Preparation

Set the hostname on each node according to the node list above, for example:

hostnamectl set-hostname k8s01

Then log in again to see the new hostname.

For easier access between nodes, add the following host entries:

cat >> /etc/hosts <<EOF
10.13.84.186 k8s01
10.13.84.187 k8s02
10.13.84.188 k8s03
EOF

Add a DNS server so the nodes can resolve external domains without errors.
It is recommended to also write the command below into /etc/rc.local (see the sketch after this snippet), otherwise the setting is lost after a reboot.

cat >> /etc/resolv.conf <<EOF
nameserver 114.114.114.114
EOF
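
To make this survive a reboot, the same entry can be appended from /etc/rc.local as suggested above; a minimal sketch (rc.local must be executable for it to run at boot):

cat >> /etc/rc.local <<EOF
echo "nameserver 114.114.114.114" >> /etc/resolv.conf
EOF
chmod +x /etc/rc.local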

2. Upgrade the kernel on CentOS 7

# Import the public key; if this fails, wget it locally first and then import it
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

# Install ELRepo; if this fails, wget the rpm locally first and then install it
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

# Load the elrepo-kernel repository metadata
yum --disablerepo=\* --enablerepo=elrepo-kernel repolist

# List the available kernel packages
yum --disablerepo=\* --enablerepo=elrepo-kernel list kernel*

# Install the long-term support (LTS) kernel
yum --disablerepo=\* --enablerepo=elrepo-kernel install -y kernel-lt.x86_64

# Remove the old kernel tools packages
yum remove kernel-tools-libs.x86_64 kernel-tools.x86_64 -y

# Install the new kernel tools package
yum --disablerepo=\* --enablerepo=elrepo-kernel install -y kernel-lt-tools.x86_64

# You may hit the following error:
# Error: Package: kernel-lt-tools-4.4.218-1.el7.elrepo.x86_64 (elrepo-kernel)
#         Requires: libpci.so.3(LIBPCI_3.5)(64bit)
# Error: Package: kernel-lt-tools-4.4.218-1.el7.elrepo.x86_64 (elrepo-kernel)
#         Requires: libpci.so.3(LIBPCI_3.3)(64bit)

# Fix: https://centos.pkgs.org/7/centos-x86_64/pciutils-libs-3.5.1-3.el7.x86_64.rpm.html
# Run the command below, then re-run the previous command
yum install pciutils-libs

# List the boot menu entries and their order
awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg  

# Select the default boot kernel (the index refers to the list printed above)
grub2-set-default 1

# Reboot and verify
reboot
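
After the node comes back up, confirm the new kernel is active:

uname -r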

3. Kernel tuning

cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
EOF

modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf

4. Raise file descriptor limits

echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
echo "* soft nproc 65536"  >> /etc/security/limits.conf
echo "* hard nproc 65536"  >> /etc/security/limits.conf
echo "* soft  memlock  unlimited"  >> /etc/security/limits.conf
echo "* hard memlock  unlimited"  >> /etc/security/limits.conf


Hard limits were introduced in AIX 4.1. A hard limit should be set by the system administrator; only members of the security group may raise it, while a user may lower it, but such a change is lost once the user logs out of the system.

Soft limits are the values the kernel actually uses to cap a process's use of system resources. A soft limit may be changed by anyone, but it cannot exceed the hard limit. Note that only members of the security group can make a change permanent; a normal user's change is lost after logging out.

1) soft nofile and hard nofile are the soft and hard limits on the number of files a single user can have open, no matter how many shells it starts (both set to 65536 above).

2) soft nproc and hard nproc are the soft and hard limits on the maximum number of processes available to a single user.

3) memlock is the maximum amount of physical memory a task may lock (set to unlimited here).
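
A quick way to verify the new limits after logging in again (the values apply per login shell):

ulimit -n   # open files (nofile)
ulimit -u   # max user processes (nproc)
ulimit -l   # locked memory (memlock)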

5. Disable SELinux, the firewall, and swap

systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config

swapoff -a
yes | cp /etc/fstab /etc/fstab_bak
cat /etc/fstab_bak |grep -v swap > /etc/fstab
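
Confirm that swap is now off (the Swap line should show 0 and swapon should list nothing):

free -m
swapon -s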

6. Time synchronization

# Unsynchronized host clocks will cause synchronization errors between nodes
# Skip this step if NTP was already enabled when the system was installed
# Check that the time and time zone are correct and NTP is enabled
yum install -y ntp
ntpdate pool.ntp.org
systemctl enable ntpd.service
systemctl restart ntpd.service
systemctl status ntpd.service
ntpdate pool.ntp.org
timedatectl

7. Install dependencies and tools

yum install -y epel-release
# If that fails, try the following
rpm -vih http://dl.fedoraproject.org/pub/epel/7/x86_64/Packages/e/epel-release-7-12.noarch.rpm

yum clean all && yum makecache

yum install -y yum-utils device-mapper-persistent-data lvm2 net-tools conntrack-tools  libseccomp libtool-ltdl lrzsz 

8. Configure IPVS kernel modules

cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4

yum install ipset ipvsadm -y

9. Configure the Kubernetes yum repository

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

10. Install Docker

Remove any old versions:

yum remove docker \
                  docker-client \
                  docker-client-latest \
                  docker-common \
                  docker-latest \
                  docker-latest-logrotate \
                  docker-logrotate \
                  docker-selinux \
                  docker-engine-selinux \
                  docker-engine

Install the new version:

yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Check the repo files
ll /etc/yum.repos.d/

yum makecache fast

# Pick the version you want; the newest is not necessarily the best choice
yum list docker-ce --showduplicates | sort -r 

yum install -y docker-ce-18.09.9-3.el7

cat > /etc/docker/daemon.json <<EOF
{
    "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn","https://hub-mirror.c.163.com"],
    "exec-opts": ["native.cgroupdriver=systemd"],
    "max-concurrent-downloads": 20,
    "live-restore": true,
    "max-concurrent-uploads": 10,
    "log-opts": {
      "max-size": "100m",
      "max-file": "5"
    }
}
EOF

systemctl daemon-reload
systemctl enable docker.service && systemctl start docker.service

11. Install kubelet, kubeadm, kubectl

export K8S_VERSION=1.16.8
yum install -y --disableexcludes=kubernetes kubelet-$K8S_VERSION kubeadm-$K8S_VERSION kubectl-$K8S_VERSION
systemctl enable kubelet.service
# Do not start kubelet yet

12. Pull the required images (for environments inside mainland China)

kubeadm config print init-defaults > /root/kubeadm.conf

Modify the following lines in /root/kubeadm.conf (a sed sketch follows the snippet):

imageRepository: registry.aliyuncs.com/google_containers
kubernetesVersion: v1.16.8
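
If you prefer not to edit the file by hand, a minimal sed sketch for the two changes above (assuming each field appears exactly once at the top level of the generated file):

sed -i 's#^imageRepository: .*#imageRepository: registry.aliyuncs.com/google_containers#' /root/kubeadm.conf
sed -i 's#^kubernetesVersion: .*#kubernetesVersion: v1.16.8#' /root/kubeadm.conf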

Pull the images:

kubeadm config images pull --config /root/kubeadm.conf

Recommendation: run all of the commands above on every node before moving on to the steps below.

13. Initialize the cluster (run on the master)

Mind the version:

kubeadm init --kubernetes-version=v1.16.8 \
  --pod-network-cidr=10.244.0.0/16 

# --service-cidr is not set here because we deploy the Calico network below; Calico sets up the service network for us, and setting it here can cause the Calico deployment to fail

Output (save it somewhere, it is needed later):

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.13.84.186:6443 --token 4q9g1x.j42gsbmfz1e9d1jv \
    --discovery-token-ca-cert-hash sha256:2d960f1d625e95087b295c322df6d5eb5e0d7f8b84cf986b75ba5a7fc09dae97 

Create the kubeconfig file:

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

Check:

kubectl get pods --all-namespaces

NAMESPACE     NAME                            READY   STATUS    RESTARTS   AGE
kube-system   coredns-5644d7b6d9-tv99t        0/1     Pending   0          4m31s
kube-system   coredns-5644d7b6d9-vfhpc        0/1     Pending   0          4m31s
kube-system   etcd-k8s01                      1/1     Running   0          3m44s
kube-system   kube-apiserver-k8s01            1/1     Running   0          3m26s
kube-system   kube-controller-manager-k8s01   1/1     Running   0          3m52s
kube-system   kube-proxy-rp67r                1/1     Running   0          4m31s
kube-system   kube-scheduler-k8s01            1/1     Running   0          3m43s

# coredns is in Pending state; ignore it for now, this is caused by the missing network plugin

14. Deploy the Calico network plugin (run on the master)

Official documentation:
https://docs.projectcalico.org/getting-started/kubernetes/quickstart

Configure NetworkManager

Configure NetworkManager before attempting to use Calico networking.

NetworkManager manipulates the routing table for interfaces in the default network namespace where Calico veth pairs are anchored for connections to containers. This can interfere with the Calico agent’s ability to route correctly.

Run this on every node:
Create the following configuration file at /etc/NetworkManager/conf.d/calico.conf to prevent NetworkManager from interfering with the interfaces:

[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*
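
A minimal way to create that file on every node:

cat > /etc/NetworkManager/conf.d/calico.conf <<EOF
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*
EOF
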
# This is v3.14.1
$ wget https://docs.projectcalico.org/manifests/calico.yaml

$ vim calico.yaml


1) Turn off IPIP mode and set typha_service_name

- name: CALICO_IPV4POOL_IPIP
  value: "off"


typha_service_name: "calico-typha"


By default Calico uses IPIP mode: a tunl0 interface is created on every node host and the tunnel connects the container networks of all nodes (the official docs recommend this when hosts sit in different IP subnets, for example machines in different AWS regions).

Here it is changed to BGP mode: Calico runs as a DaemonSet on all node hosts, each host starts bird (a BGP client) which announces the IP ranges assigned to every Calico node to the other hosts in the cluster, and traffic is forwarded through the host NIC (eth0, ens33, etc.).

2) Modify replicas

  replicas: 1

3) Modify the pod network range CALICO_IPV4POOL_CIDR

- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"

4) If you want to pull the images manually, check the image versions listed in calico.yaml; otherwise they are pulled automatically when the manifest is applied
$ cat calico.yaml |grep image

5) Deploy Calico
$ kubectl apply -f calico.yaml

6) Check
$ kubectl get pods --all-namespaces

15. Join the worker nodes to the cluster (run on every node)

kubeadm join 10.13.84.186:6443 --token 4q9g1x.j42gsbmfz1e9d1jv \
    --discovery-token-ca-cert-hash sha256:2d960f1d625e95087b295c322df6d5eb5e0d7f8b84cf986b75ba5a7fc09dae97 

Note: if more machines need to join the cluster later, see below for how to obtain the token and hash; otherwise skip this.

# 1) Get the token
$ kubeadm token list
# By default a token expires after 24 hours; once it has expired, generate a new one with the following command
$ kubeadm token create

# 2) Get the hash
$ openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
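
Alternatively, kubeadm can print a complete join command (token plus hash) in one step:

$ kubeadm token create --print-join-command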

Wait patiently for a while.

Check the cluster on the master:

# Run
$ kubectl get nodes

NAME    STATUS   ROLES    AGE     VERSION
k8s01   Ready    master   11m     v1.16.8
k8s02   Ready    <none>   3m29s   v1.16.8
k8s03   Ready    <none>   3m27s   v1.16.8

# Allow workloads to be scheduled on the master (remove the master taint)
$ kubectl taint nodes --all node-role.kubernetes.io/master-

# Once Calico is running normally, run
$ ip route show

default via 10.13.84.1 dev ens32  proto static  metric 100 
10.13.84.0/23 dev ens32  proto kernel  scope link  src 10.13.84.186  metric 100 
10.244.1.0/26 via 10.13.84.187 dev ens32  proto bird 
10.244.2.0/26 via 10.13.84.188 dev ens32  proto bird 
10.244.73.64 dev cali2fd97f91c35  scope link 
blackhole 10.244.73.64/26  proto bird 
10.244.73.65 dev cali316268cf0c4  scope link 
10.244.73.66 dev cali4a5794afb0a  scope link 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 


$ curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.14.1/calicoctl
$ chmod +x ./calicoctl
#export CALICO_DATASTORE_TYPE=kubernetes
#export CALICO_KUBECONFIG=~/.kube/config
$ ./calicoctl node status


Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.13.84.188 | node-to-node mesh | up    | 03:09:22 | Established |
| 10.13.84.187 | node-to-node mesh | up    | 03:09:35 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

16. Enable IPVS mode for kube-proxy

# Edit config.conf in the ConfigMap kube-system/kube-proxy and set `mode: "ipvs"`:
$ kubectl edit cm kube-proxy -n kube-system
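
# Optionally confirm the change took effect before recreating the pods:
$ kubectl get cm kube-proxy -n kube-system -o yaml | grep 'mode:'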

# Wait ~10s, then delete the kube-proxy pods so they are recreated automatically
$ kubectl get pod -n kube-system | grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}'

$ ipvsadm -L -n

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 10.13.84.186:6443            Masq    1      0          0         
TCP  10.96.0.10:53 rr
  -> 10.244.73.65:53              Masq    1      0          0         
  -> 10.244.73.66:53              Masq    1      0          0         
TCP  10.96.0.10:9153 rr
  -> 10.244.73.65:9153            Masq    1      0          0         
  -> 10.244.73.66:9153            Masq    1      0          0         
UDP  10.96.0.10:53 rr
  -> 10.244.73.65:53              Masq    1      0          0         
  -> 10.244.73.66:53              Masq    1      0          0         

17. Install NodeLocal DNS

nodelocaldns.yaml

# Copyright 2018 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-local-dns
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
#apiVersion: v1
#kind: Service
#metadata:
#  name: kube-dns-upstream
#  namespace: kube-system
#  labels:
#    k8s-app: kube-dns
#    kubernetes.io/cluster-service: "true"
#    addonmanager.kubernetes.io/mode: Reconcile
#    kubernetes.io/name: "KubeDNSUpstream"
#spec:
#  ports:
#  - name: dns
#    port: 53
#    protocol: UDP
#    targetPort: 53
#  - name: dns-tcp
#    port: 53
#    protocol: TCP
#    targetPort: 53
#  selector:
#    k8s-app: kube-dns
#---
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-local-dns
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
                success 9984 30
                denial 9984 5
        }
        reload
        loop
        bind 169.254.20.10
        forward . 10.96.0.10 {
                force_tcp
        }
        prometheus :9253
        health 169.254.20.10:8080
        }
    in-addr.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10
        forward . 10.96.0.10 {
                force_tcp
        }
        prometheus :9253
        }
    ip6.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10
        forward . 10.96.0.10 {
                force_tcp
        }
        prometheus :9253
        }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10
        forward . /etc/resolv.conf
        prometheus :9253
        }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
  namespace: kube-system
  labels:
    k8s-app: node-local-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 10%
  selector:
    matchLabels:
      k8s-app: node-local-dns
  template:
    metadata:
      labels:
        k8s-app: node-local-dns
      annotations:
        prometheus.io/port: "9253"
        prometheus.io/scrape: "true"
    spec:
      priorityClassName: system-node-critical
      serviceAccountName: node-local-dns
      hostNetwork: true
      dnsPolicy: Default  # Don't use cluster DNS.
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      - effect: "NoExecute"
        operator: "Exists"
      - effect: "NoSchedule"
        operator: "Exists"
      containers:
      - name: node-cache
        image: k8s.gcr.io/k8s-dns-node-cache:1.15.13
        resources:
          requests:
            cpu: 25m
            memory: 5Mi
        args: [ "-localip", "169.254.20.10", "-conf", "/etc/Corefile", "-upstreamsvc", "kube-dns" ]
        securityContext:
          privileged: true
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9253
          name: metrics
          protocol: TCP
        livenessProbe:
          httpGet:
            host: 169.254.20.10
            path: /health
            port: 8080
          initialDelaySeconds: 60
          timeoutSeconds: 5
        volumeMounts:
        - mountPath: /run/xtables.lock
          name: xtables-lock
          readOnly: false
        - name: config-volume
          mountPath: /etc/coredns
        - name: kube-dns-config
          mountPath: /etc/kube-dns
      volumes:
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
      - name: kube-dns-config
        configMap:
          name: kube-dns
          optional: true
      - name: config-volume
        configMap:
          name: node-local-dns
          items:
            - key: Corefile
              path: Corefile.base

$ kubectl apply -f nodelocaldns.yaml

$ kubectl get pods -n kube-system

Once it is running normally, modify /var/lib/kubelet/config.yaml on every node (a sed sketch follows the snippet below):

clusterDNS:
- 169.254.20.10
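
A minimal sed sketch for that edit (assuming the file still lists the default cluster DNS IP 10.96.0.10):

sed -i 's/- 10.96.0.10/- 169.254.20.10/' /var/lib/kubelet/config.yaml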

Then on every node:

systemctl daemon-reload && systemctl restart kubelet.service

18. Test

$ kubectl run cirros-$RANDOM --rm -it --image=cirros -- sh
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.

/ # cat /etc/resolv.conf 
nameserver 169.254.20.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
/ # 
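
Inside the pod you can also check that cluster DNS resolution works (if nslookup is available in the image; kubernetes.default is the standard API service name):

/ # nslookup kubernetes.default.svc.cluster.local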

Cleanup

If you need to tear things down, run the following on every node:

kubeadm reset
ipvsadm --clear
iptables -F 

Issues

The most common problems encountered are network issues on the nodes.
