Kubernetes / Kubeflow Installation and Deployment
Installation tutorials:
http://www.rhce.cc/2182.html
https://blog.csdn.net/wo18237095579/article/details/86630750
https://www.cnblogs.com/zhongle21/p/12220789.html#_lab2_0_2
I. All nodes
1. Set the hostname on every node
hostnamectl set-hostname master
hostnamectl set-hostname node
2. Configure a static IP on every node
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Notes: (1) BOOTPROTO=static
(2) ONBOOT=yes
(3) IPADDR=10.4.7.23
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=cf12d995-ca97-4e6e-9f08-bdc547ee9478
DEVICE=ens33
ONBOOT=yes
IPADDR=10.4.7.23
NETMASK=255.255.255.0
GATEWAY=10.4.7.1
DNS1=10.4.7.11
DNS2=8.8.8.8
ZONE=public
service network restart
3. Synchronize /etc/hosts on every node
vi /etc/hosts
/etc/init.d/network restart
Note: (1) the line 151.101.108.133 raw.githubusercontent.com is there so that raw.githubusercontent.com can be reached when configuring flannel
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.133.99 Centos dbserver
10.4.7.23 kubeadm1
10.4.7.24 kubeadm2
151.101.108.133 raw.githubusercontent.com
4. Disable the firewall and SELinux on every node
systemctl disable firewalld.service
systemctl stop firewalld.service
setenforce 0
vim /etc/selinux/config
SELINUX=disabled
5. Disable swap on every node
swapoff -a
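swapoff -a only disables swap until the next reboot; for a kubeadm node the swap entry in /etc/fstab should also be commented out so swap stays off permanently. A sketch of the sed expression (a companion step not in the original notes), demonstrated here on a sample fstab line:

```shell
# Comment out any uncommented line containing "swap"; demonstrated on a
# sample entry first. Run the same expression as  sed -i '...' /etc/fstab
# on each node once the output looks right.
printf '/dev/mapper/centos-swap swap swap defaults 0 0\n' \
  | sed 's/^[^#].*swap.*/#&/'
# prints: #/dev/mapper/centos-swap swap swap defaults 0 0
```
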
6. Configure the yum repository on every node
vi /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes Repo
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
7. Install Docker on every node, start it, and enable it at boot
yum install docker -y
systemctl enable docker --now
8. Set kernel parameters on every node
vi /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
Note: load the br_netfilter module and reload sysctl, otherwise the bridge-nf keys do not exist and the settings above do not take effect:
modprobe br_netfilter
sysctl --system
9. Install the packages on every node
yum install -y kubelet-1.18.2-0 kubeadm-1.18.2-0 kubectl-1.18.2-0 --disableexcludes=kubernetes
10. Start kubelet on every node and enable it at boot
systemctl restart kubelet
systemctl enable kubelet
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> /etc/profile
source /etc/profile
echo $KUBECONFIG
scp /etc/kubernetes/admin.conf [email protected]:/etc/kubernetes/
11. Configure a Docker registry mirror
vi /etc/docker/daemon.json
{
"registry-mirrors": ["https://registry.docker-cn.com"]
}
systemctl daemon-reload
systemctl restart docker
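Curly quotes pasted from a web page into daemon.json are a common reason docker fails to come back after this edit; running the file through a JSON parser catches them before the restart. A sketch (assumes python3 is installed), validating the same content via stdin:

```shell
# json.tool exits non-zero on malformed JSON (e.g. curly quotes),
# so a syntax error surfaces before docker is restarted.
python3 -m json.tool <<'EOF'
{
  "registry-mirrors": ["https://registry.docker-cn.com"]
}
EOF
```

On a node, `python3 -m json.tool /etc/docker/daemon.json` performs the same check on the file in place.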
II. Install the master
https://blog.csdn.net/chenxun_2010/article/details/107109311/
1. Initialize the cluster
kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.18.2 --pod-network-cidr=10.244.0.0/16
2. Set up the kubectl configuration
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
3. Install flannel
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml
4. Wait about 5 minutes, then verify the nodes
kubectl get pods --all-namespaces -o wide
kubectl get nodes
III. Install the nodes
1. Generate a join token on the master
kubeadm token create --print-join-command
2. Join the worker node to the cluster
kubeadm join 10.4.7.23:6443 --token 133ugt.f9xaifwfbi9963th --discovery-token-ca-cert-hash sha256:14bc9da5c0e879882120a9a8332b12dba546b450ceee6805263fe2a8ee7567b6
Notes:
(1) Error "Config not found: /etc/kubernetes/admin.conf"
Copy the corresponding file from the master to the same path on the node;
see https://www.cnblogs.com/wind-zhou/p/12829079.html
(2) Errors "/etc/kubernetes/kubelet.conf already exists" and
"/etc/kubernetes/pki/ca.crt already exists"
See https://www.cnblogs.com/liuyi778/p/12229416.html
or add --ignore-preflight-errors=all to the join command
(3) Error "[ERROR Port-10250]: Port 10250 is in use"
systemctl restart kubelet
Another possibility is that the node is being joined on the master itself, where the port is already taken; search the web for a fix.
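If the printed join command is lost, the discovery hash can be recomputed on the master: it is the sha256 of the cluster CA's public key, derived from /etc/kubernetes/pki/ca.crt (this is the pipeline the kubeadm docs give). Demonstrated here on a throwaway self-signed cert; on the master, point the first openssl at the real ca.crt instead:

```shell
# Generate a stand-in CA cert (on a real master, skip this and use
# /etc/kubernetes/pki/ca.crt below).
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo-ca.key \
  -out /tmp/demo-ca.crt -days 1 -subj /CN=demo 2>/dev/null
# Extract the public key, DER-encode it, and hash it; the sed strips
# the "SHA256(stdin)= " prefix, leaving the 64-char hex discovery hash.
openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
```
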
3. Deploy nginx
https://blog.csdn.net/qq_43279371/article/details/107768713
IV. Deploy Kubeflow
docker pull istio/sidecar_injector:1.1.6
docker pull istio/proxyv2:1.1.6
docker pull istio/proxy_init:1.1.6
docker pull istio/pilot:1.1.6
docker pull istio/mixer:1.1.6
docker pull istio/galley:1.1.6
docker pull istio/citadel:1.1.6
docker pull sw1136562366/viewer-crd-controller:0.2.5
docker pull sw1136562366/api-server:0.2.5
docker pull sw1136562366/frontend:0.2.5
docker pull sw1136562366/visualization-server:0.2.5
docker pull sw1136562366/scheduledworkflow:0.2.5
docker pull sw1136562366/persistenceagent:0.2.5
docker pull sw1136562366/envoy:metadata-grpc
docker pull sw1136562366/profile-controller:v1.0.0-ge50a8531
docker pull sw1136562366/notebook-controller:v1.0.0-gcd65ce25
docker pull sw1136562366/katib-ui:v0.8.0
docker pull sw1136562366/katib-controller:v0.8.0
docker pull sw1136562366/katib-db-manager:v0.8.0
docker pull sw1136562366/jupyter-web-app:v1.0.0-g2bd63238
docker pull sw1136562366/centraldashboard:v1.0.0-g3ec0de71
docker pull sw1136562366/tf_operator:v1.0.0-g92389064
docker pull sw1136562366/pytorch-operator:v1.0.0-g047cf0f
docker pull sw1136562366/kfam:v1.0.0-gf3e09203
docker pull sw1136562366/admission-webhook:v1.0.0-gaf96e4e3
docker pull sw1136562366/metadata:v0.1.11
docker pull sw1136562366/metadata-frontend:v0.1.8
docker pull sw1136562366/application:1.0-beta
docker pull sw1136562366/ingress-setup:latest
docker pull sw1136562366/activator:latest
docker pull sw1136562366/webhook:latest
docker pull sw1136562366/controller:latest
docker pull sw1136562366/istio:latest
docker pull sw1136562366/autoscaler-hpa:latest
docker pull sw1136562366/autoscaler:latest
docker pull sw1136562366/kfserving-controller:0.2.2
docker pull sw1136562366/ml_metadata_store_server:v0.21.1
docker pull sw1136562366/spark-operator:v1beta2-1.0.0-2.4.4
docker pull sw1136562366/kube-rbac-proxy:v0.4.0
docker pull sw1136562366/spartakus-amd64:v1.1.0
docker pull argoproj/workflow-controller:v2.3.0
docker pull argoproj/argoui:v2.3.0
docker tag docker.io/sw1136562366/ingress-setup:latest k8s.gcr.io/kubeflow-images-public/ingress-setup:latest
docker tag docker.io/sw1136562366/ingress-setup:latest gcr.io/kubeflow-images-public/ingress-setup:latest
# tag ml-pipeline images
docker tag docker.io/sw1136562366/viewer-crd-controller:0.2.5 k8s.gcr.io/ml-pipeline/viewer-crd-controller:0.2.5
docker tag docker.io/sw1136562366/api-server:0.2.5 k8s.gcr.io/ml-pipeline/api-server:0.2.5
docker tag docker.io/sw1136562366/frontend:0.2.5 k8s.gcr.io/ml-pipeline/frontend:0.2.5
docker tag docker.io/sw1136562366/visualization-server:0.2.5 k8s.gcr.io/ml-pipeline/visualization-server:0.2.5
docker tag docker.io/sw1136562366/scheduledworkflow:0.2.5 k8s.gcr.io/ml-pipeline/scheduledworkflow:0.2.5
docker tag docker.io/sw1136562366/persistenceagent:0.2.5 k8s.gcr.io/ml-pipeline/persistenceagent:0.2.5
docker tag docker.io/sw1136562366/envoy:metadata-grpc k8s.gcr.io/ml-pipeline/envoy:metadata-grpc
# tag kubeflow-images-public images
docker tag docker.io/sw1136562366/profile-controller:v1.0.0-ge50a8531 k8s.gcr.io/kubeflow-images-public/profile-controller:v1.0.0-ge50a8531
docker tag docker.io/sw1136562366/notebook-controller:v1.0.0-gcd65ce25 k8s.gcr.io/kubeflow-images-public/notebook-controller:v1.0.0-gcd65ce25
docker tag docker.io/sw1136562366/katib-ui:v0.8.0 k8s.gcr.io/kubeflow-images-public/katib/v1alpha3/katib-ui:v0.8.0
docker tag docker.io/sw1136562366/katib-controller:v0.8.0 k8s.gcr.io/kubeflow-images-public/katib/v1alpha3/katib-controller:v0.8.0
docker tag docker.io/sw1136562366/katib-db-manager:v0.8.0 k8s.gcr.io/kubeflow-images-public/katib/v1alpha3/katib-db-manager:v0.8.0
docker tag docker.io/sw1136562366/jupyter-web-app:v1.0.0-g2bd63238 k8s.gcr.io/kubeflow-images-public/jupyter-web-app:v1.0.0-g2bd63238
docker tag docker.io/sw1136562366/centraldashboard:v1.0.0-g3ec0de71 k8s.gcr.io/kubeflow-images-public/centraldashboard:v1.0.0-g3ec0de71
docker tag docker.io/sw1136562366/tf_operator:v1.0.0-g92389064 k8s.gcr.io/kubeflow-images-public/tf_operator:v1.0.0-g92389064
docker tag docker.io/sw1136562366/pytorch-operator:v1.0.0-g047cf0f k8s.gcr.io/kubeflow-images-public/pytorch-operator:v1.0.0-g047cf0f
docker tag docker.io/sw1136562366/kfam:v1.0.0-gf3e09203 k8s.gcr.io/kubeflow-images-public/kfam:v1.0.0-gf3e09203
docker tag docker.io/sw1136562366/admission-webhook:v1.0.0-gaf96e4e3 k8s.gcr.io/kubeflow-images-public/admission-webhook:v1.0.0-gaf96e4e3
docker tag docker.io/sw1136562366/metadata:v0.1.11 k8s.gcr.io/kubeflow-images-public/metadata:v0.1.11
docker tag docker.io/sw1136562366/metadata-frontend:v0.1.8 k8s.gcr.io/kubeflow-images-public/metadata-frontend:v0.1.8
docker tag docker.io/sw1136562366/application:1.0-beta k8s.gcr.io/kubeflow-images-public/kubernetes-sigs/application:1.0-beta
# tag knative-releases images
docker tag docker.io/sw1136562366/activator:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/activator:latest
docker tag docker.io/sw1136562366/webhook:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/webhook:latest
docker tag docker.io/sw1136562366/controller:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/controller:latest
docker tag docker.io/sw1136562366/istio:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/networking/istio:latest
docker tag docker.io/sw1136562366/autoscaler-hpa:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler-hpa:latest
docker tag docker.io/sw1136562366/autoscaler:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler:latest
# tag the remaining gcr.io images
docker tag docker.io/sw1136562366/kfserving-controller:0.2.2 k8s.gcr.io/kfserving/kfserving-controller:0.2.2
docker tag docker.io/sw1136562366/ml_metadata_store_server:v0.21.1 k8s.gcr.io/tfx-oss-public/ml_metadata_store_server:v0.21.1
docker tag docker.io/sw1136562366/spark-operator:v1beta2-1.0.0-2.4.4 k8s.gcr.io/spark-operator/spark-operator:v1beta2-1.0.0-2.4.4
docker tag docker.io/sw1136562366/kube-rbac-proxy:v0.4.0 k8s.gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0
docker tag docker.io/sw1136562366/spartakus-amd64:v1.1.0 k8s.gcr.io/google_containers/spartakus-amd64:v1.1.0
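The pull/tag pairs above can be generated from a single mapping instead of two hand-maintained lists. A sketch that only prints the commands (pipe its output to sh to execute); the mirror account and the two sample images follow the list in these notes:

```shell
# Read "mirror-image target-image" pairs and emit the matching
# docker pull / docker tag commands for each.
emit_pull_tag() {
  while read -r src dst; do
    printf 'docker pull docker.io/sw1136562366/%s\n' "$src"
    printf 'docker tag docker.io/sw1136562366/%s %s\n' "$src" "$dst"
  done
}
emit_pull_tag <<'EOF'
api-server:0.2.5 k8s.gcr.io/ml-pipeline/api-server:0.2.5
katib-ui:v0.8.0 k8s.gcr.io/kubeflow-images-public/katib/v1alpha3/katib-ui:v0.8.0
EOF
```

Extending the here-document with the remaining rows from the list above, then piping to sh, replaces both command blocks.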
V. Kubernetes operations basics
# List the deployment and statefulset names:
kubectl get deploy -n kubeflow
kubectl get sts -n kubeflow
# xxx is the name of the corresponding deployment or statefulset; there are many, so edit them one at a time
kubectl edit deploy -n kubeflow xxx
kubectl edit sts -n kubeflow xxx
# Check the pods' status
kubectl get pods -n kubeflow
# Inspect the state of a specific pod
kubectl describe pods -n kubeflow xxxxx
# Delete the pod; it will be recreated automatically
kubectl delete pods -n kubeflow xxx
# Statefulsets and pods correspond; after a statefulset is changed, its pods are updated
# Check the service's error logs and fix accordingly; at this point it is no longer a Kubernetes issue
kubectl logs -n kubeflow xxx
# Delete pods in Evicted state
kubectl delete pods -n kubeflow $(kubectl get pods -n kubeflow |grep Evicted|awk '{print $1}')
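A dry run of how the one-liner above selects its targets: grep keeps only the Evicted rows of the `kubectl get pods` listing, and awk prints the pod name from column 1. Demonstrated on sample output:

```shell
# Two fake listing rows; only the Evicted one should survive the pipeline.
printf '%s\n' \
  'pod-a   0/1   Evicted   0   1d' \
  'pod-b   1/1   Running   0   1d' \
  | grep Evicted | awk '{print $1}'
# prints: pod-a
```
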
systemctl stop kubelet
systemctl stop docker
systemctl restart network
systemctl restart docker
systemctl restart kubelet
ifconfig
kubectl get node