
Kubernetes: Kubeflow Installation and Deployment


Installation guides:
http://www.rhce.cc/2182.html
https://blog.csdn.net/wo18237095579/article/details/86630750
https://www.cnblogs.com/zhongle21/p/12220789.html#_lab2_0_2

I. All nodes

1. Set the hostname on each node
hostnamectl set-hostname master
hostnamectl set-hostname node

2. Configure a static IP on all nodes
vi /etc/sysconfig/network-scripts/ifcfg-ens33

Notes: (1) BOOTPROTO=static
(2) ONBOOT=yes
(3) IPADDR=10.4.7.23

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=cf12d995-ca97-4e6e-9f08-bdc547ee9478
DEVICE=ens33
ONBOOT=yes
IPADDR=10.4.7.23
NETMASK=255.255.255.0
GATEWAY=10.4.7.1
DNS1=10.4.7.11
DNS2=8.8.8.8
ZONE=public

service network restart
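
To confirm the static address took effect after the restart (ens33 matches the interface configured above):
ip addr show ens33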

3. Synchronize /etc/hosts across all nodes
vi /etc/hosts
/etc/init.d/network restart

Note: (1) the line 151.101.108.133 raw.githubusercontent.com is there so that raw.githubusercontent.com stays reachable when deploying flannel later

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.133.99 Centos dbserver
10.4.7.23 kubeadm1
10.4.7.24 kubeadm2
151.101.108.133 raw.githubusercontent.com

4. Disable the firewall and SELinux on all nodes
systemctl disable firewalld.service
systemctl stop firewalld.service
setenforce 0

vim /etc/selinux/config
SELINUX=disabled
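
setenforce 0 takes effect immediately but only for the current boot; the config file change applies after a reboot. Verify the current mode with:
getenforce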

5. Disable swap on all nodes
swapoff -a
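
Note that swapoff -a lasts only until the next reboot. Since kubelet refuses to start while swap is enabled, also comment out the swap entry in /etc/fstab; one common way (back up the file first):
sed -ri 's/.*swap.*/#&/' /etc/fstab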

6. Configure the Kubernetes yum repository on all nodes
vi /etc/yum.repos.d/kubernetes.repo

[kubernetes]
name=Kubernetes Repo
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg

7. Install and start Docker on all nodes, and enable it at boot
yum install docker -y
systemctl enable docker --now

8. Set kernel parameters on all nodes
vi /etc/sysctl.d/k8s.conf

net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
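
These settings are not loaded automatically. The two bridge parameters also require the br_netfilter kernel module, so load it and then reload all sysctl files:
modprobe br_netfilter
sysctl --system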

9. Install the Kubernetes packages on all nodes
yum install -y kubelet-1.18.2-0 kubeadm-1.18.2-0 kubectl-1.18.2-0 --disableexcludes=kubernetes

10. Start kubelet on all nodes and enable it at boot (until kubeadm init or kubeadm join runs, kubelet will restart in a loop; that is expected)
systemctl restart kubelet
systemctl enable kubelet

Note: /etc/kubernetes/admin.conf only exists after kubeadm init (section II); run these afterwards so kubectl works, and copy the config to the node:
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> /etc/profile
source /etc/profile
echo $KUBECONFIG
scp /etc/kubernetes/admin.conf [email protected]:/etc/kubernetes/

11. Configure a Docker registry mirror
vi /etc/docker/daemon.json
{
"registry-mirrors": ["https://registry.docker-cn.com"]
}

systemctl daemon-reload
systemctl restart docker
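
To confirm Docker picked up the mirror:
docker info | grep -A1 "Registry Mirrors"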

II. Install the master

https://blog.csdn.net/chenxun_2010/article/details/107109311/

1. Initialize the cluster
kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.18.2 --pod-network-cidr=10.244.0.0/16

2. Set up the kubectl config
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

3. Deploy flannel
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml

4. Wait about five minutes, then verify the nodes
kubectl get pods --all-namespaces -o wide
kubectl get nodes
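
At this point only the master has been set up, so kubectl get nodes should show a single Ready node; the output looks roughly like this (the name matches the hostname from step I.1, and the age will differ):
NAME       STATUS   ROLES    AGE   VERSION
kubeadm1   Ready    master   5m    v1.18.2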

III. Install the nodes

1. Generate a join token on the master
kubeadm token create --print-join-command

2. Join the worker node to the cluster
kubeadm join 10.4.7.23:6443 --token 133ugt.f9xaifwfbi9963th --discovery-token-ca-cert-hash sha256:14bc9da5c0e879882120a9a8332b12dba546b450ceee6805263fe2a8ee7567b6

Notes:
(1) Error: Config not found: /etc/kubernetes/admin.conf
Copy the corresponding file from the master to the same path on the node;
see https://www.cnblogs.com/wind-zhou/p/12829079.html
(2) Errors: /etc/kubernetes/kubelet.conf already exists
/etc/kubernetes/pki/ca.crt already exists
See https://www.cnblogs.com/liuyi778/p/12229416.html
or add --ignore-preflight-errors=all
(3) Error: [ERROR Port-10250]: Port 10250 is in use
systemctl restart kubelet
This can also happen when you run kubeadm join on the machine already hosting the master, where the port is already taken; search online for a fix.

3. Deploy nginx
https://blog.csdn.net/qq_43279371/article/details/107768713
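
The linked post covers this in detail; as a quick smoke test, a minimal sketch (the deployment name nginx is arbitrary):
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get pods -o wide
kubectl get svc nginx   # note the NodePort, then curl <node-ip>:<node-port>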

IV. Deploy Kubeflow

The gcr.io registries hosting these images are unreachable from many networks, so pull mirrored copies from Docker Hub (the sw1136562366 account) and re-tag them with the names the Kubeflow manifests expect:

docker pull istio/sidecar_injector:1.1.6
docker pull istio/proxyv2:1.1.6
docker pull istio/proxy_init:1.1.6
docker pull istio/pilot:1.1.6
docker pull istio/mixer:1.1.6
docker pull istio/galley:1.1.6
docker pull istio/citadel:1.1.6

docker pull sw1136562366/viewer-crd-controller:0.2.5
docker pull sw1136562366/api-server:0.2.5
docker pull sw1136562366/frontend:0.2.5
docker pull sw1136562366/visualization-server:0.2.5
docker pull sw1136562366/scheduledworkflow:0.2.5
docker pull sw1136562366/persistenceagent:0.2.5
docker pull sw1136562366/envoy:metadata-grpc

docker pull sw1136562366/profile-controller:v1.0.0-ge50a8531
docker pull sw1136562366/notebook-controller:v1.0.0-gcd65ce25
docker pull sw1136562366/katib-ui:v0.8.0
docker pull sw1136562366/katib-controller:v0.8.0
docker pull sw1136562366/katib-db-manager:v0.8.0
docker pull sw1136562366/jupyter-web-app:v1.0.0-g2bd63238
docker pull sw1136562366/centraldashboard:v1.0.0-g3ec0de71
docker pull sw1136562366/tf_operator:v1.0.0-g92389064
docker pull sw1136562366/pytorch-operator:v1.0.0-g047cf0f
docker pull sw1136562366/kfam:v1.0.0-gf3e09203
docker pull sw1136562366/admission-webhook:v1.0.0-gaf96e4e3
docker pull sw1136562366/metadata:v0.1.11
docker pull sw1136562366/metadata-frontend:v0.1.8
docker pull sw1136562366/application:1.0-beta
docker pull sw1136562366/ingress-setup:latest

docker pull sw1136562366/activator:latest
docker pull sw1136562366/webhook:latest
docker pull sw1136562366/controller:latest
docker pull sw1136562366/istio:latest
docker pull sw1136562366/autoscaler-hpa:latest
docker pull sw1136562366/autoscaler:latest

docker pull sw1136562366/kfserving-controller:0.2.2
docker pull sw1136562366/ml_metadata_store_server:v0.21.1
docker pull sw1136562366/spark-operator:v1beta2-1.0.0-2.4.4
docker pull sw1136562366/kube-rbac-proxy:v0.4.0
docker pull sw1136562366/spartakus-amd64:v1.1.0

docker pull argoproj/workflow-controller:v2.3.0
docker pull argoproj/argoui:v2.3.0
docker tag docker.io/sw1136562366/ingress-setup:latest k8s.gcr.io/kubeflow-images-public/ingress-setup:latest
docker tag docker.io/sw1136562366/ingress-setup:latest gcr.io/kubeflow-images-public/ingress-setup:latest


# tag ml-pipeline images
docker tag docker.io/sw1136562366/viewer-crd-controller:0.2.5 k8s.gcr.io/ml-pipeline/viewer-crd-controller:0.2.5
docker tag docker.io/sw1136562366/api-server:0.2.5 k8s.gcr.io/ml-pipeline/api-server:0.2.5
docker tag docker.io/sw1136562366/frontend:0.2.5 k8s.gcr.io/ml-pipeline/frontend:0.2.5
docker tag docker.io/sw1136562366/visualization-server:0.2.5 k8s.gcr.io/ml-pipeline/visualization-server:0.2.5
docker tag docker.io/sw1136562366/scheduledworkflow:0.2.5 k8s.gcr.io/ml-pipeline/scheduledworkflow:0.2.5
docker tag docker.io/sw1136562366/persistenceagent:0.2.5 k8s.gcr.io/ml-pipeline/persistenceagent:0.2.5
docker tag docker.io/sw1136562366/envoy:metadata-grpc k8s.gcr.io/ml-pipeline/envoy:metadata-grpc

# tag kubeflow-images-public images
docker tag docker.io/sw1136562366/profile-controller:v1.0.0-ge50a8531 k8s.gcr.io/kubeflow-images-public/profile-controller:v1.0.0-ge50a8531
docker tag docker.io/sw1136562366/notebook-controller:v1.0.0-gcd65ce25 k8s.gcr.io/kubeflow-images-public/notebook-controller:v1.0.0-gcd65ce25
docker tag docker.io/sw1136562366/katib-ui:v0.8.0 k8s.gcr.io/kubeflow-images-public/katib/v1alpha3/katib-ui:v0.8.0
docker tag docker.io/sw1136562366/katib-controller:v0.8.0 k8s.gcr.io/kubeflow-images-public/katib/v1alpha3/katib-controller:v0.8.0
docker tag docker.io/sw1136562366/katib-db-manager:v0.8.0 k8s.gcr.io/kubeflow-images-public/katib/v1alpha3/katib-db-manager:v0.8.0
docker tag docker.io/sw1136562366/jupyter-web-app:v1.0.0-g2bd63238 k8s.gcr.io/kubeflow-images-public/jupyter-web-app:v1.0.0-g2bd63238
docker tag docker.io/sw1136562366/centraldashboard:v1.0.0-g3ec0de71 k8s.gcr.io/kubeflow-images-public/centraldashboard:v1.0.0-g3ec0de71
docker tag docker.io/sw1136562366/tf_operator:v1.0.0-g92389064 k8s.gcr.io/kubeflow-images-public/tf_operator:v1.0.0-g92389064
docker tag docker.io/sw1136562366/pytorch-operator:v1.0.0-g047cf0f k8s.gcr.io/kubeflow-images-public/pytorch-operator:v1.0.0-g047cf0f
docker tag docker.io/sw1136562366/kfam:v1.0.0-gf3e09203 k8s.gcr.io/kubeflow-images-public/kfam:v1.0.0-gf3e09203
docker tag docker.io/sw1136562366/admission-webhook:v1.0.0-gaf96e4e3 k8s.gcr.io/kubeflow-images-public/admission-webhook:v1.0.0-gaf96e4e3
docker tag docker.io/sw1136562366/metadata:v0.1.11 k8s.gcr.io/kubeflow-images-public/metadata:v0.1.11
docker tag docker.io/sw1136562366/metadata-frontend:v0.1.8 k8s.gcr.io/kubeflow-images-public/metadata-frontend:v0.1.8
docker tag docker.io/sw1136562366/application:1.0-beta k8s.gcr.io/kubeflow-images-public/kubernetes-sigs/application:1.0-beta
docker tag docker.io/sw1136562366/ingress-setup:latest k8s.gcr.io/kubeflow-images-public/ingress-setup:latest

# tag knative-releases images
docker tag docker.io/sw1136562366/activator:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/activator:latest
docker tag docker.io/sw1136562366/webhook:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/webhook:latest
docker tag docker.io/sw1136562366/controller:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/controller:latest
docker tag docker.io/sw1136562366/istio:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/networking/istio:latest
docker tag docker.io/sw1136562366/autoscaler-hpa:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler-hpa:latest
docker tag docker.io/sw1136562366/autoscaler:latest k8s.gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler:latest

# tag the remaining images (kfserving, tfx, spark-operator, kubebuilder, spartakus)
docker tag docker.io/sw1136562366/kfserving-controller:0.2.2 k8s.gcr.io/kfserving/kfserving-controller:0.2.2
docker tag docker.io/sw1136562366/ml_metadata_store_server:v0.21.1 k8s.gcr.io/tfx-oss-public/ml_metadata_store_server:v0.21.1
docker tag docker.io/sw1136562366/spark-operator:v1beta2-1.0.0-2.4.4 k8s.gcr.io/spark-operator/spark-operator:v1beta2-1.0.0-2.4.4
docker tag docker.io/sw1136562366/kube-rbac-proxy:v0.4.0 k8s.gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0
docker tag docker.io/sw1136562366/spartakus-amd64:v1.1.0 k8s.gcr.io/google_containers/spartakus-amd64:v1.1.0
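
The commands above only stage the images; the actual deployment is driven by kfctl, as covered in the guides linked at the top. For reference, with Kubeflow v1.0-era tooling it looks roughly like the sketch below; the kfctl release file and config URI are assumptions, so match them to whichever guide you follow:
# Assumed: kfctl v1.0.2 and the kfctl_k8s_istio config; adjust to your guide.
wget https://github.com/kubeflow/kfctl/releases/download/v1.0.2/kfctl_v1.0.2-0-ga476281_linux.tar.gz
tar -xvf kfctl_v1.0.2-0-ga476281_linux.tar.gz && export PATH=$PATH:$(pwd)
export KF_DIR=/opt/kubeflow && mkdir -p $KF_DIR && cd $KF_DIR
export CONFIG_URI=https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml
kfctl apply -V -f $CONFIG_URI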

V. Kubernetes operations basics

# Commands to list the resource names:
kubectl get deploy -n kubeflow
kubectl get sts -n kubeflow

# xxx is the name of the corresponding Deployment or StatefulSet; there are many, so edit them one by one
kubectl edit deploy -n kubeflow xxx
kubectl edit sts -n kubeflow xxx

# Check the status of all pods
kubectl get pods -n kubeflow
# Inspect the state of one specific pod
kubectl describe pods -n kubeflow xxxxx

# Delete the pod; it will be recreated automatically
kubectl delete pods -n kubeflow xxx

# A StatefulSet and its pods correspond; after you edit the StatefulSet, its pods are updated to match

# Check the pod's logs for service-level errors and fix accordingly; at this point the problem is usually no longer Kubernetes itself
kubectl logs -n kubeflow xxx

# Delete pods in the Evicted state
kubectl delete pods -n kubeflow $(kubectl get pods -n kubeflow |grep Evicted|awk '{print $1}')


# Full restart sequence after a network change, then verify:
systemctl stop kubelet
systemctl stop docker
systemctl restart network
systemctl restart docker
systemctl restart kubelet
ifconfig
kubectl get node
