二进制安装Kubernetes

系统初始化

设置永久主机名

hostnamectl set-hostname k8s-m1

设置主机解析

如果不存在DNS解析,则每台主机需设置/etc/hosts文件,添加主机与IP的对应关系。

cat >> /etc/hosts <<EOF
10.105.26.201 k8s-m1
10.105.26.202 k8s-m2
10.105.26.203 k8s-m3
10.105.26.210 k8s-n1
EOF

免密码登陆其他节点

ssh-keygen -t rsa
ssh-copy-id [email protected]
ssh-copy-id [email protected]
ssh-copy-id [email protected]
ssh-copy-id [email protected]

关闭防火墙与selinux

systemctl disable --now firewalld NetworkManager
setenforce 0
sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' /etc/selinux/config

关闭swap分区

如果开启了swap分区,会导致kubelet启动失败(可通过kubelet的 --fail-swap-on=false 参数忽略该检查)。

swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab

关闭dnsmasq

如果Linux系统开启了dnsmasq(常见于GUI环境),它会将DNS server设置为127.0.0.1,导致docker容器无法解析域名,因此需要关闭。

systemctl stop dnsmasq
systemctl disable dnsmasq

优化内核参数

cat <<EOF > /etc/sysctl.d/k8s.conf
# https://github.com/moby/moby/issues/31208
# ipvsadm -l --timeout
# 修复ipvs模式下长连接timeout问题 小于900即可
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.ip_forward = 1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
# 要求iptables不对bridge的数据进行处理
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.netfilter.nf_conntrack_max = 2310720
fs.inotify.max_user_watches=89100
fs.may_detach_mounts = 1
fs.file-max = 52706963
fs.nr_open = 52706963
vm.swappiness = 0
vm.overcommit_memory=1
vm.panic_on_oom=0
EOF

sysctl --system

设置系统时区

调整系统TimeZone,将当前的UTC时间写入硬件时钟

timedatectl set-timezone Asia/Shanghai
timedatectl set-local-rtc 0
systemctl restart rsyslog
systemctl restart crond

关闭无关的服务

systemctl stop postfix && systemctl disable postfix

设置rsyslogd和systemd journald

systemd的journald是CentOS 7缺省的日志记录工具,它记录了所有系统、内核、Service Unit的日志。相比rsyslog,journald记录的日志有如下优势:1. 可以记录到内存或文件系统(默认记录到内存,对应的位置为 /run/log/journal);2. 可以限制占用的磁盘空间、保证磁盘剩余空间;3. 可以限制单个日志文件的大小和保存时间。journald默认还会将日志转发给rsyslog,这会导致日志写了多份,/var/log/messages中包含了太多无关日志,不方便后续查看,同时也影响系统性能,因此这里改为journald持久化保存并关闭转发。

mkdir /var/log/journal # 持久化保存日志的目录
mkdir /etc/systemd/journald.conf.d
cat > /etc/systemd/journald.conf.d/99-prophet.conf <<EOF
[Journal]
# 持久化保存到磁盘
Storage=persistent

# 压缩历史日志
Compress=yes

SyncIntervalSec=5m
RateLimitInterval=30s
RateLimitBurst=1000

# 最大占用空间 10G
SystemMaxUse=10G

# 单日志文件最大 200M
SystemMaxFileSize=200M

# 日志保存时间 2 周
MaxRetentionSec=2week

# 不将日志转发到 syslog
ForwardToSyslog=no
EOF
systemctl restart systemd-journald

升级内核

CentOS 7.x系统自带的3.10.x内核存在一些bug,导致Docker和Kubernetes运行不稳定,例如:1. 高版本的docker(1.13以后)启用了3.10内核实验性支持的kernel memory accounting功能,当节点压力大(如频繁启动和停止容器)时会导致cgroup memory leak;2. 网络设备引用计数泄漏,会导致类似"kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1"的报错。

自选内核安装

export Kernel_Version=4.18.9-1
wget http://mirror.rc.usf.edu/compute_lock/elrepo/kernel/el7/x86_64/RPMS/kernel-ml{,-devel}-${Kernel_Version}.el7.elrepo.x86_64.rpm
yum localinstall -y kernel-ml*

最新内核安装

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --disablerepo="*" --enablerepo="elrepo-kernel" list available --showduplicates | grep -Po '^kernel-ml.x86_64\s+\K\S+(?=.el7)'
yum --disablerepo="*" --enablerepo=elrepo-kernel install -y kernel-ml{,-devel}

修改内核启动顺序

grub2-set-default  0 && grub2-mkconfig -o /etc/grub2.cfg
grubby --default-kernel

开启user_namespace.enable=1

grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"

重新加载内核

reboot

安装ipvs

yum install ipvsadm ipset sysstat conntrack libseccomp vim wget curl jq -y

设置开机自动加载ipvs内核模块

:> /etc/modules-load.d/ipvs.conf
module=(
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
br_netfilter
)
for kernel_module in ${module[@]};do
/sbin/modinfo -F filename $kernel_module |& grep -qv ERROR && echo $kernel_module >> /etc/modules-load.d/ipvs.conf || :
done

systemctl enable --now systemd-modules-load.service
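
加载完成后,可以用下面的命令简单核对 ipvs 相关模块是否已载入(示例,模块名以上面写入 ipvs.conf 的实际内容为准):

lsmod | grep -E 'ip_vs|nf_conntrack|br_netfilter'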

docker条件检查

curl https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh > check-config.sh
bash check-config.sh

利用官方脚本安装docker

export VERSION=18.06
curl -fsSL "https://get.docker.com/" | bash -s -- --mirror Aliyun

配置加速源和docker启动参数使用systemd

mkdir -p /etc/docker/
cat>/etc/docker/daemon.json<<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"registry-mirrors": ["https://ib9xyhrv.mirror.aliyuncs.com"],
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m",
"max-file": "3"
}
}
EOF

设置docker开机启动,并设置docker命令补全

yum install -y epel-release bash-completion && cp /usr/share/bash-completion/completions/docker /etc/bash_completion.d/
systemctl enable --now docker
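
docker 启动后,可以确认 cgroup driver 和 storage driver 已按 daemon.json 生效(简单核对示例):

docker info 2>/dev/null | grep -iE 'cgroup driver|storage driver'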

创建相关目录并分发集群配置参数脚本

source environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /opt/k8s/{bin,work}"
scp environment.sh root@${node_ip}:/opt/k8s/bin/
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done
environment.sh 脚本内容如下:

#!/usr/bin/bash

# 生成 EncryptionConfig 所需的加密 key
export ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)

# 集群各机器 IP 数组
export NODE_IPS=(10.105.26.201 10.105.26.202 10.105.26.203)
export WORKER_IPS=(10.105.26.210)
# 集群各 IP 对应的主机名数组
export NODE_NAMES=(k8s-m1 k8s-m2 k8s-m3)
export WORKER_NAMES=(k8s-n1)
# etcd 集群服务地址列表
export ETCD_ENDPOINTS="https://10.105.26.201:2379,https://10.105.26.202:2379,https://10.105.26.203:2379"

# etcd 集群间通信的 IP 和端口
export ETCD_NODES="k8s-m1=https://10.105.26.201:2380,k8s-m2=https://10.105.26.202:2380,k8s-m3=https://10.105.26.203:2380"

# kube-apiserver 的反向代理(kube-nginx)地址端口
export KUBE_APISERVER="https://127.0.0.1:8443"

# 节点间互联网络接口名称
export IFACE="eth0"

# etcd 数据目录
export ETCD_DATA_DIR="/data/k8s/etcd/data"

# etcd WAL 目录,建议是 SSD 磁盘分区,或者和 ETCD_DATA_DIR 不同的磁盘分区
export ETCD_WAL_DIR="/data/k8s/etcd/wal"

# k8s 各组件数据目录
export K8S_DIR="/data/k8s/k8s"

# docker 数据目录
# export DOCKER_DIR="/data/k8s/docker"

## 以下参数一般不需要修改

# TLS Bootstrapping 使用的 Token,可以使用命令 head -c 16 /dev/urandom | od -An -t x | tr -d ' ' 生成
BOOTSTRAP_TOKEN="41f7e4ba8b7be874fcff18bf5cf41a7c"

# 最好使用 当前未用的网段 来定义服务网段和 Pod 网段

# 服务网段,部署前路由不可达,部署后集群内路由可达(kube-proxy 保证)
SERVICE_CIDR="10.254.0.0/16"

# Pod 网段,建议 /16 段地址,部署前路由不可达,部署后集群内路由可达(flanneld 保证)
CLUSTER_CIDR="172.30.0.0/16"

# 服务端口范围 (NodePort Range)
export NODE_PORT_RANGE="30000-32767"

# flanneld 网络配置前缀
export FLANNEL_ETCD_PREFIX="/kubernetes/network"

# kubernetes 服务 IP (一般是 SERVICE_CIDR 中第一个IP)
export CLUSTER_KUBERNETES_SVC_IP="10.254.0.1"

# 集群 DNS 服务 IP (从 SERVICE_CIDR 中预分配)
export CLUSTER_DNS_SVC_IP="10.254.0.2"

# 集群 DNS 域名(末尾不带点号)
export CLUSTER_DNS_DOMAIN="cluster.local"

# 将二进制目录 /opt/k8s/bin 加到 PATH 中
export PATH=/opt/k8s/bin:$PATH

创建CA证书和秘钥

kubernetes集群各组件需要使用x509证书对通信进行加密和认证。CA (Certificate Authority) 是自签名的根证书,用来签名后续创建的其它证书。

安装cfssl工具

wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
mv cfssl_linux-amd64 /usr/local/bin/cfssl
mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
mv cfssl-certinfo_linux-amd64 /usr/local/bin/cfssl-certinfo
chmod +x /usr/local/bin/cfssl*

创建根证书

CA证书是集群所有节点共享的,只需要创建一个CA证书,后续创建的所有证书都由它签名。

创建配置文件

cat > ca-config.json <<EOF
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"kubernetes": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "87600h"
}
}
}
}
EOF
  • signing:表示该证书可用于签名其它证书,生成的ca.pem证书中CA=TRUE;
  • server auth:表示client可以用该证书对server提供的证书进行验证;
  • client auth:表示server可以用该证书对client提供的证书进行验证;

创建证书签名请求文件

cat > ca-csr.json <<EOF
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "JiangSu",
"L": "SuZhou",
"O": "k8s",
"OU": "IT"
}
]
}
EOF
  • CN:Common Name,kube-apiserver 从证书中提取该字段作为请求的用户名 (User Name),浏览器使用该字段验证网站是否合法;
  • O:Organization,kube-apiserver从证书中提取该字段作为请求用户所属的组 (Group);
  • kube-apiserver将提取的User、Group作为RBAC授权的用户标识;

生成CA证书和私钥

cfssl gencert -initca ca-csr.json | cfssljson -bare ca
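
该命令生成 ca.pem、ca-key.pem 和 ca.csr 三个文件。可以用 cfssl-certinfo 查看证书内容,确认 ca-config.json、ca-csr.json 中的字段(CN、O、有效期、CA=TRUE 等)已生效(核对示例):

cfssl-certinfo -cert ca.pem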

分发证书到其他节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/kubernetes/pki"
scp ca*.pem ca-config.json root@${node_ip}:/etc/kubernetes/pki
done

部署kubectl工具

下载kubectl二进制文件并分发到其他节点

wget https://dl.k8s.io/v1.14.2/kubernetes-client-linux-amd64.tar.gz
tar -xzvf kubernetes-client-linux-amd64.tar.gz
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kubernetes/client/bin/kubectl root@${node_ip}:/opt/k8s/bin/
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done

创建admin证书和私钥

kubectl与apiserver https安全端口通信,apiserver对提供的证书进行认证和授权。kubectl作为集群的管理工具,需要被授予最高权限,这里创建具有最高权限的admin证书。

cat > admin-csr.json <<EOF
{
"CN": "admin",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "JiangSu",
"L": "SuZhou",
"O": "system:masters",
"OU": "IT"
}
]
}
EOF

  • O为system:masters,kube-apiserver收到该证书后将请求的Group设置为system:masters;
  • 预定义的ClusterRoleBinding cluster-admin将Group system:masters与Role cluster-admin绑定,该Role授予操作集群所有API的权限(可用下面的命令验证);
  • 该证书只会被kubectl当做client证书使用,所以hosts字段为空;
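
待kube-apiserver部署完成后,可以用下面的命令确认内置的ClusterRoleBinding cluster-admin确实绑定了system:masters组(验证示例,输出以实际集群为准):

kubectl describe clusterrolebinding cluster-admin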

生成证书和私钥

cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes admin-csr.json | cfssljson -bare admin

创建kubeconfig文件

# 设置集群参数
kubectl config set-cluster kubernetes \
--certificate-authority=/opt/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kubectl.kubeconfig

# 设置客户端认证参数
kubectl config set-credentials admin \
--client-certificate=/opt/k8s/work/admin.pem \
--client-key=/opt/k8s/work/admin-key.pem \
--embed-certs=true \
--kubeconfig=kubectl.kubeconfig

# 设置上下文参数
kubectl config set-context kubernetes \
--cluster=kubernetes \
--user=admin \
--kubeconfig=kubectl.kubeconfig

# 设置默认上下文
kubectl config use-context kubernetes --kubeconfig=kubectl.kubeconfig
  • --certificate-authority:验证kube-apiserver证书的根证书;
  • --client-certificate、--client-key:刚生成的admin证书和私钥,连接kube-apiserver时使用;
  • --embed-certs=true:将ca.pem和admin.pem证书内容嵌入到生成的kubectl.kubeconfig文件中(不加该参数时,写入的是证书文件路径,后续拷贝kubeconfig到其它机器时还需要单独拷贝证书文件,不方便),可用下面的命令确认;
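
生成后可以查看kubeconfig内容,嵌入的证书会显示为 DATA+OMITTED / REDACTED,说明证书已写入文件而非仅引用路径(核对示例):

kubectl config view --kubeconfig=kubectl.kubeconfig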

分发kubeconfig文件到其他节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ~/.kube"
scp kubectl.kubeconfig root@${node_ip}:~/.kube/config
done

部署etcd集群

etcd是基于Raft的分布式key-value存储系统,由CoreOS开发,常用于服务发现、共享配置以及并发控制(如leader选举、分布式锁等)。kubernetes使用 etcd存储所有运行数据。

下载etcd二进制文件并分发

wget https://github.com/coreos/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz
tar -xvf etcd-v3.3.13-linux-amd64.tar.gz
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp etcd-v3.3.13-linux-amd64/etcd* root@${node_ip}:/opt/k8s/bin
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done

创建etcd证书和私钥

cat > etcd-csr.json <<EOF
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"10.105.26.201",
"10.105.26.202",
"10.105.26.203"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "JiangSu",
"L": "SuZhou",
"O": "k8s",
"OU": "IT"
}
]
}
EOF
  • 生产环境中建议在证书内多预留几个IP,以防止意外故障迁移时还需要重新生成证书

生成证书和私钥

cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes etcd-csr.json | cfssljson -bare etcd

分发证书和私钥到其他节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/kubernetes/pki/etcd"
scp etcd*.pem root@${node_ip}:/etc/kubernetes/pki/etcd/
done

创建etcd的systemd unit模版文件

cat > etcd.service.template <<EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=${ETCD_DATA_DIR}
ExecStart=/opt/k8s/bin/etcd \\
--data-dir=${ETCD_DATA_DIR} \\
--wal-dir=${ETCD_WAL_DIR} \\
--name=##NODE_NAME## \\
--cert-file=/etc/kubernetes/pki/etcd/etcd.pem \\
--key-file=/etc/kubernetes/pki/etcd/etcd-key.pem \\
--trusted-ca-file=/etc/kubernetes/pki/ca.pem \\
--peer-cert-file=/etc/kubernetes/pki/etcd/etcd.pem \\
--peer-key-file=/etc/kubernetes/pki/etcd/etcd-key.pem \\
--peer-trusted-ca-file=/etc/kubernetes/pki/ca.pem \\
--peer-client-cert-auth \\
--client-cert-auth \\
--listen-peer-urls=https://##NODE_IP##:2380 \\
--initial-advertise-peer-urls=https://##NODE_IP##:2380 \\
--listen-client-urls=https://##NODE_IP##:2379,http://127.0.0.1:2379 \\
--advertise-client-urls=https://##NODE_IP##:2379 \\
--initial-cluster-token=etcd-cluster-0 \\
--initial-cluster=${ETCD_NODES} \\
--initial-cluster-state=new \\
--auto-compaction-mode=periodic \\
--auto-compaction-retention=1 \\
--max-request-bytes=33554432 \\
--quota-backend-bytes=6442450944 \\
--heartbeat-interval=250 \\
--election-timeout=2000
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
  • WorkingDirectory、--data-dir:指定工作目录和数据目录为 ${ETCD_DATA_DIR},需在启动服务前创建这个目录;
  • --wal-dir:指定wal目录,为了提高性能,一般使用SSD或者和--data-dir不同的磁盘;
  • --name:指定节点名称,当--initial-cluster-state值为new时,--name的参数值必须位于--initial-cluster列表中;
  • --cert-file、--key-file:etcd server与client通信时使用的证书和私钥;
  • --trusted-ca-file:签名client证书的CA证书,用于验证client证书;
  • --peer-cert-file、--peer-key-file:etcd与peer通信使用的证书和私钥;
  • --peer-trusted-ca-file:签名peer证书的CA证书,用于验证peer证书;

分发etcd system unit文件到其他节点

替换模板文件中的变量,为各节点创建systemd unit文件

for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" etcd.service.template > etcd-${NODE_IPS[i]}.service
done

分发生成的systemd unit文件

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp etcd-${node_ip}.service root@${node_ip}:/etc/systemd/system/etcd.service
done

启动etcd服务

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${ETCD_DATA_DIR} ${ETCD_WAL_DIR}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd " &
done
  • 必须先创建etcd数据目录和工作目录
  • etcd进程首次启动时会等待其它节点的etcd加入集群,命令systemctl start etcd会卡住一段时间,为正常现象(启动完成后可用下面的命令确认集群成员)
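
启动完成后,除了下一步的健康检查外,还可以先查看集群成员列表,确认三个节点都已加入(核对示例,在任一master节点执行):

source /opt/k8s/bin/environment.sh
ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--cacert=/etc/kubernetes/pki/ca.pem \
--cert=/etc/kubernetes/pki/etcd/etcd.pem \
--key=/etc/kubernetes/pki/etcd/etcd-key.pem member list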

验证服务状态

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
--endpoints=https://${node_ip}:2379 \
--cacert=/etc/kubernetes/pki/ca.pem \
--cert=/etc/kubernetes/pki/etcd/etcd.pem \
--key=/etc/kubernetes/pki/etcd/etcd-key.pem endpoint health
done

预期输出

>>> 10.105.26.201
https://10.105.26.201:2379 is healthy: successfully committed proposal: took = 1.706544ms
>>> 10.105.26.202
https://10.105.26.202:2379 is healthy: successfully committed proposal: took = 2.495669ms
>>> 10.105.26.203
https://10.105.26.203:2379 is healthy: successfully committed proposal: took = 2.228788ms

查看当前的 leader

ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
-w table --cacert=/etc/kubernetes/pki/ca.pem \
--cert=/etc/kubernetes/pki/etcd/etcd.pem \
--key=/etc/kubernetes/pki/etcd/etcd-key.pem \
--endpoints=${ETCD_ENDPOINTS} endpoint status

预期输出

+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://10.105.26.201:2379 | 41d3e233e1ce5cff | 3.3.13 | 20 kB | true | 2 | 8 |
| https://10.105.26.202:2379 | 54033052a4cf5146 | 3.3.13 | 20 kB | false | 2 | 8 |
| https://10.105.26.203:2379 | 5b1caf2378628ff0 | 3.3.13 | 20 kB | false | 2 | 8 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+

部署flannel网络

kubernetes要求集群内各节点(包括master节点)能通过Pod网段互联互通。flannel使用vxlan技术为各节点创建一个可以互通的Pod网络,使用的端口为UDP 8472。flanneld第一次启动时,从etcd获取配置的Pod网段信息,为本节点分配一个未使用的地址段,然后创建flannel.1网络接口。flannel将分配给自己的Pod网段信息写入/run/flannel/docker文件,docker后续使用这个文件中的环境变量设置docker0网桥,从而从这个地址段为本节点的所有Pod容器分配IP。

下载flanneld二进制文件

mkdir /opt/k8s/work/flannel
wget https://github.com/coreos/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz
tar -xzvf flannel-v0.11.0-linux-amd64.tar.gz -C flannel
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp flannel/{flanneld,mk-docker-opts.sh} root@${node_ip}:/opt/k8s/bin/
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done

创建flannel证书和私钥

cat > flanneld-csr.json <<EOF
{
"CN": "flanneld",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "JiangSu",
"L": "SuZhou",
"O": "k8s",
"OU": "IT"
}
]
}
EOF
  • 该证书只会被flanneld当做client证书使用,所以hosts字段为空

生成证书和私钥

cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld

将生成的证书和私钥分发到master和worker

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/flanneld/cert"
scp flanneld*.pem root@${node_ip}:/etc/flanneld/cert
done

向etcd写入集群Pod网段信息

etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/opt/k8s/work/ca.pem \
--cert-file=/opt/k8s/work/flanneld.pem \
--key-file=/opt/k8s/work/flanneld-key.pem \
mk ${FLANNEL_ETCD_PREFIX}/config '{"Network":"'${CLUSTER_CIDR}'", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}'
  • flanneld当前版本(v0.11.0)不支持etcd v3,故使用etcd v2 API写入配置key和网段数据
  • 写入的Pod网段${CLUSTER_CIDR}(如 /16)的前缀长度必须小于SubnetLen(这里为21),且必须与kube-controller-manager的--cluster-cidr参数值一致

创建flanneld的systemd unit文件

cat > flanneld.service << EOF
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
Before=docker.service

[Service]
Type=notify
ExecStart=/opt/k8s/bin/flanneld \\
-etcd-cafile=/etc/kubernetes/pki/ca.pem \\
-etcd-certfile=/etc/flanneld/cert/flanneld.pem \\
-etcd-keyfile=/etc/flanneld/cert/flanneld-key.pem \\
-etcd-endpoints=${ETCD_ENDPOINTS} \\
-etcd-prefix=${FLANNEL_ETCD_PREFIX} \\
-iface=${IFACE} \\
-ip-masq
ExecStartPost=/opt/k8s/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
Restart=always
RestartSec=5
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
EOF
  • mk-docker-opts.sh脚本将分配给flanneld的Pod子网段信息写入/run/flannel/docker文件,后续docker启动时使用这个文件中的环境变量配置docker0网桥(docker侧的对接示例见下)
  • flanneld使用系统缺省路由所在的接口与其它节点通信,对于有多个网络接口(如内网和公网)的节点,可以用-iface参数指定通信接口
  • flanneld运行时需要root权限
  • -ip-masq: flanneld为访问Pod网络外的流量设置SNAT规则,同时将传递给Docker的变量--ip-masq(/run/flannel/docker文件中)设置为false,这样Docker将不再创建SNAT规则;Docker的--ip-masq为true时,创建的SNAT规则比较"暴力":将所有本节点Pod发起的、访问非docker0接口的请求做SNAT,这样访问其他节点Pod的请求来源IP会被设置为flannel.1接口的IP,导致目的Pod看不到真实的来源Pod IP。flanneld创建的SNAT规则比较温和,只对访问非Pod网段的请求做SNAT
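
下面给出docker读取/run/flannel/docker中环境变量的一个最简示意(systemd drop-in方式)。注意这只是一个示例:假设dockerd位于/usr/bin/dockerd(官方脚本安装的默认路径);如果发行版自带的docker.service的ExecStart还带有其它参数(如 -H fd:// 等),需要一并保留:

mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/flannel.conf <<EOF
[Service]
# 读取 mk-docker-opts.sh 生成的环境变量文件("-" 表示文件不存在时不报错)
EnvironmentFile=-/run/flannel/docker
# 覆盖 ExecStart,把 \$DOCKER_NETWORK_OPTIONS(包含 --bip、--mtu、--ip-masq 等)传给 dockerd
ExecStart=
ExecStart=/usr/bin/dockerd \$DOCKER_NETWORK_OPTIONS
EOF
systemctl daemon-reload && systemctl restart docker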

分发flanneld systemd unit文件到master和worker

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp flanneld.service root@${node_ip}:/etc/systemd/system/
done

启动flanneld服务

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable flanneld && systemctl restart flanneld"
done

检查分配给各flanneld的Pod网段信息

查看集群 Pod 网段(/16)

etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/pki/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
get ${FLANNEL_ETCD_PREFIX}/config

预期输出

{"Network":"172.30.0.0/16", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}

查看已分配的 Pod 子网段列表(/24)

etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/pki/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
ls ${FLANNEL_ETCD_PREFIX}/subnets

可能的输出

/kubernetes/network/subnets/172.30.224.0-21
/kubernetes/network/subnets/172.30.128.0-21
/kubernetes/network/subnets/172.30.232.0-21

查看某一Pod网段对应的节点IP和flannel接口地址

etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/pki/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
get ${FLANNEL_ETCD_PREFIX}/subnets/172.30.232.0-21

可能的输出

{"PublicIP":"10.105.26.203","BackendType":"vxlan","BackendData":{"VtepMAC":"26:b4:0b:f2:56:ce"}}

  • 172.30.232.0/21被分配给节点k8s-m3(10.105.26.203)
  • VtepMAC为k8s-m3节点的flannel.1网卡MAC地址(可按下面的方式在对应节点上核对)
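
可以在对应节点上查看flannel.1的vxlan信息,其MAC地址应与上面etcd中记录的VtepMAC一致(核对示例):

ssh root@10.105.26.203 "/usr/sbin/ip -d link show flannel.1"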

验证各节点能通过Pod网段互通

验证是否创建了flannel接口

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "/usr/sbin/ip addr show flannel.1|grep -w inet"
done

预期输出

>>> 10.105.26.201
inet 172.30.224.0/32 scope global flannel.1
>>> 10.105.26.202
inet 172.30.128.0/32 scope global flannel.1
>>> 10.105.26.203
inet 172.30.232.0/32 scope global flannel.1

在各节点上ping所有flannel接口IP

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "ping -c 1 172.30.128.0"
ssh ${node_ip} "ping -c 1 172.30.224.0"
ssh ${node_ip} "ping -c 1 172.30.232.0"
done

kube-apiserver高可用之nginx代理

  • 控制节点的kube-controller-manager、kube-scheduler是多实例部署,所以只要有一个实例正常,就可以保证高可用
  • 集群内的Pod使用K8S服务域名kubernetes访问kube-apiserver,kube-dns会自动解析出多个kube-apiserver节点的IP,所以也是高可用的
  • 在每个节点起一个nginx进程,后端对接多个apiserver实例,nginx对它们做健康检查和负载均衡
  • kubelet、kube-proxy、controller-manager、scheduler通过本地的nginx(监听 127.0.0.1)访问kube-apiserver,从而实现kube-apiserver的高可用

下载和编译nginx

wget http://nginx.org/download/nginx-1.15.3.tar.gz
tar -xzvf nginx-1.15.3.tar.gz

编译参数

cd nginx-1.15.3
mkdir nginx-prefix
./configure --with-stream --without-http --prefix=$(pwd)/nginx-prefix --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
  • --with-stream:开启4层透明转发(TCP Proxy)功能
  • --without-xxx:关闭所有其他功能,这样生成的动态链接二进制程序依赖最小

输出

Configuration summary
+ PCRE library is not used
+ OpenSSL library is not used
+ zlib library is not used

nginx path prefix: "/opt/k8s/work/nginx-1.15.3/nginx-prefix"
nginx binary file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/sbin/nginx"
nginx modules path: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/modules"
nginx configuration prefix: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/conf"
nginx configuration file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/conf/nginx.conf"
nginx pid file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/logs/nginx.pid"
nginx error log file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/logs/error.log"
nginx http access log file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/logs/access.log"
nginx http client request body temporary files: "client_body_temp"
nginx http proxy temporary files: "proxy_temp"

编译和安装

cd /opt/k8s/work/nginx-1.15.3
make && make install

验证编译

./nginx-prefix/sbin/nginx -v

输出

nginx version: nginx/1.15.3

查看 nginx 动态链接的库

ldd ./nginx-prefix/sbin/nginx

输出

linux-vdso.so.1 =>  (0x00007ffd5bdd8000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fe523035000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe522e19000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe522a4c000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe523239000)

安装和部署nginx

创建目录结构

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}"
done

分发二进制程序

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp /opt/k8s/work/nginx-1.15.3/nginx-prefix/sbin/nginx root@${node_ip}:/opt/k8s/kube-nginx/sbin/kube-nginx
ssh root@${node_ip} "chmod a+x /opt/k8s/kube-nginx/sbin/*"
ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}"
done

配置nginx,开启4层透明转发功能

cat > kube-nginx.conf <<EOF
worker_processes 1;

events {
worker_connections 1024;
}

stream {
upstream backend {
hash \$remote_addr consistent;
server 10.105.26.201:6443 max_fails=3 fail_timeout=30s;
server 10.105.26.202:6443 max_fails=3 fail_timeout=30s;
server 10.105.26.203:6443 max_fails=3 fail_timeout=30s;
}

server {
listen 127.0.0.1:8443;
proxy_connect_timeout 1s;
proxy_pass backend;
}
}
EOF

分发配置文件

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-nginx.conf root@${node_ip}:/opt/k8s/kube-nginx/conf/kube-nginx.conf
done

配置systemd unit文件,启动服务

cat > kube-nginx.service <<EOF
[Unit]
Description=kube-apiserver nginx proxy
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=forking
ExecStartPre=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -t
ExecStart=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx
ExecReload=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -s reload
PrivateTmp=true
Restart=always
RestartSec=5
StartLimitInterval=0
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

分发systemd unit文件

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-nginx.service root@${node_ip}:/etc/systemd/system/
done

启动kube-nginx服务

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx"
done

检查kube-nginx服务运行状态

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-nginx |grep 'Active:'"
done

部署master节点

kubernetes master节点运行如下组件

  • kube-apiserver
  • kube-scheduler
  • kube-controller-manager
  • kube-nginx
  1. kube-scheduler和kube-controller-manager会自动选举产生一个leader实例,其它实例处于阻塞模式,当leader挂了后,重新选举产生新的leader,从而保证服务可用性
  2. kube-apiserver是无状态的,需要通过kube-nginx进行代理访问,从而保证服务可用性

下载kubernetes-server二进制文件并分发

wget https://dl.k8s.io/v1.14.2/kubernetes-server-linux-amd64.tar.gz
tar -xzvf kubernetes-server-linux-amd64.tar.gz
cd kubernetes
tar -xzvf kubernetes-src.tar.gz
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kubernetes/server/bin/{apiextensions-apiserver,cloud-controller-manager,kube-apiserver,kube-controller-manager,kube-proxy,kube-scheduler,kubeadm,kubectl,kubelet,mounter} root@${node_ip}:/opt/k8s/bin/
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done

部署高可用kube-apiserver集群

创建kubernetes证书和私钥

cat > kubernetes-csr.json <<EOF
{
"CN": "kubernetes",
"hosts": [
"127.0.0.1",
"10.105.26.201",
"10.105.26.202",
"10.105.26.203",
"10.254.0.1",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local."
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "JiangSu",
"L": "SuZhou",
"O": "k8s",
"OU": "System"
}
]
}
EOF
  • hosts字段指定授权使用该证书的IP和域名列表,这里列出了master节点IP、kubernetes服务的IP和域名
  • kubernetes服务IP是apiserver自动创建的,一般是--service-cluster-ip-range参数指定的网段的第一个IP,如下所示
kubectl get svc kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.254.0.1 <none> 443/TCP 15m

生成证书和私钥

cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes

将生成的证书和私钥文件拷贝到所有master节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/kubernetes/pki"
scp kubernetes*.pem root@${node_ip}:/etc/kubernetes/pki/
done

创建加密配置文件

cat > encryption-config.yaml <<EOF
kind: EncryptionConfig
apiVersion: v1
resources:
- resources:
- secrets
providers:
- aescbc:
keys:
- name: key1
secret: ${ENCRYPTION_KEY}
- identity: {}
EOF

将加密配置文件拷贝到master节点的/etc/kubernetes目录下

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp encryption-config.yaml root@${node_ip}:/etc/kubernetes/
done

创建审计策略文件

cat > audit-policy.yaml <<EOF
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
# The following requests were manually identified as high-volume and low-risk, so drop them.
- level: None
resources:
- group: ""
resources:
- endpoints
- services
- services/status
users:
- 'system:kube-proxy'
verbs:
- watch

- level: None
resources:
- group: ""
resources:
- nodes
- nodes/status
userGroups:
- 'system:nodes'
verbs:
- get

- level: None
namespaces:
- kube-system
resources:
- group: ""
resources:
- endpoints
users:
- 'system:kube-controller-manager'
- 'system:kube-scheduler'
- 'system:serviceaccount:kube-system:endpoint-controller'
verbs:
- get
- update

- level: None
resources:
- group: ""
resources:
- namespaces
- namespaces/status
- namespaces/finalize
users:
- 'system:apiserver'
verbs:
- get

# Don't log HPA fetching metrics.
- level: None
resources:
- group: metrics.k8s.io
users:
- 'system:kube-controller-manager'
verbs:
- get
- list

# Don't log these read-only URLs.
- level: None
nonResourceURLs:
- '/healthz*'
- /version
- '/swagger*'

# Don't log events requests.
- level: None
resources:
- group: ""
resources:
- events

# node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes
- level: Request
omitStages:
- RequestReceived
resources:
- group: ""
resources:
- nodes/status
- pods/status
users:
- kubelet
- 'system:node-problem-detector'
- 'system:serviceaccount:kube-system:node-problem-detector'
verbs:
- update
- patch

- level: Request
omitStages:
- RequestReceived
resources:
- group: ""
resources:
- nodes/status
- pods/status
userGroups:
- 'system:nodes'
verbs:
- update
- patch

# deletecollection calls can be large, don't log responses for expected namespace deletions
- level: Request
omitStages:
- RequestReceived
users:
- 'system:serviceaccount:kube-system:namespace-controller'
verbs:
- deletecollection

# Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data,
# so only log at the Metadata level.
- level: Metadata
omitStages:
- RequestReceived
resources:
- group: ""
resources:
- secrets
- configmaps
- group: authentication.k8s.io
resources:
- tokenreviews
# Get responses can be large; skip them.
- level: Request
omitStages:
- RequestReceived
resources:
- group: ""
- group: admissionregistration.k8s.io
- group: apiextensions.k8s.io
- group: apiregistration.k8s.io
- group: apps
- group: authentication.k8s.io
- group: authorization.k8s.io
- group: autoscaling
- group: batch
- group: certificates.k8s.io
- group: extensions
- group: metrics.k8s.io
- group: networking.k8s.io
- group: policy
- group: rbac.authorization.k8s.io
- group: scheduling.k8s.io
- group: settings.k8s.io
- group: storage.k8s.io
verbs:
- get
- list
- watch

# Default level for known APIs
- level: RequestResponse
omitStages:
- RequestReceived
resources:
- group: ""
- group: admissionregistration.k8s.io
- group: apiextensions.k8s.io
- group: apiregistration.k8s.io
- group: apps
- group: authentication.k8s.io
- group: authorization.k8s.io
- group: autoscaling
- group: batch
- group: certificates.k8s.io
- group: extensions
- group: metrics.k8s.io
- group: networking.k8s.io
- group: policy
- group: rbac.authorization.k8s.io
- group: scheduling.k8s.io
- group: settings.k8s.io
- group: storage.k8s.io

# Default level for all other requests.
- level: Metadata
omitStages:
- RequestReceived
EOF

分发审计策略文件

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp audit-policy.yaml root@${node_ip}:/etc/kubernetes/audit-policy.yaml
done

创建metrics-server使用的证书

cat > proxy-client-csr.json <<EOF
{
"CN": "aggregator",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "JiangSu",
"L": "SuZhou",
"O": "k8s",
"OU": "System"
}
]
}
EOF
  • CN名称为aggregator,需要与metrics-server的--requestheader-allowed-names参数配置一致,否则访问会被metrics-server拒绝

生成证书和私钥

cfssl gencert -ca=/etc/kubernetes/pki/ca.pem \
-ca-key=/etc/kubernetes/pki/ca-key.pem \
-config=/etc/kubernetes/pki/ca-config.json \
-profile=kubernetes proxy-client-csr.json | cfssljson -bare proxy-client

将生成的证书和私钥文件拷贝到所有master节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp proxy-client*.pem root@${node_ip}:/etc/kubernetes/pki/
done

创建kube-apiserver systemd unit模板文件

cat > kube-apiserver.service.template <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
WorkingDirectory=${K8S_DIR}/kube-apiserver
ExecStart=/opt/k8s/bin/kube-apiserver \\
--advertise-address=##NODE_IP## \\
--default-not-ready-toleration-seconds=360 \\
--default-unreachable-toleration-seconds=360 \\
--feature-gates=DynamicAuditing=true \\
--max-mutating-requests-inflight=2000 \\
--max-requests-inflight=4000 \\
--default-watch-cache-size=200 \\
--delete-collection-workers=2 \\
--encryption-provider-config=/etc/kubernetes/encryption-config.yaml \\
--etcd-cafile=/etc/kubernetes/pki/ca.pem \\
--etcd-certfile=/etc/kubernetes/pki/kubernetes.pem \\
--etcd-keyfile=/etc/kubernetes/pki/kubernetes-key.pem \\
--etcd-servers=${ETCD_ENDPOINTS} \\
--bind-address=##NODE_IP## \\
--secure-port=6443 \\
--tls-cert-file=/etc/kubernetes/pki/kubernetes.pem \\
--tls-private-key-file=/etc/kubernetes/pki/kubernetes-key.pem \\
--insecure-port=0 \\
--audit-dynamic-configuration \\
--audit-log-maxage=15 \\
--audit-log-maxbackup=3 \\
--audit-log-maxsize=100 \\
--audit-log-mode=batch \\
--audit-log-truncate-enabled \\
--audit-log-batch-buffer-size=20000 \\
--audit-log-batch-max-size=2 \\
--audit-log-path=${K8S_DIR}/kube-apiserver/audit.log \\
--audit-policy-file=/etc/kubernetes/audit-policy.yaml \\
--profiling \\
--anonymous-auth=false \\
--client-ca-file=/etc/kubernetes/pki/ca.pem \\
--enable-bootstrap-token-auth \\
--requestheader-allowed-names="" \\
--requestheader-client-ca-file=/etc/kubernetes/pki/ca.pem \\
--requestheader-extra-headers-prefix="X-Remote-Extra-" \\
--requestheader-group-headers=X-Remote-Group \\
--requestheader-username-headers=X-Remote-User \\
--service-account-key-file=/etc/kubernetes/pki/ca.pem \\
--authorization-mode=Node,RBAC \\
--runtime-config=api/all=true \\
--enable-admission-plugins=NodeRestriction \\
--allow-privileged=true \\
--apiserver-count=3 \\
--event-ttl=168h \\
--kubelet-certificate-authority=/etc/kubernetes/pki/ca.pem \\
--kubelet-client-certificate=/etc/kubernetes/pki/kubernetes.pem \\
--kubelet-client-key=/etc/kubernetes/pki/kubernetes-key.pem \\
--kubelet-https=true \\
--kubelet-timeout=10s \\
--proxy-client-cert-file=/etc/kubernetes/pki/proxy-client.pem \\
--proxy-client-key-file=/etc/kubernetes/pki/proxy-client-key.pem \\
--service-cluster-ip-range=${SERVICE_CIDR} \\
--service-node-port-range=${NODE_PORT_RANGE} \\
--logtostderr=true \\
--v=2
Restart=on-failure
RestartSec=10
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
  • --advertise-address:apiserver对外通告的IP(kubernetes服务后端节点IP);
  • --default-*-toleration-seconds:设置节点异常相关的阈值;
  • --max-*-requests-inflight:请求相关的最大阈值;
  • --etcd-*:访问etcd的证书和etcd服务器地址;
  • --encryption-provider-config:指定用于加密etcd中secret的配置;
  • --bind-address:https监听的IP,不能为127.0.0.1,否则外界不能访问它的安全端口6443;
  • --secure-port:https监听端口;
  • --insecure-port=0:关闭监听http非安全端口(8080);
  • --tls-*-file:指定apiserver使用的证书、私钥和CA文件;
  • --audit-*:配置审计策略和审计日志文件相关的参数;
  • --client-ca-file:验证client(kube-controller-manager、kube-scheduler、kubelet、kube-proxy等)请求所带的证书;
  • --enable-bootstrap-token-auth:启用kubelet bootstrap的token认证;
  • --requestheader-*:kube-apiserver的aggregator layer相关的配置参数,proxy-client & HPA需要使用;
  • --requestheader-client-ca-file:签名--proxy-client-cert-file和--proxy-client-key-file指定证书的CA,用于验证它们;在启用了metric aggregator时使用;
  • 如果--requestheader-allowed-names不为空,则--proxy-client-cert-file证书的CN必须位于allowed-names中,默认为aggregator;
  • --service-account-key-file:签名ServiceAccount Token的公钥文件,kube-controller-manager的--service-account-private-key-file指定私钥文件,两者配对使用;
  • --runtime-config=api/all=true:启用所有版本的APIs,如autoscaling/v2alpha1;
  • --authorization-mode=Node,RBAC、--anonymous-auth=false:开启Node和RBAC授权模式,拒绝未授权的请求;
  • --enable-admission-plugins:启用一些默认关闭的plugins;
  • --allow-privileged:允许运行privileged权限的容器;
  • --apiserver-count=3:指定apiserver实例的数量;
  • --event-ttl:指定events的保存时间;
  • --kubelet-*:如果指定,则使用https访问kubelet APIs;需要为证书对应的用户(上面kubernetes.pem证书的用户为kubernetes)定义RBAC规则,否则访问kubelet API时提示未授权;
  • --proxy-client-*:apiserver访问metrics-server使用的证书;
  • --service-cluster-ip-range:指定Service Cluster IP地址段;
  • --service-node-port-range:指定NodePort的端口范围;
  • 如果kube-apiserver机器没有运行kube-proxy,则还需要添加--enable-aggregator-routing=true参数;
  • --requestheader-client-ca-file指定的CA证书,必须具有client auth和server auth用途

为各节点创建和分发kube-apiserver systemd unit文件

for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-apiserver.service.template > kube-apiserver-${NODE_IPS[i]}.service
done

分发生成的systemd unit文件

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-apiserver-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-apiserver.service
done

启动kube-apiserver服务

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-apiserver"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-apiserver && systemctl restart kube-apiserver"
done

检查kube-apiserver运行状态

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-apiserver |grep 'Active:'"
done
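
也可以通过本地的kube-nginx代理端口直接访问apiserver的健康检查接口,确认代理和apiserver都工作正常(验证示例,使用前面生成的admin证书做客户端认证,此处假设证书仍在/opt/k8s/work目录下,返回ok即正常):

curl -s --cacert /opt/k8s/work/ca.pem \
--cert /opt/k8s/work/admin.pem \
--key /opt/k8s/work/admin-key.pem \
https://127.0.0.1:8443/healthz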

打印kube-apiserver写入etcd的数据

ETCDCTL_API=3 etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--cacert=/opt/k8s/work/ca.pem \
--cert=/opt/k8s/work/etcd.pem \
--key=/opt/k8s/work/etcd-key.pem \
get /registry/ --prefix --keys-only
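
还可以进一步验证encryption-config.yaml是否生效:创建一个测试secret(名称test-enc仅为示例),直接从etcd读取其内容,value以 k8s:enc:aescbc:v1:key1 开头即说明已按aescbc加密存储:

kubectl create secret generic test-enc --from-literal=foo=bar
ETCDCTL_API=3 etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--cacert=/opt/k8s/work/ca.pem \
--cert=/opt/k8s/work/etcd.pem \
--key=/opt/k8s/work/etcd-key.pem \
get /registry/secrets/default/test-enc | hexdump -C | head
kubectl delete secret test-enc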

检查集群信息

$ kubectl cluster-info
Kubernetes master is running at https://127.0.0.1:8443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

$ kubectl get all --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.254.0.1 <none> 443/TCP 12m

$ kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}

检查kube-apiserver监听的端口

netstat -lnpt|grep kube
tcp 0 0 10.105.26.201:6443 0.0.0.0:* LISTEN 26178/kube-apiserve
  • 6443: 接收https请求的安全端口,对所有请求做认证和授权
  • 由于关闭了非安全端口,故没有监听 8080

授予kube-apiserver访问kubelet API的权限

在执行kubectl exec、run、logs等命令时,apiserver会将请求转发到kubelet的https端口。这里定义RBAC规则,授权apiserver使用的证书(kubernetes.pem)对应的用户(CN:kubernetes)访问kubelet API的权限

kubectl create clusterrolebinding kube-apiserver:kubelet-apis \
--clusterrole=system:kubelet-api-admin --user kubernetes

部署高可用kube-controller-manager集群

该集群包含3个节点,启动后将通过竞争选举机制产生一个leader节点,其它节点为阻塞状态。当leader节点不可用时,阻塞的节点将再次进行选举产生新的 leader节点,从而保证服务的可用性。

为保证通信安全,本文档先生成 x509 证书和私钥,kube-controller-manager在如下两种情况下使用该证书:

  • 与kube-apiserver的安全端口通信;
  • 在安全端口(https,10252) 输出prometheus格式的metrics;

创建kube-controller-manager证书和私钥

cat > kube-controller-manager-csr.json <<EOF
{
"CN": "system:kube-controller-manager",
"key": {
"algo": "rsa",
"size": 2048
},
"hosts": [
"127.0.0.1",
"10.105.26.201",
"10.105.26.202",
"10.105.26.203"
],
"names": [
{
"C": "CN",
"ST": "JiangSu",
"L": "SuZhou",
"O": "system:kube-controller-manager",
"OU": "System"
}
]
}
EOF
  • hosts列表包含所有kube-controller-manager节点IP;
  • CN和O均为system:kube-controller-manager,kubernetes内置的 ClusterRoleBindings system:kube-controller-manager赋予kube-controller-manager工作所需的权限。

生成证书和私钥

cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager

将生成的证书和私钥分发到所有master节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-controller-manager*.pem root@${node_ip}:/etc/kubernetes/pki/
done

创建和分发kubeconfig文件

kube-controller-manager使用kubeconfig文件访问apiserver,该文件提供了 apiserver地址、嵌入的CA证书和kube-controller-manager证书

kubectl config set-cluster kubernetes \
--certificate-authority=/opt/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-controller-manager.kubeconfig

kubectl config set-credentials system:kube-controller-manager \
--client-certificate=kube-controller-manager.pem \
--client-key=kube-controller-manager-key.pem \
--embed-certs=true \
--kubeconfig=kube-controller-manager.kubeconfig

kubectl config set-context system:kube-controller-manager \
--cluster=kubernetes \
--user=system:kube-controller-manager \
--kubeconfig=kube-controller-manager.kubeconfig

kubectl config use-context system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig

分发kubeconfig到所有master节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-controller-manager.kubeconfig root@${node_ip}:/etc/kubernetes/
done

创建kube-controller-manager systemd unit模版文件

cat > kube-controller-manager.service.template <<EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
WorkingDirectory=${K8S_DIR}/kube-controller-manager
ExecStart=/opt/k8s/bin/kube-controller-manager \\
--profiling \\
--cluster-name=kubernetes \\
--controllers=*,bootstrapsigner,tokencleaner \\
--kube-api-qps=1000 \\
--kube-api-burst=2000 \\
--leader-elect \\
--use-service-account-credentials\\
--concurrent-service-syncs=2 \\
--bind-address=##NODE_IP## \\
--secure-port=10252 \\
--tls-cert-file=/etc/kubernetes/pki/kube-controller-manager.pem \\
--tls-private-key-file=/etc/kubernetes/pki/kube-controller-manager-key.pem \\
--port=0 \\
--authentication-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--client-ca-file=/etc/kubernetes/pki/ca.pem \\
--requestheader-allowed-names="" \\
--requestheader-client-ca-file=/etc/kubernetes/pki/ca.pem \\
--requestheader-extra-headers-prefix="X-Remote-Extra-" \\
--requestheader-group-headers=X-Remote-Group \\
--requestheader-username-headers=X-Remote-User \\
--authorization-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem \\
--cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem \\
--experimental-cluster-signing-duration=8760h \\
--horizontal-pod-autoscaler-sync-period=10s \\
--concurrent-deployment-syncs=10 \\
--concurrent-gc-syncs=30 \\
--node-cidr-mask-size=24 \\
--service-cluster-ip-range=${SERVICE_CIDR} \\
--pod-eviction-timeout=6m \\
--terminated-pod-gc-threshold=10000 \\
--root-ca-file=/etc/kubernetes/pki/ca.pem \\
--service-account-private-key-file=/etc/kubernetes/pki/ca-key.pem \\
--kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--logtostderr=true \\
--v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
  • --port=0:关闭监听非安全端口(http),同时--address参数无效,--bind-address参数有效;
  • --secure-port=10252、--bind-address:在指定网络接口监听10252端口的https /metrics请求;
  • --kubeconfig:指定kubeconfig文件路径,kube-controller-manager使用它连接和验证kube-apiserver;
  • --authentication-kubeconfig和--authorization-kubeconfig:kube-controller-manager使用它连接apiserver,对client的请求进行认证和授权。kube-controller-manager不再使用--tls-ca-file对请求https metrics的Client证书进行校验。如果没有配置这两个kubeconfig参数,则client连接kube-controller-manager https端口的请求会被拒绝(提示权限不足),验证方式见下面的curl示例。
  • --cluster-signing-*-file:签名TLS Bootstrap创建的证书;
  • --experimental-cluster-signing-duration:指定TLS Bootstrap证书的有效期;
  • --root-ca-file:放置到容器ServiceAccount中的CA证书,用来对kube-apiserver的证书进行校验;
  • --service-account-private-key-file:签名ServiceAccount中Token的私钥文件,必须和kube-apiserver的--service-account-key-file指定的公钥文件配对使用;
  • --service-cluster-ip-range:指定Service Cluster IP网段,必须和kube-apiserver中的同名参数一致;
  • --leader-elect=true:集群运行模式,启用选举功能;被选为leader的节点负责处理工作,其它节点为阻塞状态;
  • --controllers=*,bootstrapsigner,tokencleaner:启用的控制器列表,tokencleaner用于自动清理过期的Bootstrap token;
  • --horizontal-pod-autoscaler-*:custom metrics相关参数,支持autoscaling/v2alpha1;
  • --tls-cert-file、--tls-private-key-file:使用https输出metrics时使用的Server证书和私钥;
  • --use-service-account-credentials=true: kube-controller-manager中各controller使用serviceaccount访问kube-apiserver;
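
kube-controller-manager启动后,可以按下面的方式用admin证书通过https安全端口请求/metrics,验证上述认证、授权配置是否生效(验证示例,IP以实际master节点为准,假设admin证书仍在/opt/k8s/work目录下):

curl -s --cacert /opt/k8s/work/ca.pem \
--cert /opt/k8s/work/admin.pem \
--key /opt/k8s/work/admin-key.pem \
https://10.105.26.201:10252/metrics | head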

为各master节点创建和分发kube-controller-mananger systemd unit文件

替换模板文件中的变量,为各节点创建systemd unit文件

for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-controller-manager.service.template > kube-controller-manager-${NODE_IPS[i]}.service
done

分发到所有 master 节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-controller-manager-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-controller-manager.service
done

启动kube-controller-manager服务

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-controller-manager"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-controller-manager && systemctl restart kube-controller-manager"
done

检查服务运行状态

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-controller-manager|grep Active"
done

kube-controller-manager的权限

ClusterRole system:kube-controller-manager的权限很小,只能创建secret、serviceaccount等资源对象,各controller的权限分散到ClusterRole system:controller:XXX中:

# kubectl describe clusterrole system:kube-controller-manager
Name: system:kube-controller-manager
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
secrets [] [] [create delete get update]
endpoints [] [] [create get update]
serviceaccounts [] [] [create get update]
events [] [] [create patch update]
tokenreviews.authentication.k8s.io [] [] [create]
subjectaccessreviews.authorization.k8s.io [] [] [create]
configmaps [] [] [get]
namespaces [] [] [get]
*.* [] [] [list watch]

需要在kube-controller-manager的启动参数中添加--use-service-account-credentials=true参数,这样main controller会为各controller创建对应的ServiceAccount XXX-controller。内置的ClusterRoleBinding system:controller:XXX将赋予各XXX-controller ServiceAccount对应的ClusterRole system:controller:XXX权限。

# kubectl get clusterrole|grep controller
system:controller:attachdetach-controller 74m
system:controller:certificate-controller 74m
system:controller:clusterrole-aggregation-controller 74m
system:controller:cronjob-controller 74m
system:controller:daemon-set-controller 74m
system:controller:deployment-controller 74m
system:controller:disruption-controller 74m
system:controller:endpoint-controller 74m
system:controller:expand-controller 74m
system:controller:generic-garbage-collector 74m
system:controller:horizontal-pod-autoscaler 74m
system:controller:job-controller 74m
system:controller:namespace-controller 74m
system:controller:node-controller 74m
system:controller:persistent-volume-binder 74m
system:controller:pod-garbage-collector 74m
system:controller:pv-protection-controller 74m
system:controller:pvc-protection-controller 74m
system:controller:replicaset-controller 74m
system:controller:replication-controller 74m
system:controller:resourcequota-controller 74m
system:controller:route-controller 74m
system:controller:service-account-controller 74m
system:controller:service-controller 74m
system:controller:statefulset-controller 74m
system:controller:ttl-controller 74m
system:kube-controller-manager 74m

以deployment controller为例

# kubectl describe clusterrole system:controller:deployment-controller
Name: system:controller:deployment-controller
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
replicasets.apps [] [] [create delete get list patch update watch]
replicasets.extensions [] [] [create delete get list patch update watch]
events [] [] [create patch update]
pods [] [] [get list update watch]
deployments.apps [] [] [get list update watch]
deployments.extensions [] [] [get list update watch]
deployments.apps/finalizers [] [] [update]
deployments.apps/status [] [] [update]
deployments.extensions/finalizers [] [] [update]
deployments.extensions/status [] [] [update]

查看当前的 leader

# kubectl get endpoints kube-controller-manager --namespace=kube-system  -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube-m1_b95d689d-8b43-11e9-8cf3-3e6b5fecb6ef","leaseDurationSeconds":15,"acquireTime":"2019-06-10T05:51:00Z","renewTime":"2019-06-10T06:12:28Z","leaderTransitions":0}'
creationTimestamp: "2019-06-10T05:51:00Z"
name: kube-controller-manager
namespace: kube-system
resourceVersion: "2262"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
uid: b95fd6a1-8b43-11e9-b331-3e6b5fecb6ef

部署高可用kube-scheduler集群

该集群包含3个节点,启动后将通过竞争选举机制产生一个leader节点,其它节点为阻塞状态。当leader节点不可用后,剩余节点将再次进行选举产生新的 leader节点,从而保证服务的可用性。

为保证通信安全,本文档先生成x509证书和私钥,kube-scheduler在如下两种情况下使用该证书:

  • 与kube-apiserver的安全端口通信;
  • 在安全端口(https,10259)输出prometheus格式的metrics;

创建kube-scheduler证书和私钥

cat > kube-scheduler-csr.json <<EOF
{
"CN": "system:kube-scheduler",
"hosts": [
"127.0.0.1",
"10.105.26.201",
"10.105.26.202",
"10.105.26.203"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "JiangSu",
"L": "SuZhou",
"O": "system:kube-scheduler",
"OU": "System"
}
]
}
EOF
  • hosts列表包含所有kube-scheduler节点IP;
  • CN和O均为system:kube-scheduler,kubernetes内置的ClusterRoleBindings system:kube-scheduler将赋予kube-scheduler工作所需的权限;

生成证书和私钥

cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler

将生成的证书和私钥分发到所有master节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler*.pem root@${node_ip}:/etc/kubernetes/pki/
done

创建和分发kubeconfig文件

kubectl config set-cluster kubernetes \
--certificate-authority=/opt/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-scheduler.kubeconfig

kubectl config set-credentials system:kube-scheduler \
--client-certificate=kube-scheduler.pem \
--client-key=kube-scheduler-key.pem \
--embed-certs=true \
--kubeconfig=kube-scheduler.kubeconfig

kubectl config set-context system:kube-scheduler \
--cluster=kubernetes \
--user=system:kube-scheduler \
--kubeconfig=kube-scheduler.kubeconfig

kubectl config use-context system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig

分发kubeconfig到所有master节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler.kubeconfig root@${node_ip}:/etc/kubernetes/
done

创建kube-scheduler配置文件

cat > kube-scheduler.yaml.template <<EOF
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
bindTimeoutSeconds: 600
clientConnection:
  burst: 200
  kubeconfig: "/etc/kubernetes/kube-scheduler.kubeconfig"
  qps: 100
enableContentionProfiling: false
enableProfiling: true
hardPodAffinitySymmetricWeight: 1
healthzBindAddress: ##NODE_IP##:10251
leaderElection:
  leaderElect: true
metricsBindAddress: ##NODE_IP##:10251
EOF
  • clientConnection.kubeconfig:指定kubeconfig文件路径,kube-scheduler使用它连接和验证kube-apiserver;
  • leaderElection.leaderElect=true:集群运行模式,启用选举功能;被选为leader的节点负责处理工作,其它节点为阻塞状态;

替换模版文件中的变量

for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-scheduler.yaml.template > kube-scheduler-${NODE_IPS[i]}.yaml
done

分发kube-scheduler配置文件到所有master节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler-${node_ip}.yaml root@${node_ip}:/etc/kubernetes/kube-scheduler.yaml
done

创建kube-scheduler systemd unit模板文件

cat > kube-scheduler.service.template <<EOF
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
WorkingDirectory=${K8S_DIR}/kube-scheduler
ExecStart=/opt/k8s/bin/kube-scheduler \\
--config=/etc/kubernetes/kube-scheduler.yaml \\
--bind-address=##NODE_IP## \\
--secure-port=10259 \\
--port=0 \\
--tls-cert-file=/etc/kubernetes/pki/kube-scheduler.pem \\
--tls-private-key-file=/etc/kubernetes/pki/kube-scheduler-key.pem \\
--authentication-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\
--client-ca-file=/etc/kubernetes/pki/ca.pem \\
--requestheader-allowed-names="" \\
--requestheader-client-ca-file=/etc/kubernetes/pki/ca.pem \\
--requestheader-extra-headers-prefix="X-Remote-Extra-" \\
--requestheader-group-headers=X-Remote-Group \\
--requestheader-username-headers=X-Remote-User \\
--authorization-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\
--logtostderr=true \\
--v=2
Restart=always
RestartSec=5
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
EOF

为各节点创建和分发kube-scheduler systemd unit文件

替换模板文件中的变量,为各节点创建 systemd unit 文件

for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-scheduler.service.template > kube-scheduler-${NODE_IPS[i]}.service
done

分发systemd unit文件到所有master节点

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-scheduler.service
done

启动kube-scheduler服务

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-scheduler"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-scheduler && systemctl restart kube-scheduler"
done

检查服务运行状态

for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-scheduler|grep Active"
done
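
同样可以用admin证书通过https安全端口请求kube-scheduler的/metrics,验证认证、授权配置是否生效(验证示例,IP以实际master节点为准,假设admin证书仍在/opt/k8s/work目录下):

curl -s --cacert /opt/k8s/work/ca.pem \
--cert /opt/k8s/work/admin.pem \
--key /opt/k8s/work/admin-key.pem \
https://10.105.26.201:10259/metrics | head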

查看当前的 leader

# kubectl get endpoints kube-scheduler --namespace=kube-system  -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube-m1_d5bb5bf8-8b4a-11e9-acfe-3e6b5fecb6ef","leaseDurationSeconds":15,"acquireTime":"2019-06-10T06:41:55Z","renewTime":"2019-06-10T07:41:36Z","leaderTransitions":0}'
creationTimestamp: "2019-06-10T06:41:55Z"
name: kube-scheduler
namespace: kube-system
resourceVersion: "8290"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
uid: d6560c75-8b4a-11e9-b331-3e6b5fecb6ef