It’s been more than a year since I built my Kubernetes cluster out of a few Raspberry Pis. There were a few times when I needed to power everything down so electricians could do their work, and the cluster came back online and seemed to be OK afterwards, even though I didn’t shut the Pis down properly at all.
Recently I found that I had lost contact with the cluster. It looked like this:
$ kubectl get node
The connection to the server 192.168.x.x:6443 was refused - did you specify the right host or port?
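A connection refused on port 6443 usually means the API server itself is down rather than a network issue, so the control-plane node is the first place to look. Here’s a rough sketch of the checks I’d run over ssh (assuming a Docker-based kubeadm install like this one; the exact commands may differ on your setup):

# Is the API server container running at all?
$ docker ps | grep kube-apiserver

# What has the kubelet been complaining about lately?
$ journalctl -u kubelet.service --no-pager | tail -n 50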
My first thought was that the cluster must have been hacked, since it had been running on auto-pilot for months. But I could still ssh into the master node, so it wasn’t that bad. Then I saw the error logs from kubelet.service:
Sep 23 15:58:05 kmaster kubelet[1233]: E0923 15:58:05.341773 1233 bootstrap.go:263] Part of the existing bootstrap client certificate is expired: 2020-09-15 10:40:36 +0000 UTC
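To confirm which certificates had actually expired, you can inspect them directly. The paths below are the kubeadm defaults, and the check-expiration subcommand lives under kubeadm alpha on v1.15 (it was promoted to kubeadm certs check-expiration in later releases), so take this as a sketch:

# Print the expiry date of the API server certificate
# (the date below matches the kubelet log above)
$ openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt
notAfter=Sep 15 10:40:36 2020 GMT

# Or let kubeadm list the expiry of every certificate it manages
$ kubeadm alpha certs check-expiration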
That made perfect sense! The cluster’s anniversary was just a few days ago, and the certificates kubeadm issues only last a year. Here’s the StackOverflow answer that I found very helpful for this issue.
I ran the following commands on the master node, and the API server came back to life:
$ mkdir -p /tmp/backup
$ cd /etc/kubernetes/pki/
$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} /tmp/backup
$ kubeadm init phase certs all --apiserver-advertise-address <master-IP>
$ cd /etc/kubernetes/
$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} /tmp/backup
$ kubeadm init phase kubeconfig all
$ systemctl restart kubelet.service
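For what it’s worth, if the CA itself is still valid there may be a shorter route: kubeadm can renew its certificates in place instead of regenerating them from scratch. On v1.15 the subcommand is still under alpha (it became kubeadm certs renew in later releases), so treat this as a sketch rather than what I actually ran:

# Renew every kubeadm-managed certificate in place, signed by the existing CA
$ kubeadm alpha certs renew all
$ systemctl restart kubelet.service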
I’m not sure whether all the new certs get distributed to the worker nodes automatically, but at least the API server didn’t complain anymore. I might do a kubeadm upgrade soon.
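One more thing worth remembering: the config kubectl uses is a copy of admin.conf, so after regenerating it, the copy under ~/.kube needs to be refreshed too, or kubectl will keep presenting the old, expired client certificate:

# Replace the stale kubectl config with the freshly generated admin credentials
$ cp /etc/kubernetes/admin.conf ~/.kube/config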
$ kubectl get node
NAME      STATUS     ROLES    AGE    VERSION
kmaster   NotReady   master   372d   v1.15.3
knode1    NotReady   <none>   372d   v1.15.3
knode2    NotReady   <none>   372d   v1.15.3
EDIT: After the certs were renewed, the kubelet service couldn’t authenticate anymore and the nodes appeared NotReady. This can be fixed by deleting the obsolete kubelet client certificate:
$ ls /var/lib/kubelet/pki -lht
total 28K
-rw------- 1 root root 1.1K Sep 23 19:12 kubelet-client-2020-09-23-19-12-52.pem
lrwxrwxrwx 1 root root   59 Sep 23 19:12 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2020-09-23-19-12-52.pem
-rw------- 1 root root 2.7K Sep 23 19:12 kubelet-client-2020-09-23-19-12-51.pem
-rw------- 1 root root 1.1K Jun 17 00:56 kubelet-client-2020-06-17-00-56-59.pem
-rw------- 1 root root 1.1K Sep 16  2019 kubelet-client-2019-09-16-20-41-53.pem
-rw------- 1 root root 2.7K Sep 16  2019 kubelet-client-2019-09-16-20-40-40.pem
-rw-r--r-- 1 root root 2.2K Sep 16  2019 kubelet.crt
-rw------- 1 root root 1.7K Sep 16  2019 kubelet.key
$ rm /var/lib/kubelet/pki/kubelet-client-current.pem
$ systemctl restart kubelet.service
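On restart the kubelet re-bootstraps with the credentials in /etc/kubernetes/kubelet.conf and writes a fresh kubelet-client-current.pem. To double-check that the new certificate is good for another year, something like this should work (a sketch; if openssl complains, the certificate block may sit after the key in the combined pem file):

# Print the expiry of the freshly issued kubelet client certificate
$ openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem

# The nodes should flip back to Ready shortly after
$ kubectl get node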
🙂