[ Solved ] kube-apiserver Kept Crashing


I planned to upgrade my home-lab Kubernetes cluster from 1.32 to 1.35, using the same shortcut I used last time. Unfortunately, when I took a look at the cluster, which I hadn’t touched for a while, I couldn’t connect to it anymore.

k get nodes
The connection to the server 192.168.1.93:6443 was refused - did you specify the right host or port?
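Before ssh-ing in, a quick sanity check can tell a refused connection apart from a kubeconfig problem. A minimal sketch (the host and port come from the error message above; `/dev/tcp` is a bash feature, so this assumes bash is available):

```shell
# Probe the API server port directly. "Connection refused" here means the
# server process itself is down, not a client-side kubeconfig issue.
if timeout 2 bash -c 'exec 3<>/dev/tcp/192.168.1.93/6443'; then
  echo "port is open - likely a kubeconfig/client issue"
else
  echo "port is closed - the API server itself is down"
fi
```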

This is not convenient at all – I can’t use kubectl to troubleshoot the issue because the API server itself is down. I’ll have to dive down to a lower level – the Linux container level – to see what’s going on. The kube-apiserver pod always runs on the master node, so it’s time to warm up my ssh skills!

ssh kube-master
...
sudo -i
...
# list all containers, including exited ones
crictl ps -a
...
aba9a230c0fa2       1c20c8797e486       24 seconds ago      Exited              kube-apiserver ...
...
# see the logs of the container
crictl logs $(crictl ps -a --name kube-apiserver -q | head -n 1)
...
F0511 01:56:17.009965       1 instance.go:226] Error creating leases: error creating storage factory: open /etc/kubernetes/pki/apiserver-etcd-client.crt: no such file or directory
# here we go! so the cert is missing for some reason
# here's the kubeadm command to double check certs
kubeadm certs check-expiration
...
!MISSING! apiserver-etcd-client
...
# add the missing cert
kubeadm init phase certs apiserver-etcd-client
[certs] Generating "apiserver-etcd-client" certificate and key
# restart the kubelet so it re-syncs the static pods, just in case
systemctl restart kubelet
# check containers again
crictl ps -a
...
f8c8c3465461b       1c20c8797e486       3 seconds ago       Running             kube-apiserver ...
# oh yeah!
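For context on why one missing file is fatal: on a kubeadm cluster the API server runs as a static pod, and its manifest (by default `/etc/kubernetes/manifests/kube-apiserver.yaml`) points at that cert for talking to etcd. The fragment below shows the relevant flags as kubeadm typically generates them – paths may differ on a customized setup:

```yaml
# Excerpt from /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm defaults)
spec:
  containers:
  - command:
    - kube-apiserver
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt  # the missing file
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
```

If the file referenced by `--etcd-certfile` doesn’t exist, the apiserver exits on startup with exactly the "no such file or directory" error seen in the logs above.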

It turns out that a single missing cert can cut off all communication with a Kubernetes cluster 🙂 Luckily, the workloads on the worker nodes weren’t affected.
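To avoid the same surprise next time, a quick expiry check can run before any upgrade. This is only a sketch: `check_certs` is a hypothetical helper name and the 30-day window is arbitrary; on a kubeadm node, `kubeadm certs renew all` is the built-in way to renew everything at once.

```shell
# Hypothetical helper: warn about any cert in a directory that expires
# within the next 30 days (openssl's -checkend takes seconds).
check_certs() {
  local dir=$1 crt
  for crt in "$dir"/*.crt; do
    if ! openssl x509 -in "$crt" -noout -checkend $((30 * 24 * 3600)) >/dev/null; then
      echo "WARN: $crt expires within 30 days"
    fi
  done
}

# On a control-plane node you would point it at the kubeadm PKI dir:
#   check_certs /etc/kubernetes/pki
```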