Rebuild a Kubernetes Node Without Downtime

When I built the in-house Kubernetes cluster with Raspberry PIs, I followed the kubeadm instructions and installed Raspberry PI OS on the PIs. It was all good except the RPI OS is 32-bit. Now I want to install a Ubuntu 20.04 Server ARM64 on this PI, below are steps with which I rebuilt the node with Ubuntu and without disrupting the workloads running in my cluster.

First, I didn’t need to shutdown the running node because I’ve got a spare MicroSD card to prepare the Ubuntu image. The instruction for writing the image to the MicroSD card is here. When the card is prepared by the Imager, I kept it in the card reader because I wanted to set the IP address instead of the automatic IP by default. A fixed IP makes more sense if I want to connect to it, right?

To set a static IP in the Ubuntu MicroSD card, open system-boot/network-config file with a text editor and put in something like this:

version: 2
ethernets:
  eth0:
    # Rename the built-in ethernet device to "eth0"
    match:
      driver: bcmgenet smsc95xx lan78xx
    set-name: eth0
    addresses: [192.168.1.82/24]
    gateway4: 192.168.1.1
    nameservers:
      addresses: [192.168.1.1]
    optional: true

Now the new OS is ready. To gracefully shutdown the node, drain it with

kubectl drain node-name
# wait until it finishes
# the pods on this node will be evicted and re-deployed into other nodes
kubectl delete node node-name

Then I powered down the PI and replaced the MicroSD card with the one I just prepared, then I powered it back on. After a minute or 2, I was able to ssh into the node with

# wipe the previous trusted server signature
ssh-keygen -R 192.168.1.82
# login, default password is ubuntu and will be changed upon first login
ssh [email protected]
# install ssh key, login with updated password
ssh-copy-id [email protected]

The node needs to be prepared for kubeadm, I used my good old ansible playbook for this task. The ansible-playbook command looks like

ansible-playbook -i inventory/cluster -l node-name kubeadm.yaml

At the moment I have to install less recent versions of docker and kubeadm to keep it compatible with the existing cluster.

When running kubeadm join command I encountered an error message saying CGROUPS_MEMORY: missing. This can be fixed with this. And one more thing is to create a new token from the master node with command:

kubeadm token create

At last the new node can be joined into the cluster with command:

kubeadm join 192.168.1.80:6443 --token xxx     --discovery-token-ca-cert-hash sha256:xxx

The node will then be bootstrapped in a few minutes. I can tell it’s now ARM64

k get node node-name -o yaml |rg arch
    beta.kubernetes.io/arch: arm64
    kubernetes.io/arch: arm64
...

🙂

Hybrid Kubernetes Cluster (X86 + ARM)

My old ASUS 15″ laptop bought in 2014. It has a sub-woofer!

The one in the picture was my old laptop, then my daughter’s for a few years. Now she got a nice new 2-in-1 ultra book the school asked us parents to buy, this clunky one was gathering dust on shelves. I tried to sell it but got no one’s attention despite it has got i7 CPU and 16GB of memory.

So I was thinking, this has same amount of memory as 4 x Raspberry PI 4, but I probably won’t be able to sell it to pay for the PIs. Why not just use it as a glorified Raspberry PI? I measured its power consumption and to my surprise, this one with gen 4 i7 only asks for 10W when idle and screen off, not bad at all. In comparison 4 x PI 4 probably need 20W to stay up.

Let’s do it then!

I re-installed the OS with Ubuntu Server 20.04 LTS and prepared it for kubeadm to run with my ansible playbooks here. Since I’ve updated my playbook to let it handle both Raspbian on ARM and Ubuntu on X86_64 it was fairly easy to get the laptop(calling it knode3 afterwards) ready.

I haven’t locked down versions in my playbook so the installed docker and kubeadm are vastly newer than the ones in my existing Raspberry PI cluster so there will be some compatibility issues if I don’t match them. I used the following commands to downgrade docker and kubeadm:

apt remove docker-ce --purge
apt install docker-ce=5:19.03.9~3-0~ubuntu-focal
apt install kubeadm=1.18.13-00

The kubeadm join command I ran earlier on nodes didn’t work anymore and it complained about the token. Of course the token has expired after a year or so. Here’s command to issue a new token from the master node

kubeadm token create
xxxxxx.xxx...

Grab the new token and replace the one in the join command

kubeadm join <master IP>:6443 --token <new token xxx> --discovery-token-ca-cert-hash sha256:<hash didn't change>

For debugging purpose I ran journalctl -f in the other tab of the terminal to see the output. When the join command finished, I ran kubectl get nodes in my local terminal session to verify the result

kubectl get node
NAME      STATUS   ROLES    AGE    VERSION
kmaster   Ready    master   89d    v1.18.8
knode1    Ready    <none>   89d    v1.18.8
knode2    Ready    <none>   89d    v1.18.8
knode3    Ready    <none>   3m     v1.20.1

The kubernetes version is a bit newer, maybe I will upgrade the old nodes quickly. Now I have a node which has 16GB of memory 🙂

PS. to keep the laptop running when the lid is closed I used this tweak.

UPDATE: Now I’ve put another 3 old laptops into the cluster!

Front to back: my asus laptop, my friend’s thinkpad, my thinkpad, and my friend’s dell

Renew Certificates Used in Kubeadm Kubernetes Cluster

It’s been more than a year since I built my Kubernetes cluster with some Raspberry PIs. There was a few times that I need to power down everything to let electricians do their work and the cluster came back online and seemed to be Ok afterwards, given that I didn’t shutdown the PIs properly at all.

Recently I found that I lost contact with the cluster, it looked like:

$ kubectl get node
The connection to the server 192.168.x.x:6443 was refused - did you specify the right host or port?

The first thought came to my mind is the cluster must have got hacked since it’s on auto-pilot for months. But I still could ssh into the master node so it’s not that bad. I saw the error logs from kubelet.service:

Sep 23 15:58:05 kmaster kubelet[1233]: E0923 15:58:05.341773    1233 bootstrap.go:263] Part of the existing bootstrap client certificate is expired: 2020-09-15 10:40:36 +0000 UTC

That makes perfect sense! The anniversary was just a few days ago and the certificate seems only last a year. Here’s the StackOverflow answer which I found very helpful for this issue.

I tried the following command in the master node and the API server was back to life

$ cd /etc/kubernetes/pki/
$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} /tmp/backup
$ kubeadm init phase certs all --apiserver-advertise-address <IP>
$ cd /etc/kubernetes/
$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} /tmp/backup
$ kubeadm init phase kubeconfig all
$ systemctl restart kubelet.service

I’m not sure if all the new certs will be distributed to nodes automatically but at least the API didn’t complain anymore. I might do a kubeadm upgrade soon.

$ kubectl get node
NAME      STATUS     ROLES    AGE    VERSION
kmaster   NotReady   master   372d   v1.15.3
knode1    NotReady   <none>   372d   v1.15.3
knode2    NotReady   <none>   372d   v1.15.3

EDIT: After the certs are renewed, kubelet service couldn’t authenticate anymore and nodes appeared NotReady. This can be fixed by delete the obsolete kubelet client certificate by

$ ls /var/lib/kubelet/pki -lht
total 28K
-rw------- 1 root root 1.1K Sep 23 19:12 kubelet-client-2020-09-23-19-12-52.pem
lrwxrwxrwx 1 root root   59 Sep 23 19:12 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2020-09-23-19-12-52.pem
-rw------- 1 root root 2.7K Sep 23 19:12 kubelet-client-2020-09-23-19-12-51.pem
-rw------- 1 root root 1.1K Jun 17 00:56 kubelet-client-2020-06-17-00-56-59.pem
-rw------- 1 root root 1.1K Sep 16  2019 kubelet-client-2019-09-16-20-41-53.pem
-rw------- 1 root root 2.7K Sep 16  2019 kubelet-client-2019-09-16-20-40-40.pem
-rw-r--r-- 1 root root 2.2K Sep 16  2019 kubelet.crt
-rw------- 1 root root 1.7K Sep 16  2019 kubelet.key
$ rm /var/lib/kubelet/pki/kubelet-client-current.pem
$ systemctl restart kubelet.service

🙂