Rebuild a Kubernetes Node Without Downtime

When I built the in-house Kubernetes cluster with Raspberry Pis, I followed the kubeadm instructions and installed Raspberry Pi OS on the Pis. It was all good except that Raspberry Pi OS is 32-bit. Now I want to install Ubuntu 20.04 Server ARM64 on this Pi. Below are the steps with which I rebuilt the node with Ubuntu, without disrupting the workloads running in my cluster.

First, I didn’t need to shut down the running node, because I had a spare MicroSD card on which to prepare the Ubuntu image. The instructions for writing the image to the MicroSD card are here. When the card was prepared by the Imager, I kept it in the card reader because I wanted to set a static IP address instead of the automatic one assigned by default. A fixed IP makes more sense if I want to connect to it, right?

To set a static IP on the Ubuntu MicroSD card, open the system-boot/network-config file with a text editor and put in something like this:

version: 2
ethernets:
  eth0:
    # Rename the built-in ethernet device to "eth0"
    match:
      driver: bcmgenet smsc95xx lan78xx
    set-name: eth0
    addresses: [192.168.1.82/24]
    gateway4: 192.168.1.1
    nameservers:
      addresses: [192.168.1.1]
    optional: true
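
Once the node boots from this card, cloud-init picks up the config above on first boot. A quick sanity check I run afterwards (not part of the original steps; eth0 and the address come from the config above):

# verify the static address and default route took effect
ip addr show eth0
ip route show default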

Now the new OS is ready. To gracefully shut down the node, drain it first:

kubectl drain node-name
# wait until it finishes; --ignore-daemonsets may be needed if DaemonSet-managed pods (e.g. flannel) run on the node
# the pods on this node will be evicted and re-deployed onto other nodes
kubectl delete node node-name
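
As an optional sanity check (my addition, not one of the original steps), the following lists whatever is still bound to the drained node; after a successful drain only DaemonSet pods such as flannel should remain:

# list pods still scheduled on the drained node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node-name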

Then I powered down the Pi, replaced the MicroSD card with the one I had just prepared, and powered it back on. After a minute or two, I was able to SSH into the node:

# remove the previous host key for this IP from known_hosts
ssh-keygen -R 192.168.1.82
# log in; the default password is ubuntu and must be changed upon first login
ssh ubuntu@192.168.1.82
# install the SSH public key, logging in with the updated password
ssh-copy-id ubuntu@192.168.1.82

The node then needs to be prepared for kubeadm; I used my good old Ansible playbook for this task. The ansible-playbook command looks like:

ansible-playbook -i inventory/cluster -l node-name kubeadm.yaml

At the moment I have to install older versions of Docker and kubeadm to keep the node compatible with the existing cluster.
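
For reference, this kind of version pinning can be done with apt. The sketch below is mine, not the playbook's, and the version strings are placeholders; the right ones should be looked up with apt-cache madison so they match the existing cluster:

# list available versions, then install and hold a matching one (placeholder version shown)
apt-cache madison kubeadm
sudo apt-get install -y kubelet=1.15.3-00 kubeadm=1.15.3-00 kubectl=1.15.3-00
sudo apt-mark hold kubelet kubeadm kubectl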

When running the kubeadm join command I encountered an error message saying CGROUPS_MEMORY: missing. This can be fixed with this (a sketch of the fix is included after the token command below). One more thing is to create a new token from the master node:

kubeadm token create
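
About the CGROUPS_MEMORY: missing error mentioned above: on a Raspberry Pi this usually means the memory cgroup isn't enabled on the kernel command line. The sketch below is my summary rather than the linked instructions, and the file path is an assumption that varies between Ubuntu Pi image releases (e.g. cmdline.txt or nobtcmd.txt under /boot/firmware):

# append the cgroup flags to the single-line kernel command line, then reboot
sudo sed -i '1 s/$/ cgroup_enable=memory cgroup_memory=1/' /boot/firmware/cmdline.txt
sudo reboot

Also, if the original join command wasn't saved, kubeadm can print a complete one, including a fresh token and the CA cert hash:

# run on the master node
kubeadm token create --print-join-command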

At last, the new node can be joined to the cluster with:

kubeadm join 192.168.1.80:6443 --token xxx     --discovery-token-ca-cert-hash sha256:xxx

The node will then be bootstrapped in a few minutes. I can tell it’s now ARM64:

# k is an alias for kubectl, rg is ripgrep
k get node node-name -o yaml | rg arch
    beta.kubernetes.io/arch: arm64
    kubernetes.io/arch: arm64
...

🙂

Renew Certificates Used in Kubeadm Kubernetes Cluster

It’s been more than a year since I built my Kubernetes cluster with some Raspberry Pis. There were a few times that I needed to power down everything to let electricians do their work, and the cluster came back online and seemed to be OK afterwards, even though I didn’t shut down the Pis properly at all.

Recently I found that I had lost contact with the cluster; it looked like this:

$ kubectl get node
The connection to the server 192.168.x.x:6443 was refused - did you specify the right host or port?

The first thought that came to my mind was that the cluster must have been hacked, since it had been on auto-pilot for months. But I could still SSH into the master node, so it wasn’t that bad. I saw this error in the logs of kubelet.service:

Sep 23 15:58:05 kmaster kubelet[1233]: E0923 15:58:05.341773    1233 bootstrap.go:263] Part of the existing bootstrap client certificate is expired: 2020-09-15 10:40:36 +0000 UTC

That makes perfect sense! The anniversary was just a few days ago and the certificates only seem to last a year. Here’s the StackOverflow answer which I found very helpful for this issue.
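
To confirm it really is an expiry problem, the certificate dates can be checked directly. This is my own quick check, not from the linked answer; the second command is kubeadm's built-in report, which lives under the alpha subcommand on this cluster's version:

# print the expiry date of the API server's serving certificate
$ sudo openssl x509 -enddate -noout -in /etc/kubernetes/pki/apiserver.crt
# list the expiration of all kubeadm-managed certificates
$ sudo kubeadm alpha certs check-expiration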

I ran the following commands on the master node and the API server came back to life:

$ mkdir -p /tmp/backup
$ cd /etc/kubernetes/pki/
$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} /tmp/backup
$ kubeadm init phase certs all --apiserver-advertise-address <IP>
$ cd /etc/kubernetes/
$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} /tmp/backup
$ kubeadm init phase kubeconfig all
$ systemctl restart kubelet.service
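
One extra step that isn't in the commands above: admin.conf gets regenerated too, so if kubectl starts returning authentication errors after the renewal, the kubeconfig it uses should be refreshed from the new admin.conf (my addition):

$ sudo cp /etc/kubernetes/admin.conf ~/.kube/config
$ sudo chown $(id -u):$(id -g) ~/.kube/config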

I’m not sure if all the new certs will be distributed to the nodes automatically, but at least the API server didn’t complain anymore. I might do a kubeadm upgrade soon.

$ kubectl get node
NAME      STATUS     ROLES    AGE    VERSION
kmaster   NotReady   master   372d   v1.15.3
knode1    NotReady   <none>   372d   v1.15.3
knode2    NotReady   <none>   372d   v1.15.3

EDIT: After the certs were renewed, the kubelet service couldn’t authenticate anymore and the nodes appeared NotReady. This can be fixed by deleting the obsolete kubelet client certificate:

$ ls /var/lib/kubelet/pki -lht
total 28K
-rw------- 1 root root 1.1K Sep 23 19:12 kubelet-client-2020-09-23-19-12-52.pem
lrwxrwxrwx 1 root root   59 Sep 23 19:12 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2020-09-23-19-12-52.pem
-rw------- 1 root root 2.7K Sep 23 19:12 kubelet-client-2020-09-23-19-12-51.pem
-rw------- 1 root root 1.1K Jun 17 00:56 kubelet-client-2020-06-17-00-56-59.pem
-rw------- 1 root root 1.1K Sep 16  2019 kubelet-client-2019-09-16-20-41-53.pem
-rw------- 1 root root 2.7K Sep 16  2019 kubelet-client-2019-09-16-20-40-40.pem
-rw-r--r-- 1 root root 2.2K Sep 16  2019 kubelet.crt
-rw------- 1 root root 1.7K Sep 16  2019 kubelet.key
$ rm /var/lib/kubelet/pki/kubelet-client-current.pem
$ systemctl restart kubelet.service
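
After the restart, kubelet should bootstrap a fresh client certificate and the node should flip back to Ready within a minute or so. A quick way to confirm (my own check):

# the symlink should now point at a brand new certificate
$ ls -l /var/lib/kubelet/pki/kubelet-client-current.pem
$ kubectl get node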

🙂

Kubernetes at Home on Raspberry Pi 4, Part 3

Continuing from part 2, this post is mostly about installing an ingress controller. In short, an ingress controller is a single entry point for all ingress connections into the cluster.

The reason I chose Flannel over other CNIs is that it’s lightweight and not bloated with features; I would like to keep things easy on the Pi 4s before they are tasked with anything. For the same reason I’ll install nginx-ingress-controller to handle ingress. MetalLB looks like a good fit for a Raspberry Pi cluster, but I’ll pass for now: this is more of a hobby project, and if the load gets really high and redundancy becomes necessary, I’ll probably use AWS or GCP, which have decent load balancers.

The official nginx-ingress-controller image at quay.io doesn’t seem to support the armhf/armv7 architecture, so I built one myself here. To deploy the official ingress controller manifests but with my own container image, I chose kustomize for this little tweak. (kustomize has also been integrated into kubectl since v1.14.)

First I downloaded the official nginx-ingress-controller manifests:

$ wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/mandatory.yaml
$ wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/baremetal/service-nodeport.yaml

Then I used kustomize to replace the container image with my own:

$ cat <<EOF >kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - mandatory.yaml
  - service-nodeport.yaml 
images:
  - name: quay.io/kubernetes-ingress-controller/nginx-ingress-controller
    newName: raynix/nginx-ingress-controller-arm
    newTag: 0.25.1
EOF
# then there should be 3 files in current directory
$ ls
kustomization.yaml  mandatory.yaml  service-nodeport.yaml
# install with kubectl
$ kubectl apply -k .

To see the node port for this ingress controller, do

$ k get --namespace ingress-nginx svc
NAME            TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx   NodePort   10.100.0.246   <none>        80:32283/TCP,443:30841/TCP   7d14h
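
At this point the controller is already reachable from any machine on the LAN via the node port (32283 above maps to port 80), even without the static route described below. A quick test, with a node's LAN IP as a stand-in:

$ curl -H "Host: nonexist.com" http://<node LAN IP>:32283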

On the upstream nginx or HAProxy box, a static route can be set to route traffic to the ingress controller’s service network:

$ sudo ip route add 10.100.0.0/24 \
  nexthop via <node1 LAN IP> dev <LAN interface> weight 1 \
  nexthop via <node2 LAN IP> dev <LAN interface> weight 1
$ sudo ip route
...
10.100.0.0/24 
     nexthop via 192.168.1.101 dev enp0xxx weight 1 
     nexthop via 192.168.1.102 dev enp0xxx weight 1  
...

To make the above route permanent, add the following line to /etc/network/interfaces (this is for an ifupdown-style setup; other distros or netplan-based systems may differ):

iface enp0s1f1 inet static
...
up ip route add 10.100.0.0/24 nexthop via 192.168.1.81 dev enp0s1f1 weight 1 nexthop via 192.168.1.82 dev enp0s1f1 weight 1

To test if the ingress controller is visible from the upstream box, do

$ curl -H "Host: nonexist.com" http://10.100.0.246
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>openresty/1.15.8.1</center>
</body>
</html>

Now the ingress controller works 🙂
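
To actually route traffic through the controller to a workload, an Ingress resource is needed. The manifest below is only an illustration with made-up names (my-app, my-app.example.com) and assumes a Service called my-app listening on port 80 already exists:

$ cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: my-app
              servicePort: 80
EOF

After that, curl -H "Host: my-app.example.com" http://10.100.0.246 from the upstream box should return the app instead of the 404 default backend.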

Kubernetes at Home on Raspberry Pi 4, Part 2

Continuing from part 1.

It’s recommended to change every Pi’s password and also to run ssh-copy-id <user>@<each Pi’s address> to enable SSH public-key login.

There are lots of steps to prepare before kubeadm is installed, so I made an Ansible repository to simplify this repetitive process. Please see here. The Ansible role will do the following tasks (a manual sketch of a couple of them follows the list):

  • set host name, update /etc/hosts file
  • enable network bridge
  • disable swap, kubeadm doesn’t like it!
  • set timezone. You may want to change it to yours
  • install docker community edition
  • install kubeadm
  • use iptables-legacy (Reference here)
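
For reference, here is a manual sketch of the bridge and swap tasks from the list above; the Ansible role is the source of truth, and the dphys-swapfile part assumes Raspbian's default swap management:

# let bridged traffic be seen by iptables, which kubeadm's preflight checks expect
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo modprobe br_netfilter
sudo sysctl --system

# disable swap, which on Raspbian is managed by dphys-swapfile
sudo dphys-swapfile swapoff
sudo dphys-swapfile uninstall
sudo systemctl disable dphys-swapfile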

Just to emphasise: at this moment Raspbian ships iptables 1.8, a newer implementation built on nftables (netfilter tables). The original implementation is renamed to iptables-legacy. You can use my Ansible role to switch to iptables-legacy, or do it manually with:

# update-alternatives --set iptables /usr/sbin/iptables-legacy 
# update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy 
# update-alternatives --set arptables /usr/sbin/arptables-legacy 
# update-alternatives --set ebtables /usr/sbin/ebtables-legacy

This is absolutely necessary because current CNI implementations only work with the legacy iptables.

Once the Ansible playbook finishes successfully, kubeadm is ready to set up the Kubernetes master node, a.k.a. the control plane:

# the following command is to be run on the master node
# I prefer flannel as the CNI (container network interface) because it's lightweight compared to others like Weave Net, so the pod CIDR is set as follows
$ sudo kubeadm init --pod-network-cidr 10.244.0.0/16

Then, as kubeadm finishes, it will print some instructions to continue. The first thing is to copy admin.conf so the kubectl command can authenticate with the control plane. Also save the kubeadm join 192.168.1.80:6443 --token xxx --discovery-token-ca-cert-hash sha256:xxx instruction, as it will be needed later.

$ mkdir -p ~/.kube
$ sudo cp -i /etc/kubernetes/admin.conf ~/.kube/config
$ sudo chown $(id -u):$(id -g) ~/.kube/config
$ kubectl get node
...
$ kubectl get pods
...

The coredns pods will be in Pending state; this is expected and will be fixed automatically after the CNI is installed. The next step is to install a CNI, in my case flannel.

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
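
To watch the flannel and coredns pods come up (just a convenience, not part of the original steps):

$ kubectl -n kube-system get pods -w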

In a few minutes the flannel and coredns pods should be in Running state. Then run the join command saved earlier on the other Pi nodes:

kubeadm join 192.168.1.80:6443 --token xxx     --discovery-token-ca-cert-hash sha256:xxx

And back on the master node, you should be able to see the new worker node in the output:

$ kubectl get nodes

TBC