Renew Certificates Used in Kubeadm Kubernetes Cluster

It’s been more than a year since I built my Kubernetes cluster with some Raspberry Pis. There were a few times when I needed to power everything down to let electricians do their work, and the cluster came back online and seemed to be OK afterwards, given that I didn’t shut down the Pis properly at all.

Recently I found that I had lost contact with the cluster. It looked like this:

$ kubectl get node
The connection to the server 192.168.x.x:6443 was refused - did you specify the right host or port?

The first thought that came to my mind was that the cluster must have been hacked, since it had been on auto-pilot for months. But I could still SSH into the master node, so it wasn’t that bad. I saw this error in the logs from kubelet.service:

Sep 23 15:58:05 kmaster kubelet[1233]: E0923 15:58:05.341773    1233 bootstrap.go:263] Part of the existing bootstrap client certificate is expired: 2020-09-15 10:40:36 +0000 UTC

That makes perfect sense! The cluster’s anniversary was just a few days ago and the certificates only last a year. Here’s the StackOverflow answer which I found very helpful for this issue.
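The expiry date can be confirmed by inspecting a certificate with openssl; the notAfter value should match the one in the kubelet log above:

$ sudo openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt
notAfter=Sep 15 10:40:36 2020 GMT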

I tried the following commands on the master node and the API server came back to life:

$ mkdir -p /tmp/backup
$ cd /etc/kubernetes/pki/
$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} /tmp/backup
$ kubeadm init phase certs all --apiserver-advertise-address <IP>
$ cd /etc/kubernetes/
$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} /tmp/backup
$ kubeadm init phase kubeconfig all
$ systemctl restart kubelet.service

I’m not sure if all the new certs will be distributed to the nodes automatically, but at least the API server didn’t complain anymore. I might do a kubeadm upgrade soon.

$ kubectl get node
NAME      STATUS     ROLES    AGE    VERSION
kmaster   NotReady   master   372d   v1.15.3
knode1    NotReady   <none>   372d   v1.15.3
knode2    NotReady   <none>   372d   v1.15.3

EDIT: After the certs were renewed, the kubelet service couldn’t authenticate anymore and the nodes appeared NotReady. This can be fixed by deleting the obsolete kubelet client certificate:

$ ls /var/lib/kubelet/pki -lht
total 28K
-rw------- 1 root root 1.1K Sep 23 19:12 kubelet-client-2020-09-23-19-12-52.pem
lrwxrwxrwx 1 root root   59 Sep 23 19:12 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2020-09-23-19-12-52.pem
-rw------- 1 root root 2.7K Sep 23 19:12 kubelet-client-2020-09-23-19-12-51.pem
-rw------- 1 root root 1.1K Jun 17 00:56 kubelet-client-2020-06-17-00-56-59.pem
-rw------- 1 root root 1.1K Sep 16  2019 kubelet-client-2019-09-16-20-41-53.pem
-rw------- 1 root root 2.7K Sep 16  2019 kubelet-client-2019-09-16-20-40-40.pem
-rw-r--r-- 1 root root 2.2K Sep 16  2019 kubelet.crt
-rw------- 1 root root 1.7K Sep 16  2019 kubelet.key
$ rm /var/lib/kubelet/pki/kubelet-client-current.pem
$ systemctl restart kubelet.service
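After the restart, kubelet bootstraps a fresh client certificate and the nodes should come back to Ready. This can be verified with:

$ sudo openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem
$ kubectl get node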

🙂

Use Variables in Kustomize

Variables in Kustomize are handy helpers from time to time. With these variables I can link settings together that should always share the same value. Without variables I would probably need a template engine like Jinja2 to do the same trick.

Some examples here.

In my case, there’s a bug in kustomize as of now (3.6.1) where ConfigMap object names don’t get properly suffixed in a patch file. The issue is here. I can, however, use a variable to work around this bug. Imagine a scenario where I have a ConfigMap in a base template and it is referenced in a patch file:

# common/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

configMapGenerator:
  - name: common
    literals:
      - TEST=YES

# test/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: test
bases:
  - ../base
  - ../common
nameSuffix: -raynix
patchesStrategicMerge:
  - patch.yaml

# test/patch.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  template:
    spec:
      volumes:
        - name: common
          configMap:
            name: common
            # this should be linked to the configMap in common/kustomization.yaml but it won't be updated with a hash and suffix.
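Building this overlay shows the problem: the generated ConfigMap gets the suffix and a hash, but the reference inside the Deployment keeps the plain name (the hash below is made up for illustration):

$ kustomize build test
...
metadata:
  name: common-raynix-7g2mf5b9k4    # the generated ConfigMap
...
          configMap:
            name: common            # the reference: no suffix, no hash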

Using a variable gets around this bug. See the following example:

# common/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configurations:
  - configuration.yaml
configMapGenerator:
  - name: common
    literals:
      - TEST=YES
vars:
  - name: COMMON
    objref:
      apiVersion: v1
      kind: ConfigMap
      name: common
    fieldref:
      # this can be omitted as metadata.name is the default fieldPath 
      fieldPath: metadata.name
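The configuration.yaml referenced under configurations: tells kustomize which fields var substitution is allowed in, since volume ConfigMap names aren’t in the default varReference list as far as I can tell. A minimal sketch:

# common/configuration.yaml
varReference:
  - path: spec/template/spec/volumes/configMap/name
    kind: Deployment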

# test/kustomization.yaml unchanged

# test/patch.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  template:
    spec:
      volumes:
        - name: common
          configMap:
            name: $(COMMON)
            # now $(COMMON) will be updated with whatever the real configmap name is
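To double-check, build the overlay again; the reference should now follow the generated name (hash still illustrative):

$ kustomize build test
...
      volumes:
        - name: common
          configMap:
            name: common-raynix-7g2mf5b9k4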

Problem solved 🙂

5G + Public IP with OpenVPN

I’ve done a proof of concept with SSH tunneling to add a public IP to my 5G home broadband connection. It works for my garage-hosted blogs, but it’s not a complete solution. Since I still have free credit in my personal Google Cloud account, I decided to make an improvement with OpenVPN. The diagram looks like:

        [CloudFlare] 
             |
            HTTP
             |
     [VM:35.197.x.x:80]
             |
       [iptables DNAT]
             |
      [OpenVPN tunnel]
             |
[local server tun0 interface: 10.8.0.51:80]

Following an outstanding tutorial on DigitalOcean, I set up an OpenVPN server on Debian 10 running in a Google Cloud Compute instance. There were a few more things to do in my case.

First I needed to add port forwarding from the public interface of the OpenVPN server to the home server’s tunnel interface. Here’s my ufw configuration file:

# this is /etc/ufw/before.rules
# START OPENVPN RULES
# NAT table rules
*nat
:PREROUTING ACCEPT [0:0]
# port forwarding to home server
-A PREROUTING -i eth0 -p tcp -d <public ip> --dport 80 -j DNAT --to 10.8.0.51:80

:POSTROUTING ACCEPT [0:0] 
# Allow traffic from OpenVPN client to eth0, ie. internet access
-A POSTROUTING -s 10.8.0.0/8 -o eth0 -j MASQUERADE
COMMIT
# END OPENVPN RULES

Make sure to restart ufw after this.
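Disabling and re-enabling it is one way to make sure the NAT rules in before.rules are reapplied:

$ sudo ufw disable
$ sudo ufw enable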

Then, on my home server, the OpenVPN client can be configured to run as a service:

# this is /etc/systemd/system/vpnclient.service
[Unit]
Description=Setup an openvpn tunnel to kite server
After=network.target

[Service]
ExecStart=/usr/sbin/openvpn --config /etc/openvpn/client1.conf
RestartSec=5
Restart=always

[Install]
WantedBy=multi-user.target

To enable it and start immediately:

sudo systemctl daemon-reload
sudo systemctl enable vpnclient
sudo systemctl start vpnclient

Also, I need my home server to have a fixed IP on its tun0 network interface, so the nginx server can proxy traffic to this IP reliably. I followed this guide, except that it suggested configuring client-config-dir on both the server and client sides; I only did it on the server side and it worked for me:

# this is /etc/openvpn/server.conf
# uncomment the following line
client-config-dir ccd

# this is /etc/openvpn/ccd/client1
ifconfig-push 10.8.0.51 255.255.255.255
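After the client reconnects, the pinned address can be checked on the home server:

$ ip addr show tun0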

After this the OpenVPN server on the VM needs to be restarted:

sudo systemctl restart openvpn@server.service

Reload the nginx server and it should be working. I tested it with curl -H "Host: raynix.info" 35.197.x.x and the request hit my home server.

🙂

5G is Fast but There’s No Public IP

I’m super happy that I can finally have a broadband connection that actually has broad bandwidth. However, like all other cellular services, the 5G gateway has a private IP as its external IP, i.e. everything I have is behind Optus’ huge NAT servers and they will not open any port just for me.

The NBN wasn’t as fast, but there was a public IP assigned to the router… Should I go back to the NBN for this reason alone? I’ve done countless internet/cloud solutions; can I do one for myself? I remember that when I worked on a private network behind NAT gateways, I could use SSH tunneling to expose a port in the private network to the outside world for third-party partners. All I need is a virtual machine in the cloud that has a public IP. It will look like this:

        [CloudFlare] 
             |
            HTTP
             |
     [VM:35.197.x.x:80]
             |
       [SSH Tunnel]
             |
[local server:192.168.1.x:80]

I need to create an SSH tunneling service between my home server and the cloud instance, then point CloudFlare DNS to the public IP of the cloud instance. That’s it! But hang on: I can’t bind to port 80 as a non-privileged user, so a local port forward on the cloud instance handles this:

        [CloudFlare] 
             |
            HTTP
             |
     [VM:35.197.x.x:80]
             |
         [VM:nginx]
             |
    [VM:127.0.0.1:8080]
             |
        [SSH Tunnel]
             |
[local server:192.168.1.x:80]

OK, let’s do it. First I created a cloud instance in Google Cloud, because it gave me $400 of free credit last year and I haven’t used much of it yet. I opened port 80 for the instance, added my SSH public key, and installed an nginx server forwarding traffic from port 80 to local port 8080. Here’s the simple nginx configuration:

# This file can be saved as /etc/nginx/sites-enabled/proxy
# in the cloud instance
server {
    listen 80;
    server_name raynix.info;
    client_max_body_size 100M;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-Host  $host;
        proxy_set_header X-Forwarded-Port  $server_port;
    }
}
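To apply the change, test the configuration and reload nginx:

$ sudo nginx -t
$ sudo systemctl reload nginx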

Done! The next step is to create an SSH tunneling service on my home server. My home server is running Ubuntu, so the service is defined in systemd syntax:

# The file can be saved as /etc/systemd/system/mytunnel.service
# in my home server
# replace the id_rsa with yours of course
[Unit]
Description=Setup a secure tunnel to kite server
After=network.target

[Service]
ExecStart=/usr/bin/ssh -NT -i /home/ray/.ssh/id_rsa -o ServerAliveInterval=60 -o ExitOnForwardFailure=yes -R 8080:localhost:80 ray@35.197.x.x

# Restart every >2 seconds to avoid StartLimitInterval failure
RestartSec=5
Restart=always

[Install]
WantedBy=multi-user.target

Then use the following commands to start and check the service:

$ sudo systemctl daemon-reload
$ sudo systemctl start mytunnel
$ sudo systemctl status -l mytunnel

The status of the tunnel can also be verified on the cloud instance:

$ sudo netstat -tlnp
...
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      1460/sshd: ray      

By now all the dots have been connected. I can test the round trip on my laptop with:

$ curl http://35.197.x.x -H 'Host: raynix.info'
# this should print out the HTML of my blog's homepage.

The last step is to update CloudFlare DNS to send traffic to the new cloud instance. Did it work? You’re looking at the result right now 🙂