Renew Certificates Used in Kubeadm Kubernetes Cluster

It’s been more than a year since I built my Kubernetes cluster with some Raspberry Pis. There were a few times when I needed to power down everything to let electricians do their work, and the cluster came back online and seemed to be OK afterwards, even though I didn’t shut down the Pis properly at all.

Recently I found that I had lost contact with the cluster. It looked like this:

$ kubectl get node
The connection to the server 192.168.x.x:6443 was refused - did you specify the right host or port?

The first thought that came to my mind was that the cluster must have been hacked, since it had been on auto-pilot for months. But I could still ssh into the master node, so it wasn’t that bad. I saw this error in the logs of kubelet.service:

Sep 23 15:58:05 kmaster kubelet[1233]: E0923 15:58:05.341773    1233 bootstrap.go:263] Part of the existing bootstrap client certificate is expired: 2020-09-15 10:40:36 +0000 UTC

That makes perfect sense! The anniversary was just a few days ago and the certificate seems to last only a year. I found a StackOverflow answer very helpful for this issue.
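To confirm which certificate files have actually expired, the expiry date can be read from each one. Below is a minimal Python sketch, assuming `openssl` is installed on the master node. The helper names `parse_not_after`, `cert_not_after` and `is_expired` are made up for illustration; the sketch relies on the `notAfter=...` line printed by `openssl x509 -enddate -noout`.

```python
import subprocess
from datetime import datetime, timezone


def parse_not_after(line: str) -> datetime:
    """Parse a line such as 'notAfter=Sep 15 10:40:36 2020 GMT' into a UTC datetime."""
    value = line.strip().split("=", 1)[1]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def cert_not_after(path: str) -> datetime:
    """Ask openssl for the expiry date of the PEM certificate at `path`."""
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_not_after(result.stdout)


def is_expired(not_after: datetime, now: datetime) -> bool:
    """Return True when `now` is at or past the certificate's expiry time."""
    return now >= not_after
```

For example, `cert_not_after("/etc/kubernetes/pki/apiserver.crt")` run as root on the master node should return the expiry time of the API server certificate, which can then be compared against `datetime.now(timezone.utc)` with `is_expired`.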

I ran the following commands on the master node and the API server came back to life:

$ mkdir -p /tmp/backup
$ cd /etc/kubernetes/pki/
$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} /tmp/backup
$ kubeadm init phase certs all --apiserver-advertise-address <IP>
$ cd /etc/kubernetes/
$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} /tmp/backup
$ kubeadm init phase kubeconfig all
$ systemctl restart kubelet.service

I’m not sure whether all the new certs will be distributed to the nodes automatically, but at least the API server didn’t complain anymore. I might do a kubeadm upgrade soon.

$ kubectl get node
kmaster   NotReady   master   372d   v1.15.3
knode1    NotReady   <none>   372d   v1.15.3
knode2    NotReady   <none>   372d   v1.15.3

EDIT: After the certs were renewed, the kubelet service couldn’t authenticate anymore and the nodes appeared as NotReady. This can be fixed by deleting the obsolete kubelet client certificate:

$ ls /var/lib/kubelet/pki -lht
total 28K
-rw------- 1 root root 1.1K Sep 23 19:12 kubelet-client-2020-09-23-19-12-52.pem
lrwxrwxrwx 1 root root   59 Sep 23 19:12 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2020-09-23-19-12-52.pem
-rw------- 1 root root 2.7K Sep 23 19:12 kubelet-client-2020-09-23-19-12-51.pem
-rw------- 1 root root 1.1K Jun 17 00:56 kubelet-client-2020-06-17-00-56-59.pem
-rw------- 1 root root 1.1K Sep 16  2019 kubelet-client-2019-09-16-20-41-53.pem
-rw------- 1 root root 2.7K Sep 16  2019 kubelet-client-2019-09-16-20-40-40.pem
-rw-r--r-- 1 root root 2.2K Sep 16  2019 kubelet.crt
-rw------- 1 root root 1.7K Sep 16  2019 kubelet.key
$ rm /var/lib/kubelet/pki/kubelet-client-current.pem
$ systemctl restart kubelet.service


Use Fluentd and Elasticsearch to Analyse Squid Proxy Traffic

TL;DR This is a quick guide to setting up a Fluentd + Elasticsearch integration to analyse Squid Proxy traffic. In the example below, Fluentd td-agent is installed on the same host as Squid Proxy and Elasticsearch is installed on another host. The OS is Ubuntu 20.04.

The Squid logs need to be readable by td-agent, which can be done by adding the td-agent user to the proxy group:

$ sudo usermod --groups proxy -a td-agent

The configuration for td-agent looks like this:

<source>
  @type tail
  @id squid_tail
  <parse>
    @type regexp
    expression /^(?<timestamp>[0-9]+)[\.0-9]* +(?<elapsed>[0-9]+) (?<userIP>[0-9\.]+) (?<action>[A-Z_]+)\/(?<statusCode>[0-9]+) (?<size>[0-9]+) (?<method>[A-Z]+) (?<URL>[^ ]+) (?<rfc931>[^ ]+) (?<peerStatus>[^ ]+)/(?<peerIP>[^ ]+) (?<mime>[^ ]+)/
    time_key timestamp
    time_format %s
  </parse>
  path /var/log/squid/access.log
  tag squid.access
</source>

<match squid.access>
  @type elasticsearch
  host <elasticsearch server IP>
  port 9200
  logstash_format true
  flush_interval 10s
  index_name fluentd
  type_name fluentd
  include_tag_key true
  user elastic
  password <elasticsearch password>
</match>

The key is to get the regular expression to match the Squid access log format, where a line looks like:

1598101487.920 240256 TCP_TUNNEL/200 1562 CONNECT - HIER_DIRECT/ -

Then I can use the fields defined in the regex, such as userIP or URL in Elasticsearch for queries.
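Before deploying the td-agent configuration, the expression can be checked offline. The Python sketch below is a translation of the expression above, with Ruby's `(?<name>...)` groups rewritten as Python's `(?P<name>...)`. The sample log line is an assumption: it uses made-up placeholder values (192.0.2.10, example.com:443 and 203.0.113.5) rather than real traffic, and the helper names `parse_squid_line` and `event_time` are hypothetical.

```python
import re
from datetime import datetime, timezone

# Same field names as the td-agent expression, in Python's named-group syntax.
SQUID_LINE = re.compile(
    r"^(?P<timestamp>[0-9]+)[\.0-9]* +(?P<elapsed>[0-9]+) (?P<userIP>[0-9\.]+) "
    r"(?P<action>[A-Z_]+)/(?P<statusCode>[0-9]+) (?P<size>[0-9]+) "
    r"(?P<method>[A-Z]+) (?P<URL>[^ ]+) (?P<rfc931>[^ ]+) "
    r"(?P<peerStatus>[^ ]+)/(?P<peerIP>[^ ]+) (?P<mime>[^ ]+)"
)


def parse_squid_line(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = SQUID_LINE.match(line)
    return match.groupdict() if match else None


def event_time(fields):
    """Interpret the timestamp field as epoch seconds, like time_format %s."""
    return datetime.fromtimestamp(int(fields["timestamp"]), tz=timezone.utc)


# Hypothetical log line with placeholder client IP, URL and peer IP.
SAMPLE = (
    "1598101487.920 240256 192.0.2.10 TCP_TUNNEL/200 1562 "
    "CONNECT example.com:443 - HIER_DIRECT/203.0.113.5 -"
)

fields = parse_squid_line(SAMPLE)
print(fields["userIP"], fields["URL"], event_time(fields).isoformat())
```

If `parse_squid_line` returns `None` for a real line from `/var/log/squid/access.log`, td-agent would most likely fail to parse that line as well, so the expression needs adjusting first.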


Use Variable in Kustomize

Variables in Kustomize come in handy from time to time. With them I can link together settings that should always share the same value. Without variables I would probably need a template engine like Jinja2 to do the same trick.

In my case, there’s a bug in kustomize as of now (3.6.1) where configMap object names don’t get properly suffixed in a patch file, and an issue has been filed for it. I can, however, use a variable to work around this bug. Imagine a scenario where I have a configMap in a base template that is referenced in a patch file:

# common/kustomization.yaml
kind: Kustomization

configMapGenerator:
  - name: common
    literals:
      - TEST=YES

# test/kustomization.yaml
kind: Kustomization
namespace: test
resources:
  - ../base
  - ../common
nameSuffix: -raynix
patchesStrategicMerge:
  - patch.yaml

# test/patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  template:
    spec:
      volumes:
        - name: common
          configMap:
            name: common
            # this should be linked to the configMap in common/kustomization.yaml but it won't be updated with a hash and suffix.

Using a variable gets around this bug, as in the following example:

# common/kustomization.yaml
kind: Kustomization
configurations:
  - configuration.yaml
configMapGenerator:
  - name: common
    literals:
      - TEST=YES
vars:
  - name: COMMON
    objref:
      apiVersion: v1
      kind: ConfigMap
      name: common
    fieldref:
      fieldpath: metadata.name
      # this can be omitted as it is the default fieldPath

# test/kustomization.yaml unchanged

# test/patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  template:
    spec:
      volumes:
        - name: common
          configMap:
            name: $(COMMON)
            # now $(COMMON) will be updated with whatever the real configmap name is

Problem solved 🙂

5G + Public IP with OpenVPN

I’ve done a proof of concept with SSH tunneling to add a public IP to my 5G home broadband connection. It works for my garage-hosted blogs, but it’s not a complete solution. Since I still have free credit in my personal Google Cloud account, I decided to make an improvement with OpenVPN. The diagram looks like:

       [iptables DNAT]
      [OpenVPN tunnel]
[local server tun0 interface:]

Following an outstanding tutorial on DigitalOcean, I set up an OpenVPN server on Debian 10 running in a Google Cloud Compute instance. There were a few more things to do in my case.

First I needed to add port forwarding from the public interface of the OpenVPN server to the home server’s tunnel interface. Here’s my ufw configuration file:

# this is /etc/ufw/before.rules
# NAT table rules
# port forwarding to home server
-A PREROUTING -i eth0 -p tcp -d <public ip> --dport 80 -j DNAT --to

# Allow traffic from OpenVPN client to eth0, ie. internet access

Make sure to restart ufw after this.

Then on my home server, the OpenVPN client can be configured to run as a service:

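# this is /etc/systemd/system/vpnclient.service
[Unit]
Description=Setup an openvpn tunnel to kite server

[Service]
ExecStart=/usr/sbin/openvpn --config /etc/openvpn/client1.conf

[Install]
# assumed target: an [Install] section is needed for "systemctl enable" to take effect
WantedBy=multi-user.target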

To enable it and start immediately:

sudo systemctl daemon-reload
sudo systemctl enable vpnclient
sudo systemctl start vpnclient

I also needed my home server to have a fixed IP for its tun0 network interface, so that the nginx server can proxy traffic to this IP reliably. I followed a guide for this, except that it suggested setting client-config-dir on both the server and client sides. I only set it on the server side and it worked for me:

# this is /etc/openvpn/server.conf
# uncomment the following line
client-config-dir ccd

# this is /etc/openvpn/ccd/client1

After this the OpenVPN server on the VM needs to be restarted:

sudo systemctl restart openvpn@server

Reload the nginx server and it should be working. I tested it with curl -H "Host:" 35.197.x.x and the request hit my home server.
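
The same check can be scripted. Here is a minimal Python sketch, assuming plain HTTP on port 80. `fetch_with_host` is a hypothetical helper name, and the IP address and host name passed to it are placeholders to be replaced with the VM's public IP and the blog's domain.

```python
import http.client


def fetch_with_host(ip, host_header, port=80, path="/", timeout=5.0):
    """Send a GET to `ip` while presenting `host_header`, like curl -H "Host: ..."."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # An explicit Host header replaces the one http.client would derive from `ip`.
        conn.request("GET", path, headers={"Host": host_header})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Calling `fetch_with_host("<public ip>", "<blog domain>")` returns the status code and body, which should come from the home server if the DNAT rule, the tunnel and the nginx proxy are all in place.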