Renew Certificates Used in Kubeadm Kubernetes Cluster

It’s been more than a year since I built my Kubernetes cluster with some Raspberry Pis. There were a few times when I needed to power everything down to let electricians do their work, and the cluster came back online and seemed to be OK afterwards, even though I didn’t shut the Pis down properly at all.

Recently I found that I had lost contact with the cluster; it looked like:

$ kubectl get node
The connection to the server 192.168.x.x:6443 was refused - did you specify the right host or port?

The first thought that came to my mind was that the cluster must have been hacked, since it had been on auto-pilot for months. But I could still SSH into the master node, so it wasn’t that bad. I saw this error in the logs of kubelet.service:

Sep 23 15:58:05 kmaster kubelet[1233]: E0923 15:58:05.341773    1233 bootstrap.go:263] Part of the existing bootstrap client certificate is expired: 2020-09-15 10:40:36 +0000 UTC

That makes perfect sense! The cluster’s anniversary was just a few days ago, and the certificates only last a year. Here’s the StackOverflow answer which I found very helpful for this issue.
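Before moving any files around, it’s worth confirming the expiry date directly. Here’s a sketch using a throwaway self-signed certificate; on a real master node you’d point openssl at /etc/kubernetes/pki/apiserver.crt instead:

```shell
# Generate a throwaway cert just to demonstrate the check
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo" \
  -keyout /tmp/demo.key -out /tmp/demo.crt -days 365

# Print the expiry date; on the master node, run this against apiserver.crt
openssl x509 -in /tmp/demo.crt -noout -enddate
```

kubeadm v1.15 also ships kubeadm alpha certs check-expiration, which lists the expiry of every cluster certificate at once (the alpha prefix was dropped in later releases).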

I ran the following commands on the master node and the API server came back to life:

$ mkdir -p /tmp/backup
$ cd /etc/kubernetes/pki/
$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} /tmp/backup
$ kubeadm init phase certs all --apiserver-advertise-address <IP>
$ cd /etc/kubernetes/
$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} /tmp/backup
$ kubeadm init phase kubeconfig all
$ systemctl restart kubelet.service

I’m not sure if all the new certs will be distributed to the nodes automatically, but at least the API server didn’t complain anymore. I might do a kubeadm upgrade soon.

$ kubectl get node
NAME      STATUS     ROLES    AGE    VERSION
kmaster   NotReady   master   372d   v1.15.3
knode1    NotReady   <none>   372d   v1.15.3
knode2    NotReady   <none>   372d   v1.15.3

EDIT: After the certs were renewed, the kubelet service couldn’t authenticate anymore and the nodes appeared NotReady. This can be fixed by deleting the obsolete kubelet client certificate:

$ ls /var/lib/kubelet/pki -lht
total 28K
-rw------- 1 root root 1.1K Sep 23 19:12 kubelet-client-2020-09-23-19-12-52.pem
lrwxrwxrwx 1 root root   59 Sep 23 19:12 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2020-09-23-19-12-52.pem
-rw------- 1 root root 2.7K Sep 23 19:12 kubelet-client-2020-09-23-19-12-51.pem
-rw------- 1 root root 1.1K Jun 17 00:56 kubelet-client-2020-06-17-00-56-59.pem
-rw------- 1 root root 1.1K Sep 16  2019 kubelet-client-2019-09-16-20-41-53.pem
-rw------- 1 root root 2.7K Sep 16  2019 kubelet-client-2019-09-16-20-40-40.pem
-rw-r--r-- 1 root root 2.2K Sep 16  2019 kubelet.crt
-rw------- 1 root root 1.7K Sep 16  2019 kubelet.key
$ rm /var/lib/kubelet/pki/kubelet-client-current.pem
$ systemctl restart kubelet.service

🙂

Use Fluentd and Elasticsearch to Analyse Squid Proxy Traffic

TL;DR: This is a quick guide to setting up Fluentd + Elasticsearch integration to analyse Squid Proxy traffic. In the example below, Fluentd (td-agent) is installed on the same host as Squid Proxy, and Elasticsearch on another host. The OS is Ubuntu 20.04.

Useful links:
– Fluentd installation: https://docs.fluentd.org/installation/install-by-deb
– Elasticsearch installation: https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html

The logs of Squid need to be readable by td-agent, which can be done by adding the td-agent user to the proxy group:

$ sudo usermod --groups proxy -a td-agent

The configuration for td-agent looks like this:

<source>
  @type tail
  @id squid_tail
  <parse>
    @type regexp
    expression /^(?<timestamp>[0-9]+)[\.0-9]* +(?<elapsed>[0-9]+) (?<userIP>[0-9\.]+) (?<action>[A-Z_]+)\/(?<statusCode>[0-9]+) (?<size>[0-9]+) (?<method>[A-Z]+) (?<URL>[^ ]+) (?<rfc931>[^ ]+) (?<peerStatus>[^ ]+)/(?<peerIP>[^ ]+) (?<mime>[^ ]+)/
    time_key timestamp
    time_format %s
  </parse>
  path /var/log/squid/access.log
  tag squid.access
</source>

<match squid.access>
  @type elasticsearch
  host <elasticsearch server IP>
  port 9200
  logstash_format true
  flush_interval 10s
  index_name fluentd
  type_name fluentd
  include_tag_key true
  user elastic
  password <elasticsearch password>
</match>

The key is to get the regular expression to match the Squid access log format, which looks like:

1598101487.920 240256 192.168.10.111 TCP_TUNNEL/200 1562 CONNECT www.google.com.au:443 - HIER_DIRECT/142.250.66.163 -

Then I can use the fields defined in the regex, such as userIP or URL, in Elasticsearch queries.
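To sanity-check the expression against the sample line above without restarting td-agent, grep’s PCRE mode (-P) is close enough to Fluentd’s Ruby regex for this pattern:

```shell
line='1598101487.920 240256 192.168.10.111 TCP_TUNNEL/200 1562 CONNECT www.google.com.au:443 - HIER_DIRECT/142.250.66.163 -'
# same pattern as in the td-agent config, named groups included
echo "$line" | grep -Pq '^(?<timestamp>[0-9]+)[\.0-9]* +(?<elapsed>[0-9]+) (?<userIP>[0-9\.]+) (?<action>[A-Z_]+)\/(?<statusCode>[0-9]+) (?<size>[0-9]+) (?<method>[A-Z]+) (?<URL>[^ ]+) (?<rfc931>[^ ]+) (?<peerStatus>[^ ]+)/(?<peerIP>[^ ]+) (?<mime>[^ ]+)' \
  && echo 'regex matches'
```

If the pattern stops matching after a Squid upgrade, compare a fresh access.log line against each group the same way.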

🙂

Use Variable in Kustomize

Variables in Kustomize are handy helpers from time to time. With these variables I can link settings together that should always share the same value; without them I would probably need a template engine like Jinja2 to do the same trick.

Some examples here.

In my case, there’s a bug in kustomize as of now (3.6.1) where configMap object names don’t get properly suffixed in a patch file. The issue is here. However, I can use a variable to work around this bug. Imagine a scenario where I have a configMap in a base template and it’s referenced in a patch file:

# common/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

configMapGenerator:
  - name: common
    literals:
      - TEST=YES

# test/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: test
bases:
  - ../base
  - ../common
nameSuffix: -raynix
patchesStrategicMerge:
  - patch.yaml

# test/patch.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  template:
    spec:
      volumes:
        - name: common
          configMap:
            name: common
            # this should be linked to the configMap in common/kustomization.yaml but it won't be updated with a hash and suffix.

Using a variable gets around this bug. See the following example:

# common/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configurations:
  - configuration.yaml
configMapGenerator:
  - name: common
    literals:
      - TEST=YES
vars:
  - name: COMMON
    objref:
      apiVersion: v1
      kind: ConfigMap
      name: common
    fieldref:
      # this can be omitted as metadata.name is the default fieldPath 
      fieldPath: metadata.name
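The configuration.yaml referenced above is what tells kustomize which fields $(COMMON) may be expanded in; a volume’s configMap name is not in the default varReference list. Its content isn’t shown in the original post, but a sketch could look like this (the path is assumed from the Deployment patch used in this post):

```yaml
# common/configuration.yaml (assumed content, not from the original post)
varReference:
  - kind: Deployment
    path: spec/template/spec/volumes/configMap/name
```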

# test/kustomization.yaml unchanged

# test/patch.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  template:
    spec:
      volumes:
        - name: common
          configMap:
            name: $(COMMON)
            # now $(COMMON) will be updated with whatever the real configmap name is

Problem solved 🙂

5G + Public IP with OpenVPN

I’ve done a proof of concept with SSH tunneling to add a public IP to my 5G home broadband connection. It works for my garage-hosted blogs, but it’s not a complete solution. Since I still have free credit in my personal Google Cloud account, I decided to make an improvement with OpenVPN. The diagram looks like:

        [CloudFlare] 
             |
            HTTP
             |
     [VM:35.197.x.x:80]
             |
       [iptables DNAT]
             |
      [OpenVPN tunnel]
             |
[local server tun0 interface: 10.8.0.51:80]

Following an outstanding tutorial on DigitalOcean, I set up an OpenVPN server on Debian 10 running in a Google Cloud Compute Engine instance. There were a few more things to do for my case.

First I needed to add port forwarding from the public interface of the OpenVPN server to the home server’s tunnel interface. Here’s my ufw configuration file:

# this is /etc/ufw/before.rules
# START OPENVPN RULES
# NAT table rules
*nat
:PREROUTING ACCEPT [0:0]
# port forwarding to home server
-A PREROUTING -i eth0 -p tcp -d <public ip> --dport 80 -j DNAT --to 10.8.0.51:80

:POSTROUTING ACCEPT [0:0] 
# Allow traffic from OpenVPN client to eth0, ie. internet access
-A POSTROUTING -s 10.8.0.0/8 -o eth0 -j MASQUERADE
COMMIT
# END OPENVPN RULES

Make sure to reload ufw after this (sudo ufw reload) so the new NAT rules take effect.

Then in my home server, the OpenVPN client can be configured to run as a service:

# this is /etc/systemd/system/vpnclient.service
[Unit]
Description=Setup an openvpn tunnel to kite server
After=network.target

[Service]
ExecStart=/usr/sbin/openvpn --config /etc/openvpn/client1.conf
RestartSec=5
Restart=always

[Install]
WantedBy=multi-user.target

To enable it and start immediately:

sudo systemctl daemon-reload
sudo systemctl enable vpnclient
sudo systemctl start vpnclient

Also, I need my home server to have a fixed IP on its tun0 network interface, so the nginx server can proxy traffic to this IP reliably. I followed this guide; it suggested setting client-config-dir on both the server and client sides, but I only did it on the server side and it worked for me:

# this is /etc/openvpn/server.conf
# uncomment the following line
client-config-dir ccd

# this is /etc/openvpn/ccd/client1
ifconfig-push 10.8.0.51 255.255.255.255

After this the OpenVPN server on the VM needs to be restarted:

sudo systemctl restart openvpn@server.service

Reload the nginx server and it should be working. I tested it with curl -H "Host: raynix.info" 35.197.x.x and the request hit my home server.

🙂