My Last Dota 2 Game

I started playing Dota 2 in 2014 because:

  • It was a fun game
  • It ran natively on Linux

I picked it up again when the lockdown started in 2020 and played quite a lot in the following months, until I decided I’d had enough of Dota, because:

  • The game was getting more complicated and punished casual players who don’t pay close attention to every update,
  • which made it feel like chess, except some of the rules change from time to time
  • Dota+ is a paid subscription that can help casual players a bit, but
  • I became dependent on the subscription
  • Also, it’s harder to have fun without a fixed team. Maybe too many teenage players?

Anyway, to my surprise, after 400+ days without a single Dota 2 game, I actually felt better. Goodbye, Dota 2.

🙂

Deploy the Loki Stack in a Kubernetes Cluster with ArgoCD

Loki and Promtail from Grafana Labs are the new kids in the observability community. Are they good enough to replace Elasticsearch and Logstash? I would like to find out.

Here’s a sample ArgoCD Application to deploy Loki, Promtail, Prometheus and Grafana, all from one Helm chart: grafana/loki-stack. Some notable settings of my installation are:

  • loki, grafana and prometheus are deployed separately into their own namespaces
  • loki, grafana and prometheus use existing PVCs. This is more portable if I want to opt out of this Helm chart later while keeping the data
  • the ConfigMap reloaders are disabled because everything is synced by ArgoCD anyway

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: loki-stack-charts
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: loki
    server: https://kubernetes.default.svc
  project: default
  source:
    chart: loki-stack
    repoURL: https://grafana.github.io/helm-charts
    targetRevision: 2.5.0
    helm:
      values: |
        loki:
          enabled: true
          persistence:
            type: pvc
            enabled: true
            existingClaim: loki
          securityContext:
            runAsGroup: 10001
            runAsUser: 10001

        promtail:
          enabled: true

        fluent-bit:
          enabled: false

        grafana:
          enabled: true
          namespaceOverride: grafana
          sidecar:
            datasources:
              enabled: false
          image:
            tag: 8.2.2
          persistence:
            type: pvc
            enabled: true
            existingClaim: grafana
          securityContext:
            runAsUser: 472
            runAsGroup: 472
          initChownData:
            enabled: false

        prometheus:
          enabled: true
          forceNamespace: prometheus
          configmapReload:
            prometheus:
              enabled: false
            alertmanager:
              enabled: false
          server:
            persistentVolume:
              enabled: true
              existingClaim: prometheus-server
              accessModes:
                - ReadWriteMany
            resources:
              limits:
                cpu: 800m
                memory: 1Gi
              requests:
                cpu: 500m
                memory: 512Mi
          alertmanager:
            persistentVolume:
              enabled: true
              existingClaim: prometheus-alertmanager
              accessModes:
                - ReadWriteMany
            resources:
              limits:
                cpu: 100m
                memory: 100Mi
              requests:
                cpu: 50m
                memory: 50Mi
          nodeExporter:
            enabled: true
            tolerations:
              - key: node-role.kubernetes.io/master
                effect: NoSchedule
            resources:
              limits:
                cpu: 200m
                memory: 50Mi
              requests:
                cpu: 100m
                memory: 30Mi

        filebeat:
          enabled: false

        logstash:
          enabled: false
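
One way to register this Application is a plain kubectl apply, assuming ArgoCD is already running in the argocd namespace (the file name below is just an assumption):

kubectl apply -f loki-stack-application.yaml
# or, once the Application exists, trigger a sync with the ArgoCD CLI
argocd app sync loki-stack-charts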

For the Namespace, PersistentVolume and Istio resources needed by loki-stack, please see these files.
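
As a rough sketch, an existing claim such as the loki one referenced by existingClaim above could look like this (the storage class and size are assumptions, not values from my cluster):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki
  namespace: loki
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path  # assumption: use whatever storage class the cluster provides
  resources:
    requests:
      storage: 10Gi             # assumption: size to the desired log retention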

🙂

Fixed CoreDNS High CPU Issue in a Kubernetes Cluster

[Screenshot: the master node got hammered really hard after 8 PM]
[Screenshot: handy Slack notification]

There was a Grafana alert saying that CPU usage was quite high on the master node of my garage Kubernetes cluster. I was watching a movie, so I didn’t jump on it right away 🙂 I had a look at the master node today, and this is how I fixed the issue.

Over good old SSH, I saw the CoreDNS process on the master node chewing up a big chunk of CPU time. From the CoreDNS pod’s logs (klogs is a handy bash function to tail a pod’s logs; a sketch of it follows the output below):

klogs
.:53
[INFO] plugin/reload: Running configuration MD5 = 33fe3fc0f84bf45fc2e8e8e9701ce653
CoreDNS-1.8.0
linux/arm64, go1.15.3, 054c9ae
[ERROR] plugin/errors: 2 zipkin.istio-system. A: dial tcp 192.168.1.254:53: connect: connection refused
[ERROR] plugin/errors: 2 zipkin.istio-system. A: dial tcp 192.168.1.254:53: connect: connection refused
[ERROR] plugin/errors: 2 zipkin.istio-system. A: dial tcp 192.168.1.254:53: connect: connection refused
[ERROR] plugin/errors: 2 zipkin.istio-system. A: dial tcp 192.168.1.254:53: connect: connection refused
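
klogs is not a standard command, just a personal helper; a minimal sketch of such a bash function might look like this (purely an assumption, not the actual implementation):

# tail the logs of the first pod whose name matches a pattern
klogs() {
  local pattern="$1" namespace="${2:-kube-system}"
  local pod
  pod=$(kubectl get pods -n "$namespace" -o name | grep "$pattern" | head -n 1)
  kubectl logs -n "$namespace" -f "$pod"
}
# usage: klogs coredns kube-system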

From the error messages, lookups for zipkin.istio-system (Istio’s tracing backend) were reaching CoreDNS, but CoreDNS couldn’t reach its upstream (192.168.1.254) to resolve them. I tested a DNS query on the node with:

dig raynix.info @192.168.1.254
...
;; Query time: 27 msec
;; SERVER: 192.168.1.254#53(192.168.1.254)
;; WHEN: Thu Oct 28 14:45:15 AEDT 2021
;; MSG SIZE  rcvd: 72	

It worked as a single query, but perhaps it was failing when there were too many requests at the same time. 192.168.1.254 is just a consumer-grade router, so it’s fair to assume it couldn’t handle a lot of concurrent requests. To test this assumption, I needed to update the CoreDNS configuration.

According to this official document, all I needed to do was change the forward directive in the configuration. I changed the default upstream to Google DNS:

# kubectl edit cm coredns -n kube-system
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
          ttl 30
        }
        prometheus :9153
        # this was forward . /etc/resolv.conf
        forward . 8.8.8.8 8.8.4.4
        cache 60
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system

The running CoreDNS pods won’t pick up this change automatically, so I gave them a quick restart:

kubectl rollout restart deploy coredns -n kube-system
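
To confirm the change actually took effect, a couple of quick checks can help (these are my own suggestions, not from the original troubleshooting session): the restarted pods should stop logging the connection refused errors, and a query through CoreDNS itself should resolve.

kubectl -n kube-system logs deploy/coredns --tail=20   # the "connection refused" errors should be gone
kubectl -n kube-system get svc kube-dns                # look up the cluster DNS ClusterIP
dig raynix.info @<cluster-dns-ip>                      # assumption: substitute the ClusterIP from above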

Boom! Problem fixed. Also, calico-kube-controllers is no longer restarting. As you can see, when the CPU is flat out, some pods can even fail their health checks, and the resulting restarts make the CPU even busier.

🙂

A Canary Upgrade of Istio 1.9 to 1.11

Prerequisites: full Admin access to a Kubernetes cluster, which has an older version of Istio installed.

A while ago I decided to try Istio in my garage Kubernetes lab and replaced ingress-nginx with istio-ingressgateway. At the time I installed Istio 1.9.4; the latest release is already 1.11.4. To avoid being left behind in the deprecated zone, I planned to upgrade my Istio installation.

I chose the canary upgrade path because:

  • The in-place upgrade can be disruptive, while the canary upgrade keeps both versions running during the upgrade
  • The in-place upgrade doesn’t encourage skipping a version; the canary upgrade, on the other hand, can go straight from 1.9 to 1.11

The first step was to update my istioctl binary to the latest version. Then, following the official Istio upgrade document, I did a pre-flight check with:

istioctl version
client version: 1.11.4
control plane version: 1.9.4
data plane version: 1.9.4 (27 proxies)

istioctl x precheck
✔ No issues found when checking the cluster. Istio is safe to install or upgrade!
  To get started, check out https://istio.io/latest/docs/setup/getting-started/
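
For the record, updating istioctl itself can be done with the official download script, pinned to the target release (this is the standard Istio download command, not copied from my shell history):

curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.11.4 sh -
export PATH="$PWD/istio-1.11.4/bin:$PATH"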

The next step is to install the latest version with a revision. The revision name can be anything, as its only purpose is to distinguish it from the existing version. In my case I just used canary, but it could be 1-11-4, which would be more meaningful.

istioctl install --set revision=canary

Now check that the new control plane is running. The istiod-canary deployment is obviously the one I need to pay attention to.

# below is just a copy from the official doc as I didn't save my command output
kubectl get pods -n istio-system -l app=istiod
NAME                                    READY   STATUS    RESTARTS   AGE
istiod-786779888b-p9s5n                 1/1     Running   0          114m
istiod-canary-6956db645c-vwhsk          1/1     Running   0          1m

The data plane can be migrated namespace by namespace to test the new Istio version. Since I have ArgoCD in my cluster, I can simply change a namespace’s label in GitOps and let ArgoCD apply the change automatically. But to get the pods injected with the new Istio sidecar, I still need to restart the deployment:

kubectl rollout restart deployment httpbin -n httpbin
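
For reference, the namespace label change that ArgoCD applied for me amounts to this kubectl command from the official canary upgrade doc (using the httpbin namespace from the example above):

kubectl label namespace httpbin istio-injection- istio.io/rev=canary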

Then I used the following command to check the result:

istioctl proxy-status | grep httpbin
httpbin-6d779b74f7-74pfs.httpbin                         SYNCED     SYNCED     SYNCED     SYNCED     istiod-canary-945cbbf49-mnwtn     1.11.4

After I updated all namespaces and restarted all deployments in those namespaces, I nervously uninstalled the old Istio version. The old version was installed with default settings, so its revision is default too.
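
Before pulling the trigger, a reassuring check is that no proxy is still attached to the old control plane (my own sanity check, not part of the original notes): every row of istioctl proxy-status should show an istiod-canary instance and proxy version 1.11.4.

istioctl proxy-status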

istioctl x uninstall --revision default
  Removed HorizontalPodAutoscaler:istio-system:istiod.
  Removed PodDisruptionBudget:istio-system:istiod.
  Removed Deployment:istio-system:istiod.
  Removed Service:istio-system:istiod.
  Removed ConfigMap:istio-system:istio.
  Removed ConfigMap:istio-system:istio-sidecar-injector.
  Removed Pod:istio-system:istiod-7d5fdcc6c-jqrf9.
  Removed EnvoyFilter:istio-system:metadata-exchange-1.8.
  Removed EnvoyFilter:istio-system:metadata-exchange-1.9.
  Removed EnvoyFilter:istio-system:stats-filter-1.8.
  Removed EnvoyFilter:istio-system:stats-filter-1.9.
  Removed EnvoyFilter:istio-system:tcp-metadata-exchange-1.8.
  Removed EnvoyFilter:istio-system:tcp-metadata-exchange-1.9.
  Removed EnvoyFilter:istio-system:tcp-stats-filter-1.8.
  Removed EnvoyFilter:istio-system:tcp-stats-filter-1.9.
  Removed MutatingWebhookConfiguration::istio-sidecar-injector.
✔ Uninstall complete   

And my blog is still online, so, success!

🙂