My Last Dota 2 Game

I started playing Dota 2 in 2014 because:

  • It was a fun game
  • It runs natively on Linux

I picked it up again when the lockdown started in 2020 and played quite a lot in the following months, until I decided I'd had enough of Dota, because:

  • The game was getting more complicated and punished casual players who don't pay close attention to every update
  • which made it like chess, except some rules change from time to time
  • Dota+ is a paid subscription that can help casual players a bit, but
  • I became dependent on the subscription
  • It's also harder to have fun without a fixed team. Maybe too many teenage players?

Anyway, to my surprise, after 400+ days without a single Dota 2 game I actually felt better. Goodbye, Dota 2.


Deploy the Loki Stack in a Kubernetes Cluster with ArgoCD

Loki and Promtail from Grafana Labs are the new kids in the observability community. Are they good enough to replace Elasticsearch and Logstash? I'd like to find out.

Here's a sample ArgoCD Application that deploys Loki, Promtail, Prometheus and Grafana from a single Helm chart: grafana/loki-stack. Some notable settings of my installation:

  • loki, grafana and prometheus are deployed into their own separate namespaces
  • loki, grafana and prometheus use existing PVCs, which is more portable if I later opt out of this Helm chart and want to keep the data
  • the configMap reloaders are disabled, because everything is synced by ArgoCD anyway
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: loki-stack-charts
  namespace: argocd
spec:
  destination:
    namespace: loki
    server: https://kubernetes.default.svc
  project: default
  source:
    # the loki-stack chart is published in the Grafana Helm repository
    repoURL: https://grafana.github.io/helm-charts
    chart: loki-stack
    targetRevision: 2.5.0
    helm:
      values: |
        loki:
          enabled: true
          persistence:
            type: pvc
            enabled: true
            existingClaim: loki
          securityContext:
            runAsGroup: 10001
            runAsUser: 10001

        promtail:
          enabled: true

        fluent-bit:
          enabled: false

        grafana:
          enabled: true
          namespaceOverride: grafana
          sidecar:
            datasources:
              enabled: false   # assumed parent keys
          image:
            tag: 8.2.2
          persistence:
            type: pvc
            enabled: true
            existingClaim: grafana
          securityContext:
            runAsUser: 472
            runAsGroup: 472
          testFramework:
            enabled: false   # assumed parent key

        prometheus:
          enabled: true
          forceNamespace: prometheus
          configmapReload:
            prometheus:
              enabled: false
            alertmanager:
              enabled: false
          server:
            persistentVolume:
              enabled: true
              existingClaim: prometheus-server
              accessModes:
                - ReadWriteMany
            resources:
              limits:
                cpu: 800m
                memory: 1Gi
              requests:
                cpu: 500m
                memory: 512Mi
          alertmanager:
            persistentVolume:
              enabled: true
              existingClaim: prometheus-alertmanager
              accessModes:
                - ReadWriteMany
            resources:
              limits:
                cpu: 100m
                memory: 100Mi
              requests:
                cpu: 50m
                memory: 50Mi
          nodeExporter:
            enabled: true
            tolerations:
              - key:
                effect: NoSchedule
            resources:
              limits:
                cpu: 200m
                memory: 50Mi
              requests:
                cpu: 100m
                memory: 30Mi

        filebeat:
          enabled: false

        logstash:
          enabled: false

For the Namespace, PersistentVolume and Istio resources needed by loki-stack, please see these files.
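As a rough illustration, one of the pre-created claims referenced above (existingClaim: loki) might look like the sketch below; the storageClassName is an assumption, so substitute whatever your cluster provides:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki
  namespace: loki
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-client   # assumption: any existing StorageClass works
  resources:
    requests:
      storage: 10Gi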


Fixed a CoreDNS High CPU Issue in a Kubernetes Cluster

[Grafana graph: the master node got hammered really hard after 8 PM]
[Screenshot: handy Slack notification]

There was a Grafana alert saying that CPU usage was quite high on the master node of my garage Kubernetes cluster. I was watching a movie, so I didn't jump on it right away 🙂 I had a look at the master node today, and this is how I fixed the issue.

With good old SSH, I saw the CoreDNS process on the master node chewing up a big chunk of CPU time. I then checked the CoreDNS pod's logs with klogs, a handy bash function I keep around for tailing a pod's logs.
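klogs isn't a standard command, so here's a minimal sketch of such a helper; the pod-matching logic is an assumption and my actual version may differ:

# klogs <namespace> <pod name fragment>: tail the logs of the first matching pod
klogs() {
  local ns="$1" frag="$2"
  local pod
  pod=$(kubectl get pods -n "$ns" -o name | grep "$frag" | head -n 1)
  kubectl logs -f -n "$ns" "$pod"
}

The logs were full of errors like these: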

[INFO] plugin/reload: Running configuration MD5 = 33fe3fc0f84bf45fc2e8e8e9701ce653
linux/arm64, go1.15.3, 054c9ae
[ERROR] plugin/errors: 2 zipkin.istio-system. A: dial tcp connect: connection refused
[ERROR] plugin/errors: 2 zipkin.istio-system. A: dial tcp connect: connection refused
[ERROR] plugin/errors: 2 zipkin.istio-system. A: dial tcp connect: connection refused
[ERROR] plugin/errors: 2 zipkin.istio-system. A: dial tcp connect: connection refused

From the error messages, lookups for zipkin.istio-system (Istio's tracing endpoint) were hitting CoreDNS, but CoreDNS couldn't reach its upstream DNS server. I tested a DNS query on the node itself with dig (the upstream's IP is omitted here):

dig @<upstream IP>
;; Query time: 27 msec
;; WHEN: Thu Oct 28 14:45:15 AEDT 2021
;; MSG SIZE  rcvd: 72

The query works as a one-off, but it may well have been failing under load: the upstream is just a consumer-grade router, so it's fair that it couldn't handle a flood of requests. To test this assumption, I needed to update CoreDNS's configuration.

Per the official document, all I needed to do was change the forward directive in the configuration. I pointed the default upstream at Google DNS instead:

# kubectl edit cm coredns -n kube-system
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        kubernetes cluster.local {
          pods insecure
          ttl 30
        }
        prometheus :9153
        # this was forward . /etc/resolv.conf
        forward . 8.8.8.8 8.8.4.4
        cache 60
        reload
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system

The running CoreDNS pods didn't pick up this change straight away, so I gave them a quick restart:

kubectl rollout restart deploy coredns -n kube-system

Boom! Problem fixed. The calico-kube-controllers pod is no longer restarting either: when the CPU is flat out, some pods can fail their health checks, and the resulting restarts make the CPU even busier.
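To confirm the fix, assuming metrics-server is installed in the cluster, the CoreDNS pods' CPU usage can be checked directly (kubeadm labels them k8s-app=kube-dns):

kubectl top pods -n kube-system -l k8s-app=kube-dns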


A Canary Upgrade of Istio from 1.9 to 1.11

Prerequisites: full admin access to a Kubernetes cluster that has an older version of Istio installed.

A while ago I decided to try Istio in my garage Kubernetes lab and replaced ingress-nginx with istio-ingressgateway. I installed Istio 1.9.4 back then, and the latest release is already 1.11.4. To avoid being left behind in the deprecated zone, I planned to upgrade my installation.

I chose the canary upgrade path because:

  • An in-place upgrade can be disruptive, while a canary upgrade keeps both versions running throughout the migration
  • An in-place upgrade discourages skipping versions; a canary upgrade can jump straight from 1.9 to 1.11

The first step was to update my istioctl binary to the latest version.
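The official download script can fetch a specific version; it unpacks into a local istio-1.11.4 directory whose bin folder holds the new istioctl:

curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.11.4 sh -
export PATH="$PWD/istio-1.11.4/bin:$PATH"

Then, following the official Istio upgrade document, I did a pre-flight check: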

istioctl version
client version: 1.11.4
control plane version: 1.9.4
data plane version: 1.9.4 (27 proxies)

istioctl x precheck
✔ No issues found when checking the cluster. Istio is safe to install or upgrade!
  To get started, check out

The next step is to install the latest version with a revision. The revision name can be anything; its only purpose is to distinguish the new installation from the existing one. I just used canary, but something like 1-11-4 would have been more meaningful.

istioctl install --set revision=canary

Now check that the new control plane is running; istiod-canary is obviously the one to pay attention to.

# below is just a copy from the official doc as I didn't save my command output
kubectl get pods -n istio-system -l app=istiod
NAME                                    READY   STATUS    RESTARTS   AGE
istiod-786779888b-p9s5n                 1/1     Running   0          114m
istiod-canary-6956db645c-vwhsk          1/1     Running   0          1m

The data plane can then be migrated namespace by namespace to test the new version. Since I have ArgoCD in my cluster, I simply changed each namespace's revision label in gitops and let ArgoCD apply the change automatically, as sketched below.
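For example, the httpbin namespace's manifest swaps the old injection label for the new revision label (a sketch; my actual manifests carry more metadata):

apiVersion: v1
kind: Namespace
metadata:
  name: httpbin
  labels:
    # was: istio-injection: enabled
    istio.io/rev: canary

But to get the pods injected with the new sidecar, I still needed to restart each deployment: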

kubectl rollout restart deployment httpbin -n httpbin

Then I used the following command to check the result:

istioctl proxy-status | grep httpbin
httpbin-6d779b74f7-74pfs.httpbin                         SYNCED     SYNCED     SYNCED     SYNCED     istiod-canary-945cbbf49-mnwtn     1.11.4

After I had updated all the namespaces and restarted the deployments in them, I nervously uninstalled the old Istio version. It was installed with default settings, so its revision is default:

istioctl x uninstall --revision default
  Removed HorizontalPodAutoscaler:istio-system:istiod.
  Removed PodDisruptionBudget:istio-system:istiod.
  Removed Deployment:istio-system:istiod.
  Removed Service:istio-system:istiod.
  Removed ConfigMap:istio-system:istio.
  Removed ConfigMap:istio-system:istio-sidecar-injector.
  Removed Pod:istio-system:istiod-7d5fdcc6c-jqrf9.
  Removed EnvoyFilter:istio-system:metadata-exchange-1.8.
  Removed EnvoyFilter:istio-system:metadata-exchange-1.9.
  Removed EnvoyFilter:istio-system:stats-filter-1.8.
  Removed EnvoyFilter:istio-system:stats-filter-1.9.
  Removed EnvoyFilter:istio-system:tcp-metadata-exchange-1.8.
  Removed EnvoyFilter:istio-system:tcp-metadata-exchange-1.9.
  Removed EnvoyFilter:istio-system:tcp-stats-filter-1.8.
  Removed EnvoyFilter:istio-system:tcp-stats-filter-1.9.
  Removed MutatingWebhookConfiguration::istio-sidecar-injector.
✔ Uninstall complete   

And my blog is still online, so, success!