In a large GKE cluster, I noticed memory utilisation climbing very high. After some investigation I found, to my surprise, that a great deal of memory was being consumed by supposedly tiny Istio sidecars, and they kept growing round the clock.
```
$ k top pod <pod-name> --containers
POD                           NAME          CPU(cores)   MEMORY(bytes)
api-client-7b9889c7d8-6lrqk   istio-proxy   6m           540Mi
api-client-7b9889c7d8-6lrqk   api-client    4m           185Mi
```
The Istio sidecar is essentially an Envoy proxy configured by the Istio control plane, istiod. It is usually lightweight, around 50MB of memory, so how did this happen? After some searching I found this article, which answered my question exactly. In a nutshell, there are probably too many sidecars in this cluster, and by default each of them is configured to cache mesh configuration for every other service in the mesh.
Out of curiosity, I counted all the istio-proxy containers in the cluster like this:
```
$ k get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].name}{"\n"}{end}' | rg istio-proxy -c
1663
```
At roughly 500MB per sidecar, that is 1663 × 0.5GB ≈ 831GB of memory we're paying for just because the sidecars got fat…
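That total is extrapolated from a single pod. To measure rather than estimate, the per-container numbers from `kubectl top` can be summed directly. The sketch below is a hypothetical helper (`sum_istio_proxy_mi` is my name, not part of any tool) and assumes the column layout `NAMESPACE POD NAME CPU(cores) MEMORY(bytes)` that `kubectl top pod -A --containers` prints, with memory reported in Ki, Mi or Gi; it also needs metrics to be available in the cluster.

```shell
# Hypothetical helper: sum the memory of every istio-proxy container.
# Assumes input shaped like `kubectl top pod -A --containers` output:
#   NAMESPACE POD NAME CPU(cores) MEMORY(bytes)
# The header line is skipped naturally because its 3rd column is "NAME".
sum_istio_proxy_mi() {
  awk '
    $3 == "istio-proxy" {
      mem = $5
      # Normalise Ki/Mi/Gi suffixes to Mi before adding.
      if (mem ~ /Gi$/)      { sub(/Gi$/, "", mem); total += mem * 1024 }
      else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem); total += mem }
      else if (mem ~ /Ki$/) { sub(/Ki$/, "", mem); total += mem / 1024 }
      count++
    }
    END { printf "%d containers, %.0f Mi total\n", count, total }
  '
}

# Usage against a live cluster:
#   kubectl top pod -A --containers | sum_istio_proxy_mi
```

Because `kubectl top` reports current usage, running this before and after a change gives a direct comparison instead of a per-pod extrapolation.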
According to the Istio documentation, a Sidecar resource can restrict Envoy to caching only whitelisted hosts, e.g.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default  # this is the default for the namespace
  namespace: this-namespace
spec:
  egress:
  - hosts:
    - "app-namespace/service-name.app-namespace.svc.cluster.local"
    - "istio-system/*"  # this is for egress traffic
```
Whitelisting every host for every sidecar would be tedious without knowing how the mesh is wired. This is where Kiali comes to the rescue: it visualises the mesh, so it is easy to see exactly which services each app needs to reach. For finer-grained configuration when multiple apps share a namespace, a Sidecar can be scoped to a single app with a workloadSelector, such as:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: my-app
  namespace: this-namespace
spec:
  workloadSelector:
    labels:
      app: my-app
  egress:
  - hosts:
    - "app-namespace/service-name.app-namespace.svc.cluster.local"
    - "istio-system/*"  # this is for egress traffic
```
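Once Kiali has shown which hosts an app talks to, writing these manifests by hand for many apps is repetitive. A minimal sketch of a generator, assuming the same `app` label convention used in the manifests here: `gen_sidecar` is a hypothetical helper, and an empty app argument produces the namespace-wide `default` Sidecar instead of a per-app one.

```shell
# Hypothetical helper: print a Sidecar manifest limiting egress to the given hosts.
# Usage: gen_sidecar <namespace> <app-label-or-empty> <host>...
# Empty app label -> namespace-wide "default" Sidecar (no workloadSelector).
# Otherwise the Sidecar selects pods labelled app=<app-label>.
gen_sidecar() {
  if [ "$#" -lt 3 ]; then
    echo "usage: gen_sidecar <namespace> <app-or-empty> <host>..." >&2
    return 1
  fi
  local namespace=$1 app=$2
  shift 2

  printf '%s\n' \
    "apiVersion: networking.istio.io/v1beta1" \
    "kind: Sidecar" \
    "metadata:" \
    "  name: ${app:-default}" \
    "  namespace: ${namespace}" \
    "spec:"
  if [ -n "$app" ]; then
    printf '%s\n' "  workloadSelector:" "    labels:" "      app: ${app}"
  fi
  printf '%s\n' "  egress:" "  - hosts:"
  local host
  for host in "$@"; do
    # Quote each host so entries such as "istio-system/*" stay literal YAML strings.
    printf '    - "%s"\n' "$host"
  done
}

# Usage (hosts as read off the Kiali graph):
#   gen_sidecar this-namespace my-app \
#     "app-namespace/service-name.app-namespace.svc.cluster.local" \
#     "istio-system/*" | kubectl apply -f -
```

Piping into `kubectl apply -f -` keeps the generated YAML out of the repo; writing it to files instead makes the whitelist reviewable alongside the app's other manifests.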
After deploying these Sidecars, it was a huge relief to see 700+GB of memory released 🙂