In a populous GKE cluster, I noticed memory utilisation climbing very high. After some investigation, I was surprised to find that a great deal of the memory was being consumed by supposedly tiny Istio sidecars, and they kept bloating around the clock.
$ k top pod <pod-name> --containers
POD                          NAME          CPU(cores)   MEMORY(bytes)
api-client-7b9889c7d8-6lrqk  istio-proxy   6m           540Mi
api-client-7b9889c7d8-6lrqk  api-client    4m           185Mi
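To see how much the sidecars use in total rather than pod by pod, the same output can be summed up. This is only a rough sketch, not what was actually run here: `sum_sidecar_mib` is a hypothetical helper name, and it assumes the last three columns are NAME, CPU and MEMORY and that memory is reported in Mi, as in the output above.

```shell
# Sum the MEMORY column (in Mi) of every istio-proxy container found in
# `kubectl top pod --containers` output read from stdin.
# Assumption: the container name is the third column from the right,
# which holds both with and without the -A (all namespaces) flag.
sum_sidecar_mib() {
  awk '
    $(NF-2) == "istio-proxy" {
      mem = $NF            # e.g. "540Mi"
      sub(/Mi$/, "", mem)  # strip the unit suffix
      total += mem
    }
    END { print total + 0 }  # prints 0 when no sidecar was found
  '
}

# Usage against a live cluster (needs metrics-server):
#   kubectl top pod -A --containers | sum_sidecar_mib
```

The header line is skipped naturally because its third-from-last column reads NAME, not istio-proxy.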
The Istio sidecar is essentially an Envoy proxy configured by the Istio controller, istiod. It's usually lightweight, using around 50MB of memory, so how did this happen? After some research I found this article, which answered my question exactly. In a nutshell, there were probably too many sidecars in this cluster, and each of them was configured to cache service mesh entries for every other sidecar in the mesh.
Out of curiosity, I counted all the istio-proxy containers in the cluster like this:
$ k get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].name}{"\n"}{end}'|rg istio-proxy -c
1663
So basically we're paying for around 831GB of memory just because the sidecars got fat…
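The 831GB figure works out if each sidecar is assumed to hold roughly 0.5GB. That per-sidecar average is an assumption made here to reproduce the number; the one pod sampled earlier actually showed 540Mi. A back-of-the-envelope sketch, with `estimate_total_gib` as a hypothetical helper name:

```shell
# Estimate total sidecar memory in GiB (integer, rounded down).
#   $1 = number of istio-proxy containers
#   $2 = assumed average memory per sidecar, in MiB
estimate_total_gib() {
  echo $(( $1 * $2 / 1024 ))
}

estimate_total_gib 1663 512   # bloated, assumed ~0.5 GiB each: prints 831
estimate_total_gib 1663 50    # at the usual ~50MB each: prints 81
```

The gap between the two estimates is about 750GB, which is the rough upper bound on what trimming the sidecars could give back.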
According to the Istio docs, there's a way to let Envoy cache only whitelisted hosts, e.g.:
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default # this is the default for the namespace
  namespace: this-namespace
spec:
  egress:
  - hosts:
    - "app-namespace/service-name.app-namespace.svc.cluster.local"
    - "istio-system/*" # this is for egress traffic
It would be a tedious job to whitelist all hosts for all sidecars without knowing how the mesh is configured. This is where Kiali comes to the rescue: it makes it easy to visualise the mesh and see exactly which apps your app needs to access. If multiple apps share a namespace, each app can get its own more fine-grained configuration, such as:
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: my-app
  namespace: this-namespace
spec:
  workloadSelector:
    labels:
      app: my-app
  egress:
  - hosts:
    - "app-namespace/service-name.app-namespace.svc.cluster.local"
    - "istio-system/*" # this is for egress traffic
After these Sidecars were deployed, it was a huge relief to see 700+GB of memory released 🙂
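Writing one of these per app by hand gets repetitive once the host lists are known from Kiali. As a minimal sketch, the per-app manifest above could be rendered from a small shell function. `render_sidecar` is a hypothetical helper, not part of Istio or kubectl, and it assumes the workload is selected by an `app` label that matches the Sidecar name.

```shell
# Render a workload-scoped Sidecar manifest to stdout.
#   $1    = app label value (also used as the Sidecar name)
#   $2    = namespace of the workload
#   $3... = egress hosts in "namespace/dnsName" form
render_sidecar() {
  app=$1
  ns=$2
  shift 2
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: ${app}
  namespace: ${ns}
spec:
  workloadSelector:
    labels:
      app: ${app}
  egress:
  - hosts:
EOF
  # One quoted list entry per remaining argument.
  for host in "$@"; do
    printf '    - "%s"\n' "$host"
  done
}

# Usage (quote the hosts so the shell does not expand the *):
#   render_sidecar my-app this-namespace \
#     "app-namespace/service-name.app-namespace.svc.cluster.local" \
#     "istio-system/*" | kubectl apply -f -
```

Reviewing the rendered YAML before piping it into `kubectl apply` is worthwhile, since a missing host means the app can no longer reach that service through the mesh.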
