I recently upgraded my Kubernetes cluster from 1.8 to 1.9. The upgrade itself went smoothly, but autoscaling seemed to be failing afterwards. Below are some notes on how I troubleshot the issue.
First, I made sure both kops and kubectl were upgraded to 1.9 on my laptop:
Install kubectl 1.9:
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.9.10/bin/linux/amd64/kubect
Install kops 1.9: https://github.com/kubernetes/kops/releases/tag/1.9.2
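To double-check the client side before touching the cluster, something like the following can print both versions. This is only a sketch: `show_client_versions` is a helper name made up for this example, and it simply skips a tool that is not on the PATH.

```shell
# Print the client-side versions; both should report 1.9.x.
# A tool that is not installed is reported and skipped instead of aborting.
show_client_versions() {
  for tool in kubectl kops; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: not installed" >&2
      continue
    fi
    case "$tool" in
      kubectl) kubectl version --client ;;  # client only, no cluster access needed
      kops)    kops version ;;
    esac
  done
}

show_client_versions
```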
While doing some load testing, I discovered that no matter how busy the pods were, they were never scaled out. To see what was happening with the horizontal pod autoscaler (HPA), I used the following command:
kubectl describe hpa
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
After some googling, it turned out that Kubernetes 1.9 uses the new metrics server for HPA, and my cluster didn’t have it. Here’s how to install the metrics server on a Kubernetes cluster: https://github.com/kubernetes-incubator/metrics-server
To make the troubleshooting more interesting, the metrics server hit an error too! It looked like this:
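One way to confirm that the metrics API really is missing is to ask the API server directly. The sketch below assumes the metrics server registers itself as the `v1beta1.metrics.k8s.io` APIService, which matches the `pods.metrics.k8s.io` resource named in the HPA error. `check_metrics_api` is a hypothetical helper name, not a kubectl feature.

```shell
# Check whether the resource metrics API that the HPA queries is available.
check_metrics_api() {
  # The metrics server registers this aggregated API; without it the HPA
  # fails with "could not find the requested resource".
  if ! kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1; then
    echo "metrics.k8s.io is not registered: metrics server is missing" >&2
    return 1
  fi
  # Once the metrics server is serving data, this returns node metrics as JSON.
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
}
```

Running `check_metrics_api` before and after installing the metrics server shows whether the installation changed anything.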
Failed to get kubernetes address: No kubernetes source found.
Bug fix for the metrics server: https://github.com/kubernetes-incubator/metrics-server/issues/105#issuecomment-412818944
In short, adding an overriding `command` in `deploy/1.8+/metrics-server-deployment.yaml`, as described in the linked comment, got it working.
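After editing the deployment file, the change still has to be re-applied and checked. A rough sketch of that loop follows; it assumes the commands are run from the root of a metrics-server checkout and that the deployment is named `metrics-server` in the `kube-system` namespace. `verify_metrics_server` is a made-up helper name.

```shell
# Re-apply the edited manifest and check that the metrics server comes up.
verify_metrics_server() {
  kubectl apply -f "deploy/1.8+/metrics-server-deployment.yaml" || return 1
  # Wait until the new pod has rolled out.
  kubectl -n kube-system rollout status deployment/metrics-server || return 1
  # The "No kubernetes source found" error should no longer appear here.
  kubectl -n kube-system logs deployment/metrics-server --tail=20
  # ScalingActive should eventually turn True once metrics are flowing.
  kubectl describe hpa
}
```

It can take a minute or so after the rollout before the HPA reports CPU metrics, so `kubectl describe hpa` may need to be repeated.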
Install the cluster autoscaler for the Kubernetes cluster: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-one-asg.yaml
I used `- image: k8s.gcr.io/cluster-autoscaler:v1.1.3` for Kubernetes 1.9. This part held no surprises and worked as expected.
Sample HPA schema:
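A minimal sketch of such a manifest, written to a file from the shell. The names and numbers are placeholders rather than values from my cluster: it assumes a hypothetical Deployment called `my-app`, scaled between 2 and 10 replicas at a 50% CPU target.

```shell
# Write a minimal CPU-based HPA manifest to hpa.yaml.
# "my-app" and the replica/CPU numbers are illustrative assumptions.
cat > hpa.yaml <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  # Scale out when average CPU usage exceeds 50% of the pods' CPU requests.
  targetCPUUtilizationPercentage: 50
EOF
```

Apply it with `kubectl apply -f hpa.yaml` and check it with `kubectl describe hpa my-app`. Note that a CPU-based HPA only works if the target pods declare CPU requests.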