How to Calculate Proportions Using 2 Metrics in Prometheus

Recently I needed to improve an alert defined using Prometheus Alertmanager so that it triggers when fewer than half of the minimum replicas are up, i.e.

number_of_available_replicas / number_of_minimum_replicas < 0.5 

I had a look at the existing metrics collected by kube-state-metrics in my Kubernetes cluster. The number of ready replicas can be queried with the kube_pod_status_ready metric, and the minimum number of replicas with the kube_horizontalpodautoscaler_spec_min_replicas metric.

I couldn’t simply use

sum by(some_label) (kube_pod_status_ready{namespace=~"some-pattern", condition="true"}) / kube_horizontalpodautoscaler_spec_min_replicas{namespace=~"some-pattern"}

because the two sides don't share the same set of labels. By default, Prometheus only pairs up series whose label sets are identical on both sides of a binary operator, so this division returns nothing. After some RTFM, here's the expression that finally works, using the on modifier (%s is a placeholder for the threshold, 0.5 in my case):

sum by(horizontalpodautoscaler) (label_replace(kube_pod_status_ready{namespace=~"some-pattern", condition="true"}, "horizontalpodautoscaler", "$1", "pod", "(.*)(-[^-]+){2}")) / on(horizontalpodautoscaler) kube_horizontalpodautoscaler_spec_min_replicas{namespace=~"some-pattern"} < %s

Some explanation:

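  • on(label) lets Prometheus match the series of the two sides using only the listed labels and ignoring all the others, which is exactly what this scenario needs
  • label_replace(instant_vector, label_to_create, label_value, from_label, from_regex) matches from_regex against the whole value of the existing label from_label and, if it matches, sets label_to_create to label_value, where $1 refers to the first capture group. It adds the label if it doesn't exist and overwrites it if it does, which is probably why it has replace in its name. Here the regex strips the last two dash-separated segments of the pod name, which works when pods are named <name>-<replicaset-hash>-<pod-suffix> and the HPA shares the Deployment's name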
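  • sum by(label) computes a sum grouped by the given label. A kube_pod_status_ready series has the value 1 when the pod is in the state named by its condition label, so summing the condition="true" series counts the ready pods behind each Horizontal Pod Autoscaler (HPA). Without that matcher, the condition="false" and condition="unknown" series are summed too and every pod gets counted, ready or not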
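
To make the matching rules more concrete, here's a rough Python sketch that imitates what the expression does on a handful of made-up series. It is not how Prometheus is implemented: the pod names, HPA names and values are hypothetical, and the three helper functions are simplified stand-ins for label_replace, sum by and the division with on (the label_replace stand-in only understands $1).

```python
import re
from collections import defaultdict

# Hypothetical sample data: each series is (labels, value).
ready = [
    ({"namespace": "shop", "pod": "checkout-7d9f8b6c5d-abcde"}, 1.0),
    ({"namespace": "shop", "pod": "checkout-7d9f8b6c5d-fghij"}, 0.0),
    ({"namespace": "shop", "pod": "cart-api-5b6c7d8e9f-klmno"}, 1.0),
]
min_replicas = [
    ({"namespace": "shop", "horizontalpodautoscaler": "checkout"}, 4.0),
    ({"namespace": "shop", "horizontalpodautoscaler": "cart-api"}, 2.0),
]


def label_replace(series, dst, replacement, src, regex):
    """Set label dst when regex matches the whole value of label src."""
    out = []
    for labels, value in series:
        new_labels = dict(labels)
        match = re.fullmatch(regex, labels.get(src, ""))
        if match:
            # Simplification: only the "$1" reference is expanded.
            new_labels[dst] = replacement.replace("$1", match.group(1))
        out.append((new_labels, value))
    return out


def sum_by(series, label):
    """Sum values per distinct value of label; all other labels are dropped."""
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return [({label: key}, total) for key, total in totals.items()]


def divide(left, right, on=None):
    """One-to-one division: pair series on all labels, or only on the `on` labels."""
    def key(labels):
        if on is None:
            return tuple(sorted(labels.items()))
        return tuple((name, labels.get(name, "")) for name in on)

    right_by_key = {key(labels): value for labels, value in right}
    result = {}
    for labels, value in left:
        k = key(labels)
        if k in right_by_key:  # series without a partner are dropped
            result[k] = value / right_by_key[k]
    return result


ready_per_hpa = sum_by(
    label_replace(ready, "horizontalpodautoscaler", "$1", "pod", r"(.*)(-[^-]+){2}"),
    "horizontalpodautoscaler",
)

no_on = divide(ready_per_hpa, min_replicas)
with_on = divide(ready_per_hpa, min_replicas, on=["horizontalpodautoscaler"])
alerts = {k: v for k, v in with_on.items() if v < 0.5}

print(no_on)
print(with_on)
print(alerts)
```

With this sample data, the division without on matches nothing because the left side only carries the horizontalpodautoscaler label while the right side also carries namespace. The on(horizontalpodautoscaler) version yields 0.25 for checkout and 0.5 for cart-api, so only checkout falls below a 0.5 threshold.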