Nicer Deployment with Kubernetes

By default, a rolling update in a Kubernetes deployment first reduces the capacity of the current replica set and then adds capacity to the new replica set. This means the total processing power of the app can dip a bit during the deployment.

I was a bit surprised to find that the default strategy works this way, but luckily it’s not hard to fine-tune. According to the doc here: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment only a few lines are needed to change the strategy:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-deploy
  namespace: my-project
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 40%
  revisionHistoryLimit: 3

maxUnavailable: 0 means the total capacity of the deployment will not be reduced during the update, and maxSurge: 40% means the new replica set can grow to 40% of the desired replica count, on top of the existing pods, before pods in the current replica set (which then becomes the old one) are drained.
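To make the arithmetic concrete, here is a minimal Python sketch of how these two settings bound the pod count during a rollout. The function names are made up for illustration; the rounding (maxSurge percentages round up, maxUnavailable percentages round down) follows the Kubernetes Deployment documentation, and the replica count of 10 is only an example.

```python
def parse_bound(value, replicas, round_up):
    """Resolve an absolute count or a percentage string such as "40%"."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic avoids floating point surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return the pod-count limits that apply while a rolling update runs."""
    surge = parse_bound(max_surge, replicas, round_up=True)
    unavailable = parse_bound(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rollout_bounds(10, "40%", 0))
# {'max_total_pods': 14, 'min_available_pods': 10}
```

With 10 replicas, up to 4 extra pods can be started, and the number of available pods never drops below 10.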

It’s not a big improvement, but revisionHistoryLimit: 3 keeps only 3 old replica sets around for rolling back a deployment. The default for extensions/v1beta1 is to keep all of them, which is quite over-provisioned from my point of view.

🙂

Ansible, Packer and Configurations

I use Ansible as the provisioner for Packer to build AMIs that serve as base images for our development environment. When Ansible is run by Packer, it’s not obvious whether it uses the same ansible.cfg as when I run the ansible-playbook command in a terminal.

Here’s how to make sure Ansible in a Packer session uses the correct ansible.cfg file.

First, an environment variable is supplied in Packer’s template, because the ANSIBLE_CONFIG environment variable takes precedence over any other ansible.cfg that can be found:

    {
      "type": "ansible",
      "playbook_file": "../../ansible/apps.yml",
      "ansible_env_vars": ["ANSIBLE_CONFIG=/tmp/ansible.cfg"],
      "extra_arguments": [
        "--skip-tags=packer-skip",
        "-vvv"
      ]
    },

The line with “ANSIBLE_CONFIG=/tmp/ansible.cfg” tells Ansible to use /tmp/ansible.cfg.

With ansible.cfg at /tmp and the extra debug switch -vvv, I can see in the output whether the config file is picked up: Ansible prints a line like “Using /tmp/ansible.cfg as config file”.
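The lookup order can be sketched in a few lines of Python. This is a simplified illustration, not Ansible’s own code: resolve_ansible_cfg is a hypothetical helper, and it ignores details such as Ansible skipping an ansible.cfg in a world-writable directory. The order itself (ANSIBLE_CONFIG, then ./ansible.cfg, then ~/.ansible.cfg, then /etc/ansible/ansible.cfg) is the documented one.

```python
import os


def resolve_ansible_cfg(env=None, cwd=None, home=None, exists=os.path.isfile):
    """Return the first ansible.cfg candidate that exists, or None."""
    env = os.environ if env is None else env
    cwd = os.getcwd() if cwd is None else cwd
    home = os.path.expanduser("~") if home is None else home

    candidates = []
    if env.get("ANSIBLE_CONFIG"):
        # The environment variable is checked before any file location.
        candidates.append(env["ANSIBLE_CONFIG"])
    candidates.append(os.path.join(cwd, "ansible.cfg"))
    candidates.append(os.path.join(home, ".ansible.cfg"))
    candidates.append("/etc/ansible/ansible.cfg")

    for path in candidates:
        if exists(path):
            return path
    return None


print(resolve_ansible_cfg())
```

Running it in the same directory and environment as the build gives a quick hint about which file a plain ansible-playbook run would pick.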


Don’t Panic When Kubernetes Master Failed

It was business as usual when I was upgrading our Kubernetes cluster from 1.9.8 to 1.9.10, until it wasn’t.

$ kops rolling-update cluster --yes
...
node "ip-10-xx-xx-xx.ap-southeast-2.compute.internal" drained
...
I1024 08:52:50.388672   16009 instancegroups.go:188] Validating the cluster.
...
I1024 08:58:22.725713   16009 instancegroups.go:246] Cluster did not validate, will try again in "30s" until duration "5m0s" expires: error listing nodes: Get https://api.my.kops.domain/api/v1/nodes: dial tcp yy.yy.yy.yy:443: i/o timeout.
E1024 08:58:22.725749   16009 instancegroups.go:193] Cluster did not validate within 5m0s

error validating cluster after removing a node: cluster did not validate within a duation of "5m0s"

From the AWS console I could see that the new instance for the master was running and the old one had been terminated. There was one catch though: the IP yy.yy.yy.yy was not the IP of the new master instance!

I manually updated the api and api.internal CNAMEs of the Kubernetes cluster in Route 53 and the issue went away quickly. I assume the DNS update for the new master failed for some reason, but I was happy to see everything else worked as expected.
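A quick way to confirm this kind of mismatch is to compare what the API hostname resolves to with the IP of the new master. Below is a minimal Python sketch using only the standard library; api_points_at is a hypothetical helper, and the hostname and IP in the commented usage are placeholders rather than real values.

```python
import socket


def resolved_ips(hostname, resolver=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses the hostname currently resolves to."""
    _name, _aliases, addresses = resolver(hostname)
    return set(addresses)


def api_points_at(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """True if the DNS record for hostname includes the expected master IP."""
    return expected_ip in resolved_ips(hostname, resolver)


# Example usage (placeholder values, needs working DNS):
# print(api_points_at("api.my.kops.domain", "203.0.113.10"))
```

If this returns False after a master has been replaced, the DNS record is a good first suspect.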

🙂

Run Google Lighthouse in Docker Container

Thanks to my colleague Simon’s suggestion, I was introduced to Google Lighthouse, an open source Node.js tool that uses Google Chrome to audit a website’s performance.

I like Lighthouse because:

  • it’s open source
  • it’s portable
  • it can run as a CLI command or as a Node.js module

Here’s a sample Dockerfile to have a container ready to run Lighthouse with Google Chrome for Linux.

FROM debian:stretch

USER root
WORKDIR /root
ENV CHROME_VERSION="google-chrome-stable"

# system packages
RUN apt update -qqy && \
  apt install -qqy build-essential gnupg wget curl jq

# nodejs 10
RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - && \
  apt install -qqy nodejs && \
  npm install -g lighthouse

# google-chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
  echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list && \
  apt update -qqy && \
  apt install -qqy ${CHROME_VERSION:-google-chrome-stable}

# python3 (optional for metric processing)
RUN apt install -qqy python3 python3-pip && \
  pip3 install influxdb

# lighthouse
RUN useradd -ms /bin/bash lighthouse
USER lighthouse
WORKDIR /home/lighthouse

Then lighthouse can be executed in the container to audit $url:

CHROME_PATH=$(which google-chrome) lighthouse "$url" --emulated-form-factor=none --output=json --chrome-flags="--headless --no-sandbox"

The resulting JSON is sent to stdout, so it can easily be piped to other scripts for post-processing, e.g. to parse the JSON and extract metrics.
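As a sketch of that post-processing step, the Python below pulls a few numbers out of a Lighthouse report. It assumes the report layout of recent Lighthouse versions (a categories.performance.score field and per-audit values under audits), and it accepts both numericValue and the older rawValue key because the name differs between versions. The chosen audit IDs and the function names are just examples.

```python
import json
import sys

# Audit IDs to extract; an illustrative selection, adjust as needed.
METRIC_AUDITS = ("first-contentful-paint", "speed-index", "interactive")


def extract_metrics(report):
    """Pick the performance score and a few timing audits from a report dict."""
    metrics = {}
    score = report.get("categories", {}).get("performance", {}).get("score")
    if score is not None:
        metrics["performance_score"] = score
    audits = report.get("audits", {})
    for audit_id in METRIC_AUDITS:
        audit = audits.get(audit_id, {})
        # Newer reports use numericValue, older ones rawValue.
        value = audit.get("numericValue", audit.get("rawValue"))
        if value is not None:
            metrics[audit_id] = value
    return metrics


def main():
    """Read a Lighthouse JSON report from stdin and print the metrics."""
    print(json.dumps(extract_metrics(json.load(sys.stdin))))
```

Saved in a file with a main() call at the bottom, this can sit at the end of the pipe after the lighthouse command, and the resulting dictionary can then be sent on to a metrics store such as InfluxDB.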

🙂