Upload Limit in Kubernetes Nginx Ingress Controller

According to https://github.com/nginxinc/kubernetes-ingress/issues/21#issuecomment-408618569, this is how to lift the upload limit in the Nginx Ingress Controller for Kubernetes after a recent update to the project:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-project-ingress
  namespace: test-project-dev
  annotations:
    kubernetes.io/ingress.class: dev
    nginx.ingress.kubernetes.io/proxy-body-size: 200m
spec:
  rules:
    - host: test-project.dev.com
      http:
        paths:
          - path: /
            backend:
              serviceName: test-project
              servicePort: 80

For now, the nginx pods have to be restarted before the new limit takes effect. Hopefully this won’t be necessary in the future.
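
To make the `200m` value concrete: the annotation ends up as nginx's `client_max_body_size`, and nginx rejects larger request bodies with a 413 error. Below is a minimal sketch of how such a size string translates into bytes. The helper names `parse_nginx_size` and `upload_allowed` are made up for illustration and are not part of any ingress tooling.

```python
import re

# Multipliers for nginx size suffixes (case-insensitive); a bare number is bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2}


def parse_nginx_size(value):
    """Convert an nginx size string such as '200m' into a byte count."""
    match = re.fullmatch(r"(\d+)([kKmM]?)", value.strip())
    if match is None:
        raise ValueError("not an nginx size: {!r}".format(value))
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix.lower()]


def upload_allowed(content_length, limit):
    """Return True if a request body of content_length bytes fits the limit."""
    max_bytes = parse_nginx_size(limit)
    # nginx treats a limit of 0 as "do not check the body size".
    return max_bytes == 0 or content_length <= max_bytes
```

With the annotation above, a 150 MiB upload passes and anything over 209715200 bytes is refused.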


Auto Scaling in Kubernetes 1.9

I recently upgraded my Kubernetes cluster from 1.8 to 1.9. The upgrade itself was very smooth, but auto-scaling stopped working afterwards. Below are some notes on how I troubleshot the issue.

First, I made sure both kops and kubectl were upgraded to 1.9 on my laptop:

Install kubectl 1.9:

curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.9.10/bin/linux/amd64/kubect

Install kops 1.9: https://github.com/kubernetes/kops/releases/tag/1.9.2

While doing some load testing, I discovered that no matter how busy the pods were, they were never scaled out. To see what was happening with the horizontal pod autoscaler (HPA), I used the following command:

kubectl describe hpa
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)

After some googling, it turned out that Kubernetes 1.9 uses the new metrics server for HPA, and my cluster didn’t have it. Here’s how to install the metrics server in a Kubernetes cluster: https://github.com/kubernetes-incubator/metrics-server

To make the troubleshooting more interesting, the metrics server ran into an error too! It looked like this:

Failed to get kubernetes address: No kubernetes source found.

Bug fix for the metrics server: https://github.com/kubernetes-incubator/metrics-server/issues/105#issuecomment-412818944

In short, adding an overriding command to the container spec in `deploy/1.8+/metrics-server-deployment.yaml` got it working:

        command:
        - /metrics-server
        - --source=kubernetes.summary_api:''

Next, install the cluster autoscaler: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-one-asg.yaml. I used `- image: k8s.gcr.io/cluster-autoscaler:v1.1.3` for Kubernetes 1.9. This part held no surprises and worked as expected.

Sample HPA schema:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: test-deploy
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
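
To get a feel for what this HPA will do once metrics are flowing, here is a rough sketch of the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas * current utilization / target utilization), clamped to the min and max. The function name `desired_replicas` is hypothetical, and the sketch deliberately ignores the controller's tolerance band and cooldown behaviour, so treat it as an approximation only.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent,
                     target_cpu_percent=50, min_replicas=2, max_replicas=10):
    """Approximate the replica count the HPA above would ask for."""
    # Scale proportionally to how far utilization is from the target.
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, raw))
```

For example, 2 pods averaging 100% CPU against a 50% target would be scaled to 4, while 4 pods at 200% would be capped at the maximum of 10.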


Manage AWS EBS Snapshot Life Cycle with Lambda

The timing is not so great. The AWS Data Lifecycle Manager has been announced but I can’t wait for its release. So I decided to use AWS Lambda to do some snapshot lifecycle management.

First, create an IAM role for Lambda with full access to EBS snapshots. This can be done in the console.

To create snapshots with a Python 3.6 Lambda function:

from datetime import datetime, timedelta

import boto3

def get_tag(tags, tag_name):
    # Return the value of the named tag, or the string 'None' if it is absent.
    for t in tags:
        if t['Key'] == tag_name:
            return t['Value']
    return 'None'


def get_delete_date():
    # Snapshots taken on a Monday (weekday 0) are kept for 28 days, all others for 7.
    today = datetime.today()
    if today.weekday() == 0:
        retention = 28
    else:
        retention = 7
    return (today + timedelta(days=retention)).strftime('%Y-%m-%d')


def snapshot_tags(instance, volume):
    # Copy the attachment details into tags, then add the instance name and expiry date.
    tags = [{'Key': k, 'Value': str(v)} for k, v in volume.attachments[0].items()]
    tags.append({'Key': 'InstanceName', 'Value': get_tag(instance.tags, 'Name')})
    tags.append({'Key': 'DeleteOn', 'Value': get_delete_date()})
    return tags

def lambda_handler(event, context):
    ec2 = boto3.resource('ec2')
    for instance in ec2.instances.filter(Filters=[{'Name': "tag:Name", 'Values': [ 'AFLCDWH*' ] }]):
        for volume in instance.volumes.all():
            snapshot = ec2.create_snapshot(VolumeId=volume.id, Description="Snapshot for volume {0} on instance {1}".format(volume.id, get_tag(instance.tags, 'Name')))
            snapshot.create_tags(Resources=[snapshot.id], Tags=snapshot_tags(instance, volume))
    return 'done'
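
The retention rule in `get_delete_date` is easier to check when the date is passed in rather than read from the clock. The following is only a sketch of that idea; `delete_date_for` is a hypothetical name and is not used by the Lambda above.

```python
from datetime import date, timedelta


def delete_date_for(day):
    """Return the DeleteOn tag value for a snapshot taken on the given day."""
    # weekday() returns 0 for Monday; Monday snapshots are kept four weeks.
    retention = 28 if day.weekday() == 0 else 7
    return (day + timedelta(days=retention)).strftime('%Y-%m-%d')
```

A snapshot taken on Monday 2018-08-06 would be tagged `2018-09-03`, and one taken the next day would be tagged `2018-08-14`.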

To delete the snapshots whose `DeleteOn` tag matches today’s date:

from datetime import datetime

import boto3

def lambda_handler(event, context):
    today = datetime.today().strftime('%Y-%m-%d')
    ec2 = boto3.resource('ec2')
    # Delete every snapshot tagged for deletion today.
    for snapshot in ec2.snapshots.filter(Filters=[{'Name': "tag:DeleteOn", 'Values': [ today ] }]):
        snapshot.delete()
    return 'done'
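
One thing to be aware of: because the filter matches today’s date exactly, a snapshot whose `DeleteOn` day passes without the function running will never be picked up again. A possible variation, sketched below under the assumption that tags keep the `%Y-%m-%d` format used above, is to compare dates instead. `is_expired` is a hypothetical helper, not part of boto3.

```python
from datetime import date, datetime


def is_expired(tags, today):
    """Return True if the DeleteOn tag holds a date on or before today."""
    for tag in tags or []:
        if tag['Key'] == 'DeleteOn':
            try:
                delete_on = datetime.strptime(tag['Value'], '%Y-%m-%d').date()
            except ValueError:
                # Malformed date: leave the snapshot alone rather than guess.
                return False
            return delete_on <= today
    # No DeleteOn tag means the snapshot is not managed by this function.
    return False
```

In the handler, this could be combined with a `tag-key` filter on `DeleteOn` and a check of `snapshot.tags` before calling `snapshot.delete()`.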

Finally, these functions can’t finish within Lambda’s default 3-second timeout, so they get killed before completing. I raised the timeout to 1 minute.
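
The timeout can be changed in the console, but it can also be scripted. Here is a small sketch using the Lambda client's `update_function_configuration` call; the wrapper `raise_timeout` and the function name `snapshot-create` are placeholders I made up for illustration.

```python
def raise_timeout(lambda_client, function_name, seconds=60):
    """Set the function timeout and return the value the API reports back."""
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=seconds,
    )
    return response['Timeout']


# Example usage (needs AWS credentials; the function name is hypothetical):
#   import boto3
#   raise_timeout(boto3.client('lambda'), 'snapshot-create')
```

Passing the client in as an argument keeps the helper easy to try out with a stub before pointing it at a real account.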

Building Dynamic CI Pipeline with BuildKite

I was inspired by this BuildKite pipeline sample given by the support team:

# .buildkite/pipeline.yml
steps:
  - command: echo building a thing
  - block: Test the thing?
  - command: echo testing a thing
  - wait
  - command: buildkite-agent pipeline upload .buildkite/pipeline.deploy.yml

# .buildkite/pipeline.deploy.yml
steps:
  - block: Deploy the thing?
  - command: echo deploy the thing

In the above case, if the build and test commands succeed, pipeline.deploy.yml is loaded into the main CI pipeline. This implementation is just brilliant. I’m not sure whether a Jenkinsfile can do dynamic pipelines like this, but even if it can, a Jenkinsfile won’t look as elegant as YAML.

Since `buildkite-agent pipeline upload .buildkite/pipeline.deploy.yml` is just another shell command, I can call it from a script and put more logic around it, such as a git-flow implementation:

export CHOICE=$(buildkite-agent meta-data get "next-section")

# The case labels "qa" and "pass" are placeholders for whatever values
# the "next-section" meta-data can take in your pipeline.
case $CHOICE in
  qa)
    buildkite-agent pipeline upload .buildkite/pipeline.qa.yml
    ;;
  pass)
    # feature finish
    if [[ $BUILDKITE_BRANCH == feature* ]]; then
      python .buildkite/scripts/github_ci.py \
        --action pr \
        --repo flow-work \
        --head $BUILDKITE_BRANCH \
        --base develop

    # release start
    elif [[ $BUILDKITE_BRANCH == develop ]]; then
      git checkout -b release/$FULL_VERSION
      git push --set-upstream origin release/$BUILDKITE_BUILD_NUMBER

    # release finish
    elif [[ $BUILDKITE_BRANCH == release* ]]; then
      buildkite-agent pipeline upload .buildkite/pipeline.pass.yml
    fi
    ;;
  *)
    # mark build as failure
    exit 1
    ;;
esac

FYI: this example was tested with BuildKite agent version 3.2.0.
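
Taking the idea one step further, the follow-up steps don't have to live in static files at all. `buildkite-agent pipeline upload` also reads a pipeline from standard input and accepts JSON, so a script can generate the steps on the fly. Below is a rough sketch of that approach in Python. The step contents, the function names and the script path are all invented for illustration and only loosely mirror the branch logic above.

```python
import json
import os


def steps_for_branch(branch):
    """Return follow-up Buildkite steps for a branch in a git-flow layout."""
    if branch.startswith('feature'):
        return [{'command': 'echo open a pull request against develop'}]
    if branch == 'develop':
        return [{'block': 'Start a release?'},
                {'command': 'echo cut a release branch'}]
    if branch.startswith('release'):
        return [{'block': 'Deploy the thing?'},
                {'command': 'echo deploy the thing'}]
    # Fall back to a harmless step so the uploaded pipeline is never empty.
    return [{'command': 'echo nothing to do'}]


def render_pipeline(branch):
    """Serialize the steps as a JSON pipeline document."""
    return json.dumps({'steps': steps_for_branch(branch)}, indent=2)


if __name__ == '__main__':
    # BUILDKITE_BRANCH is set by the agent for every job.
    print(render_pipeline(os.environ.get('BUILDKITE_BRANCH', '')))
```

Assuming the script is saved as `.buildkite/scripts/pipeline.py` (a hypothetical path), a step could run `python .buildkite/scripts/pipeline.py | buildkite-agent pipeline upload`.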