Manage AWS EBS Snapshot Life Cycle with Lambda

The timing is not so great. The AWS Data Lifecycle Manager has been announced but I can’t wait for its release. So I decided to use AWS Lambda to do some snapshot lifecycle management.

First a role for Lambda having full access to snapshots can be created via the console.

To create snapshot with Python 3.6 Lambda in AWS:

from datetime import datetime, timedelta

import boto3

def get_tag(tags, tag_name):
    for t in tags:
        if t['Key'] == tag_name:
            return t['Value']
    return 'None'
    
def get_delete_date():
    today = datetime.today()
    if today.weekday() == 0: 
        #Monday
        retention = 28
    else:
        retention = 7
    return (today + timedelta(days=retention)).strftime('%Y-%m-%d')
    
def snapshot_tags(instance, volume):
    tags = [{'Key': k, 'Value': str(v)} for k,v in volume.attachments[0].items()]
    tags.append({'Key': 'InstanceName', 'Value': get_tag(instance.tags, 'Name')})
    tags.append({'Key': 'DeleteOn', 'Value': get_delete_date()})
    return tags

def lambda_handler(event, context):
    ec2 = boto3.resource('ec2')
    for instance in ec2.instances.filter(Filters=[{'Name': "tag:Name", 'Values': [ 'AFLCDWH*' ] }]):
        for volume in instance.volumes.all():
            snapshot = ec2.create_snapshot(VolumeId=volume.id, Description="Snapshot for volume {0} on instance {1}".format(volume.id, get_tag(instance.tags, 'Name')))
            snapshot.create_tags(Resources=[snapshot.id], Tags=snapshot_tags(instance, volume))
    return 'done'

To recycle snapshots meant to be deleted today:

from datetime import datetime

import boto3

def lambda_handler(event, context):
    today = datetime.today().strftime('%Y-%m-%d')
    ec2 = boto3.resource('ec2')
    for snapshot in ec2.snapshots.filter(Filters=[{'Name': "tag:DeleteOn", 'Values': [ today ] }]):
        print(snapshot.id)
        snapshot.delete()
    return 'done'

At last, these functions can’t finish in 3 seconds, so the default 3 seconds time-out will kill them. I lifted the time-out to 1 minute.

Playing with Kubernetes Ingress Controller

It’s very very easy to use Kubernetes(K8s) to provision an external service with AWS ELB, there’s one catch though(at least for now in 2018).

AWS ELB is usually used with an auto scaling group and a launch configuration. However with K8s, EC2 instances won’t get spun directly, only pods will, which is call Horizontal Scaling. K8s will issue AWS API calls to update the ELBs so there’s no need for auto scaling groups or launch configurations.

This worked like a charm until when things got busy. There was a brief down time on one of the ELBs managed by K8s, because all instances at the back of the ELB were marked as unhealthy! Of course they were healthy at that moment. With help from AWS Support team, the culprit seems to be similar to this case: https://github.com/kubernetes/kubernetes/issues/47067.

Luckily for me, I had a gut feel that the simple ELB implementation isn’t the best practice and started to adopt the K8s Ingress Controller. And in this case I believe ingress can avoid the down time because the routing is done internally in K8s cluster which doesn’t involving AWS API calls. Nonetheless ingress can use 1 ELB for many apps and that’s good because ELBs are expensive.

Here are steps to deploy an nginx ingress controller as an http(L7) load balancer:

Deploy the mandatory schema, the default replica number for the controller is 2, I changed it to 3 to have 1 in each availability zone:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml

Some customisation for L7 load balancer on AWS, remember to use your SSL cert if you need https termination:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/aws/service-l7.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/aws/patch-configmap-l7.yaml

Then an ingress for an app can be deployed:

$ cat .k8s/prod/ingress.yaml 
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-ingress
  namespace: my-prod
  annotations:
    kubernetes.io/ingress.class: prod
spec:
  rules:
    - host: my.domain.elb
      http:
        paths:
          - path: /
            backend:
              serviceName: my-service
              servicePort: 80
    - host: my.domain.cdn
      http:
        paths:
          - path: /
            backend:
              serviceName: my-service
              servicePort: 80

Notes:

  • my-service is a common NodePort service and has port 80 exposed
  • io/ingress.class is for multiple ingress controllers in same k8s cluster, eg. 1 for dev and the other for prod
  • for now I have to duplicate the host block for each domain, because wildcard or regex are not supported by k8s ingress specification
  • at last, find the ELB this ingress controller created, then point my.domain.elb to it, then the CDN domain can use my.domain.elb as origin.

🙂

Kubernetes External Service with HTTPS

This is a quick example to assign an SSL certificate to a Kubernetes external service(which is an ELB in AWS). Tested with kops 1.8 and kubernetes 1.8.

---
apiVersion: v1
kind: Service
metadata:
 name: my-https-service
 namespace: my-project
 labels:
   app: my-website-ssl
 annotations:
   service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:ap-southeast-2:xxx:certificate/xxx..."
   service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
   service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
   service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: '3600'
spec:
 type: LoadBalancer
 selector:
   app: my-website
 ports:
   - name: http
     port: 80
     targetPort: 80
   - name: https
     port: 443
     targetPort: 80

🙂

Kops: Add Policies for Migrated Apps

When migrating some old applications to a Kubernetes(k8s) cluster provisioned by kops, a lot of things might break and one of them is the missing policy for the node.

By default, nodes of a k8s cluster have the following permissions:

ec2:Describe*
 ecr:GetAuthorizationToken
 ecr:BatchCheckLayerAvailability
 ecr:GetDownloadUrlForLayer
 ecr:GetRepositoryPolicy
 ecr:DescribeRepositories
 ecr:ListImages
 ecr:BatchGetImage
 route53:ListHostedZones
 route53:GetChange
 // The following permissions are scoped to AWS Route53 HostedZone used to bootstrap the cluster
 // arn:aws:route53:::hostedzone/$hosted_zone_id
 route53:ChangeResourceRecordSets, ListResourceRecordSets, GetHostedZone

Additional policies can be added to the nodes’ role by

kops edit cluster ${CLUSTER_NAME}

Then adding something like:

spec:
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": ["dynamodb:*"],
          "Resource": ["*"]
        },
        {
          "Effect": "Allow",
          "Action": ["es:*"],
          "Resource": ["*"]
        }
      ]

Then it will be effective after:

kops update cluster ${CLUSTER_NAME} --yes

The new policy can be reviewed in AWS IAM console.

Most lines were copied from here: https://github.com/kubernetes/kops/blob/master/docs/iam_roles.md

🙂