Get access to a container in Kubernetes cluster

With Kubernetes(K8s), there’s no need to do ssh [email protected] anymore since everything is running as containers. There are still occasions when I need shell access to a container to do some troubleshooting.

With Docker I can do

docker exec -ti <container_id> /bin/bash

It’s quite similar in K8s

kubectl exec -ti <container_id> -- /bin/bash

However in K8s containers have random IDs so I need to know the container ID first

kubectl get pods

Then I can grab the container ID and do the `kubectl exec` command. This is hard to automate because picking up the expected container ID using `grep` and `awk` commands can fail if the matching condition is too strict.

Given a deployment like this

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
 name: my-app-deploy
spec:
 replicas: 2
 template:
   metadata:
     labels:
       app: my-app
...

the K8s way to query the container will be

kubectl get pods --selector=app=my-app -o jsonpath='{.items[0].metadata.name}'

A chained one-liner could be

kubectl exec -ti $(kubectl get pods --selector=app=my-app -o jsonpath='{.items[0].metadata.name}') -- /bin/bash

This doesn’t check for errors, eg. if no container matching `app=my-app` was found, but a better script can be easily crafted from here.

🙂

Internal Service in Kubernetes Cluster

In Kubernetes(K8s) cluster, 1 or more containers form a pod and every container in the pod can access other container’s port just like apps in the same local host. For example:

- pod1
  - nginx1
  - gunicorn1, port:8000

- pod2
  - nginx2
  - gunicorn2, port:8000

So nginx1 can access gunicorn1’s port using localhost:8000 and nginx2 can access gunicorn2, etc… However nginx1 can’t see gunicorn2 using localhost.

When it comes to cache, like redis, I would like a shared redis container(or cluster later) so less memory is wasted and the cache doesn’t need to be warmed for each pod. The structure will look like:

- pod1
  - nginx1
  - gunicorn1
- pod2
  - nginx2
  - gunicorn2
- pod3
  - redis

To grant both gunicorns to access redis, redis needs to be a service:

---
apiVersion: v1
kind: Service
metadata:
  name: redis-svc
  labels:
    app: redis
    role: cache
spec:
  type: NodePort
  ports:
    - port: 6379
  selector:
    app: redis
    role: cache

And the deployment to support the service looks like:

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: redis-deploy
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
        role: cache
    spec:
      containers:
        - name: redis
          image: redis:4
          resources:
            requests:
              memory: 200Mi
          ports:
            - containerPort: 6379

Then in the settings.py of the django app running in gunicorn container, redis can be accessed via ENVs setup by K8s:

CACHES = {
  "default": {
    "BACKEND": "django_redis.cache.RedisCache",
    "LOCATION": "redis://{0}:{1}/1".format(os.environ['REDIS_SVC_SERVICE_HOST'], os.environ['REDIS_SVC_SERVICE_PORT']),
    "OPTIONS": {
      "CLIENT_CLASS": "django_redis.client.DefaultClient"
    },
  }
}

🙂

Kops: Add Policies for Migrated Apps

When migrating some old applications to a Kubernetes(k8s) cluster provisioned by kops, a lot of things might break and one of them is the missing policy for the node.

By default, nodes of a k8s cluster have the following permissions:

ec2:Describe*
 ecr:GetAuthorizationToken
 ecr:BatchCheckLayerAvailability
 ecr:GetDownloadUrlForLayer
 ecr:GetRepositoryPolicy
 ecr:DescribeRepositories
 ecr:ListImages
 ecr:BatchGetImage
 route53:ListHostedZones
 route53:GetChange
 // The following permissions are scoped to AWS Route53 HostedZone used to bootstrap the cluster
 // arn:aws:route53:::hostedzone/$hosted_zone_id
 route53:ChangeResourceRecordSets, ListResourceRecordSets, GetHostedZone

Additional policies can be added to the nodes’ role by

kops edit cluster ${CLUSTER_NAME}

Then adding something like:

spec:
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": ["dynamodb:*"],
          "Resource": ["*"]
        },
        {
          "Effect": "Allow",
          "Action": ["es:*"],
          "Resource": ["*"]
        }
      ]

Then it will be effective after:

kops update cluster ${CLUSTER_NAME} --yes

The new policy can be reviewed in AWS IAM console.

Most lines were copied from here: https://github.com/kubernetes/kops/blob/master/docs/iam_roles.md

🙂

Notes: BuildKite and Kubernetes Rolling Update

This is kind of a textbook case that container is much more efficient than VM. The CI pipeline in comparison uses AWS CloudFormation to build new VMs and drain old VMs to do a rolling update, which takes around 10 minutes for everything even if it’s just 1 line of code changed. I did a new pipeline with BuildKite and Kubernetes and a deploy is done within 2 minutes.

The key points to make the pipeline fast are:

  1. In the Dockerfile, the part that changes more frequently should be put at the bottom of the file, so that docker can maximise its build speed by using cached intermediate images.
  2. Reload Kubernetes config maps with this:
    kubectl create configmap nginx-config --from-file path/to/nginx.conf -o yaml --dry-run |kubectl replace -f -
  3. Reload containers with this(I use ECR):
    $BUILDKITE_BUILD_NUMBER obviously is the build number environment variable provided by BuildKite.

    kubectl set image deployment/my_deployment \
     nginx=my.ecr.amazonaws.com/nginx:$BUILDKITE_BUILD_NUMBER \
     php=my.ecr.amazonaws.com/php:$BUILDKITE_BUILD_NUMBER
  4. Finally watch rolling update progress with this command:
    kubectl rollout status deployment/my_deployment

🙂