Kubernetes Log Aggregation with Filebeat and Logstash

Following up on my last post: Filebeat is very easy to set up, but it doesn’t do log pattern matching, so I guess I’ll need Logstash after all.

The first step is to install Logstash, of course. Telling Filebeat to feed Logstash instead of Elasticsearch is straightforward; here are some configuration snippets:

Filebeat K8s configMap:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
data:
  filebeat.yml: |-
    filebeat.config:
    ...
    # replace output.elasticsearch with this
    output.logstash:
      hosts: ['${LOGSTASH_HOST:logstash}:${LOGSTASH_PORT:5044}']

Sample Logstash configuration:

input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}"}
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

COMBINEDAPACHELOG is a predefined grok pattern for the standard Apache combined log format (nginx’s default access log uses the same format). With this pattern, values like the request URI or the referrer URL become available as fields in Elasticsearch.
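To get a feel for what the grok filter pulls out, here is a rough shell sketch that splits one access-log line into a few of the same fields. It only approximates COMBINEDAPACHELOG with awk and is not how Logstash does it. The sample line and the `parse_combined` function name are made up for illustration, and the field names follow the legacy (non-ECS) grok pattern as far as I know.

```shell
# Rough approximation of a few COMBINEDAPACHELOG fields using awk.
# parse_combined is a hypothetical helper, not part of Logstash.
parse_combined() {
  printf '%s\n' "$1" | awk -F'"' '{
    split($1, head, " ")   # client IP, ident, auth, timestamp
    split($2, req, " ")    # verb, request URI, protocol
    split($3, st, " ")     # status code and response size
    printf "clientip=%s\n", head[1]
    printf "verb=%s\n", req[1]
    printf "request=%s\n", req[2]
    printf "response=%s\n", st[1]
    printf "bytes=%s\n", st[2]
    printf "referrer=%s\n", $4
    printf "agent=%s\n", $6
  }'
}

# Made-up sample line in combined log format
sample='203.0.113.7 - - [10/Oct/2018:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/start" "curl/7.58.0"'
parse_combined "$sample"
```

Splitting on double quotes is good enough for a demo, but it breaks on lines with escaped quotes, which is one more reason to let grok do the real work.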

🙂

Kubernetes Cluster Log Aggregation with Filebeat

The Kubernetes cluster I was working on finally went live, and I hadn’t provided a log aggregation solution yet. I had a look at dynaTrace, which is a paid SaaS. However, it requires installing an agent in every container. That’s fun when there are only a few to play with, but I wouldn’t rebuild dozens of Docker containers just to get logs out.

Luckily, I found Filebeat from Elastic, which can be installed as a DaemonSet in a Kubernetes cluster and then ships all logs to Elasticsearch. I already had an Elasticsearch cluster running, so why not? The installation is quite easy following this guide:

1, Download the manifest

2, The only settings that need to be changed are:

 env:
   - name: ELASTICSEARCH_HOST
     value: 10.1.1.10
   - name: ELASTICSEARCH_PORT
     value: "9200"
   - name: ELASTICSEARCH_USERNAME
     value: elastic
   - name: ELASTICSEARCH_PASSWORD
     value: changeme

Then apply it to the Kubernetes cluster:

kubectl apply -f filebeat.yaml

3, If the Docker containers running in the cluster are already logging to stdout/stderr, you should see logs flowing into Elasticsearch. Otherwise, check the Filebeat logs in the Kubernetes dashboard (it’s in the kube-system namespace).

4, Make sure to create an index pattern for Filebeat in Kibana, usually filebeat-*
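If the dashboard isn’t handy, the same check can be done from the command line. This is a minimal sketch, assuming the DaemonSet is named `filebeat` and its pods carry the `k8s-app=filebeat` label (which is what the manifest I used had; yours may differ). `check_filebeat` is just a name I made up.

```shell
# Hypothetical helper: show whether the Filebeat DaemonSet and its pods are up,
# then print the tail of their logs.
# Assumes the DaemonSet is named "filebeat" and labelled k8s-app=filebeat.
check_filebeat() {
  ns="${1:-kube-system}"   # namespace, defaults to kube-system
  kubectl --namespace "$ns" get daemonset filebeat || return 1
  kubectl --namespace "$ns" get pods --selector=k8s-app=filebeat -o wide || return 1
  kubectl --namespace "$ns" logs --selector=k8s-app=filebeat --tail=20
}
```

Run `check_filebeat` with no arguments for kube-system, or pass another namespace if Filebeat was deployed elsewhere.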

That’s about it 🙂

Time Machine for Arch Linux

I’ve been using Arch Linux for some years, and it’s still my favorite Linux distribution. The feature that distinguishes Arch from others is its rolling release model, which means there’s no such thing as a version in Arch. Using the latest packages is the norm.

However, living on the edge means it’s not always safe. After I installed a bunch of updates including Gnome Shell 3.28, my XPS 15 laptop had trouble bringing up the external monitor. It even froze when I hot-plugged the HDMI cable.

I tried to revert some packages like

sudo pacman -U /var/cache/pacman/pkg/some-package-1.0.xx.pkg.tar.xz

But it didn’t solve the problem because there were hundreds of packages in the last update.

On the verge of panic, I found these instructions for reverting all packages to a snapshot in time, and they worked wonders for me.

The only surprise was that when downgrading packages, I saw errors like

 package-name: /path/to/package-file exists in filesystem

I guess it’s a safeguard in pacman, but since I knew what I was doing, I simply deleted those files. The final command is

sudo pacman -Syyuu

which brings Arch Linux back to a point in time, and the issue was fixed 🙂
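For the record, the snapshot approach I’m aware of is the Arch Linux Archive one: point pacman at a dated copy of the repositories, then run the command above. Here is a small sketch that prints the mirrorlist line for a given day. The helper name and the example date are placeholders of mine, not something from the instructions.

```shell
# Hypothetical helper: print the mirrorlist line for an Arch Linux Archive
# snapshot. The argument is the day to go back to, as YYYY/MM/DD.
archive_server_line() {
  # Single quotes keep $repo and $arch literal; pacman expands them itself.
  printf 'Server=https://archive.archlinux.org/repos/%s/$repo/os/$arch\n' "$1"
}

# Example date only, pick the last day your system was known to work
archive_server_line 2018/03/30
```

The printed line goes into /etc/pacman.d/mirrorlist as the only active server (back up the original file first), and then `sudo pacman -Syyuu` does the downgrade.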

Get access to a container in Kubernetes cluster

With Kubernetes (K8s), there’s no need to `ssh <user>@<host>` anymore since everything runs in containers. There are still occasions when I need shell access to a container for troubleshooting, though.

With Docker I can do

docker exec -ti <container_id> /bin/bash

It’s quite similar in K8s, except that the target is a pod rather than a container

kubectl exec -ti <pod_name> -- /bin/bash

However, K8s pod names carry generated suffixes, so I need to look up the pod name first

kubectl get pods

Then I can grab the pod name and run the `kubectl exec` command. This is hard to automate because picking out the expected pod name with `grep` and `awk` is brittle: a pattern that is too strict matches nothing, and one that is too loose can match the wrong pod.

Given a deployment like this

---
apiVersion: apps/v1
kind: Deployment
metadata:
 name: my-app-deploy
spec:
 replicas: 2
 selector:
   matchLabels:
     app: my-app
 template:
   metadata:
     labels:
       app: my-app
...

the K8s way to query for the pod is

kubectl get pods --selector=app=my-app -o jsonpath='{.items[0].metadata.name}'

A chained one-liner could be

kubectl exec -ti $(kubectl get pods --selector=app=my-app -o jsonpath='{.items[0].metadata.name}') -- /bin/bash

This doesn’t check for errors, e.g. when no pod matching `app=my-app` is found, but a better script can easily be crafted from here.
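One way such a script could look is below. It’s a minimal sketch rather than anything battle-tested: `pod_shell` is a name I made up, it always picks the first matching pod, and it assumes the container has /bin/bash.

```shell
# Hypothetical helper: open a shell in the first pod matching a label selector.
pod_shell() {
  selector="$1"
  if [ -z "$selector" ]; then
    echo "usage: pod_shell <label-selector>" >&2
    return 2
  fi
  # Print one pod name per line and keep the first; unlike .items[0],
  # the range form simply prints nothing when no pod matches.
  pod=$(kubectl get pods --selector="$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | head -n 1)
  if [ -z "$pod" ]; then
    echo "no pod found matching $selector" >&2
    return 1
  fi
  kubectl exec -ti "$pod" -- /bin/bash
}
```

Usage is `pod_shell app=my-app`. A non-zero return code tells a calling script that nothing was found, instead of `kubectl exec` being run with an empty pod name.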

🙂