Use Fluentd and Elasticsearch to Analyse Squid Proxy Traffic

TL;DR This is a quick guide to setting up Fluentd + Elasticsearch integration to analyse Squid Proxy traffic. In the example below, the Fluentd td-agent is installed on the same host as Squid Proxy, and Elasticsearch is installed on another host. The OS is Ubuntu 20.04.

Useful links:
– Fluentd installation:
– Elasticsearch installation:

The logs of Squid need to be readable by td-agent, which can be done by adding the td-agent user to the proxy group:

$ sudo usermod --groups proxy -a td-agent

The configuration for td-agent looks like this:

<source>
  @type tail
  @id squid_tail
  path /var/log/squid/access.log
  tag squid.access
  <parse>
    @type regexp
    expression /^(?<timestamp>[0-9]+)[\.0-9]* +(?<elapsed>[0-9]+) (?<userIP>[0-9\.]+) (?<action>[A-Z_]+)\/(?<statusCode>[0-9]+) (?<size>[0-9]+) (?<method>[A-Z]+) (?<URL>[^ ]+) (?<rfc931>[^ ]+) (?<peerStatus>[^ ]+)/(?<peerIP>[^ ]+) (?<mime>[^ ]+)/
    time_key timestamp
    time_format %s
  </parse>
</source>

<match squid.access>
  @type elasticsearch
  host <elasticsearch server IP>
  port 9200
  logstash_format true
  flush_interval 10s
  index_name fluentd
  type_name fluentd
  include_tag_key true
  user elastic
  password <elasticsearch password>
</match>

The key is to get the regular expression to fit the Squid access log, which looks like this:

1598101487.920 240256 TCP_TUNNEL/200 1562 CONNECT - HIER_DIRECT/ -
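The leading field is epoch seconds (with a fractional part), which is what time_key timestamp and time_format %s pick up. A quick Python sketch, not part of the td-agent setup, just to confirm what that timestamp decodes to:

```python
from datetime import datetime, timezone

# the first field of the sample log line above, as epoch seconds
ts = 1598101487
print(datetime.fromtimestamp(ts, tz=timezone.utc))  # 2020-08-22 13:04:47+00:00
```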

Then I can use the fields defined in the regex, such as userIP or URL, for queries in Elasticsearch.
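Before wiring the expression into td-agent, it can be dry-run in Python with the same pattern (named groups use Python's (?P<...>) syntax; the sample line below, including the IP addresses and URL, is made up for illustration):

```python
import re

# the td-agent expression above, rewritten with Python named groups
pattern = re.compile(
    r"^(?P<timestamp>[0-9]+)[\.0-9]* +(?P<elapsed>[0-9]+) (?P<userIP>[0-9\.]+) "
    r"(?P<action>[A-Z_]+)/(?P<statusCode>[0-9]+) (?P<size>[0-9]+) "
    r"(?P<method>[A-Z]+) (?P<URL>[^ ]+) (?P<rfc931>[^ ]+) "
    r"(?P<peerStatus>[^ ]+)/(?P<peerIP>[^ ]+) (?P<mime>[^ ]+)"
)

# a fabricated but well-formed access.log line
line = ("1598101487.920 240256 1.2.3.4 TCP_TUNNEL/200 1562 "
        "CONNECT example.com:443 - HIER_DIRECT/5.6.7.8 -")
m = pattern.match(line)
print(m.group("userIP"), m.group("URL"), m.group("statusCode"))
# 1.2.3.4 example.com:443 200
```

If the match comes back None, the expression doesn't fit the log format and td-agent would drop the line with a parse error.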


Kubernetes Log Aggregation with Filebeat and Logstash

Following the last blog post: Filebeat is very easy to set up, but it doesn't do log pattern matching, so I guess I'll need Logstash after all.

First, install Logstash of course. Telling Filebeat to feed Logstash instead of Elasticsearch is straightforward; here are some configuration snippets:

Filebeat K8s configMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    # replace output.elasticsearch with this
    output.logstash:
      hosts: ['${LOGSTASH_HOST:logstash}:${LOGSTASH_PORT:5044}']

Sample Logstash configuration:

input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

COMBINEDAPACHELOG is the standard Apache log format (nginx's as well). By using this predefined log format, values like the request URI or referrer URL become available as fields in Elasticsearch.
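To get a feel for what grok extracts, here is a rough Python approximation of COMBINEDAPACHELOG (the regex below is my own simplification, not Logstash's actual pattern definition, and the log line is a made-up example):

```python
import re

# simplified stand-in for Logstash's COMBINEDAPACHELOG grok pattern
combined = re.compile(
    r'(?P<clientip>\S+) \S+ (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>\S+)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# fabricated Apache combined-format line
line = ('203.0.113.9 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08"')
m = combined.match(line)
print(m.group("request"), m.group("response"), m.group("referrer"))
```

Each named group here corresponds to a field Logstash would emit, which is exactly what ends up queryable in Elasticsearch.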


Kubernetes Cluster Log Aggregation with Filebeat

Finally the Kubernetes cluster I was working on went live, and I hadn't provided a log aggregation solution yet. I had a look at Dynatrace, which is a paid SaaS, but it requires installing an agent in every container. That's fun when there are only a few containers to play with, but I wouldn't rebuild dozens of Docker containers just to get logs out.

Luckily enough I found Filebeat from Elastic, which can be installed as a DaemonSet in a Kubernetes cluster and then pipe all logs to Elasticsearch. I already have an Elasticsearch cluster running, so why not? The installation is quite easy following this guide:

1, Download the manifest

2, The only configuration that needs to be changed is the Elasticsearch connection details in the DaemonSet's env section:

   - name: ELASTICSEARCH_HOST
     value: <elasticsearch server IP>
   - name: ELASTICSEARCH_PORT
     value: "9200"
   - name: ELASTICSEARCH_USERNAME
     value: elastic
   - name: ELASTICSEARCH_PASSWORD
     value: changeme

Then load it to the kubernetes cluster:

kubectl apply -f filebeat.yaml

3, If the Docker containers running in the cluster are already logging to stdout/stderr, you should see logs flowing into Elasticsearch; otherwise check the Filebeat logs in the Kubernetes dashboard (it's in the kube-system namespace).

4, Make sure to create an index pattern for Filebeat in Kibana, usually filebeat-*

That’s about it 🙂