Use Fluentd and Elasticsearch to Analyse Squid Proxy Traffic


TL;DR This is a quick guide to set up Fluentd + Elasticsearch integration to analyse Squid Proxy traffic. In the example below Fluentd td-agent is installed in the same host as Squid Proxy and Elasticsearch is installed in the other host. The OS is Ubuntu 20.04.

Useful links:
– Fluentd installation: https://docs.fluentd.org/installation/install-by-deb
– Elasticsearch installation: https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html

The logs of Squid need to be accessible by td-agent, it can be done by adding td-agent user to the proxy group:

$ sudo usermod --groups proxy -a td-agent

The configuration for td-agent looks like

<source>
  @type tail
  @id squid_tail
  <parse>
    @type regexp
    expression /^(?<timestamp>[0-9]+)[\.0-9]* +(?<elapsed>[0-9]+) (?<userIP>[0-9\.]+) (?<action>[A-Z_]+)\/(?<statusCode>[0-9]+) (?<size>[0-9]+) (?<method>[A-Z]+) (?<URL>[^ ]+) (?<rfc931>[^ ]+) (?<peerStatus>[^ ]+)/(?<peerIP>[^ ]+) (?<mime>[^ ]+)/
    time_key timestamp
    time_format %s
  </parse>
  path /var/log/squid/access.log
  tag squid.access
</source>

The key is to get the regex expression to fit the Squid access log, which looks like

1598101487.920 240256 192.168.10.111 TCP_TUNNEL/200 1562 CONNECT www.google.com.au:443 - HIER_DIRECT/142.250.66.163 -

Then I can use the fields defined in the regex, such as userIP or URL in Elasticsearch for queries.

🙂