TL;DR This is a quick guide to setting up Fluentd and Elasticsearch to analyse Squid Proxy traffic. In the example below, Fluentd (td-agent) is installed on the same host as Squid Proxy, and Elasticsearch is installed on another host. The OS is Ubuntu 20.04.
Useful links:
– Fluentd installation: https://docs.fluentd.org/installation/install-by-deb
– Elasticsearch installation: https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html
The Squid logs need to be readable by td-agent. This can be done by adding the td-agent user to the proxy group, then restarting td-agent so the running service picks up the new group membership:
$ sudo usermod --groups proxy -a td-agent
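To confirm the group change took effect, a small check along these lines could be used. It is a sketch that only reads the local user and group databases through Python's standard grp and pwd modules; user_in_group is a hypothetical helper, not part of td-agent or Squid.

```python
import grp
import pwd


def user_in_group(user, group):
    """Return True if `user` belongs to `group`, False if not (or if either name is unknown)."""
    try:
        gr = grp.getgrnam(group)
    except KeyError:
        return False  # no such group
    if user in gr.gr_mem:
        return True  # supplementary membership, e.g. added with usermod -a --groups
    try:
        return pwd.getpwnam(user).pw_gid == gr.gr_gid  # primary-group membership
    except KeyError:
        return False  # no such user
```

On the Squid host, user_in_group("td-agent", "proxy") should return True after the usermod command above.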
The configuration for td-agent looks like this:
<source>
  @type tail
  @id squid_tail
  <parse>
    @type regexp
    expression /^(?<timestamp>[0-9]+)[\.0-9]* +(?<elapsed>[0-9]+) (?<userIP>[0-9\.]+) (?<action>[A-Z_]+)\/(?<statusCode>[0-9]+) (?<size>[0-9]+) (?<method>[A-Z]+) (?<URL>[^ ]+) (?<rfc931>[^ ]+) (?<peerStatus>[^ ]+)/(?<peerIP>[^ ]+) (?<mime>[^ ]+)/
    time_key timestamp
    time_format %s
  </parse>
  path /var/log/squid/access.log
  # Recommended for in_tail: remembers the read position across restarts (example path)
  pos_file /var/log/td-agent/squid-access.log.pos
  tag squid.access
</source>
The key is getting the regular expression to match the Squid access log format. A typical line looks like this:
1598101487.920 240256 192.168.10.111 TCP_TUNNEL/200 1562 CONNECT www.google.com.au:443 - HIER_DIRECT/142.250.66.163 -
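Before restarting td-agent, it can help to test the pattern against a real line. The sketch below ports the same expression to Python's re module (Python spells named groups (?P<name>...) rather than Ruby's (?<name>...)) and roughly imitates time_key timestamp / time_format %s by turning the whole-second epoch value into the event time. parse_squid_line is a hypothetical helper; Fluentd does the real parsing in Ruby, and every captured field is simply kept as a string here.

```python
import re
from datetime import datetime, timezone

# Same pattern as the td-agent config, rewritten with Python's (?P<name>...) group syntax.
SQUID_ACCESS_RE = re.compile(
    r"^(?P<timestamp>[0-9]+)[\.0-9]* +(?P<elapsed>[0-9]+) (?P<userIP>[0-9\.]+) "
    r"(?P<action>[A-Z_]+)/(?P<statusCode>[0-9]+) (?P<size>[0-9]+) "
    r"(?P<method>[A-Z]+) (?P<URL>[^ ]+) (?P<rfc931>[^ ]+) "
    r"(?P<peerStatus>[^ ]+)/(?P<peerIP>[^ ]+) (?P<mime>[^ ]+)"
)


def parse_squid_line(line):
    """Return (event_time, record) for a Squid access-log line, or None if it does not match."""
    m = SQUID_ACCESS_RE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    # Mimics time_key timestamp + time_format %s: the whole-second epoch value
    # becomes the event time and is removed from the record.
    event_time = datetime.fromtimestamp(int(record.pop("timestamp")), tz=timezone.utc)
    return event_time, record
```

Calling parse_squid_line on the sample line above gives the event time plus a dict with userIP, action, URL, peerStatus, peerIP and the other fields; a None result means the pattern needs adjusting for that line.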
Then I can use the fields defined in the regex, such as userIP or URL, in Elasticsearch queries.
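As an illustration of such a query, the sketch below builds a search body that finds the most requested URLs for one client IP and posts it with Python's standard urllib. ES_URL and INDEX are hypothetical placeholders, no authentication or TLS is handled, and the .keyword sub-fields assume Elasticsearch's default dynamic mapping for strings (text plus a keyword sub-field), so adjust the field names if you define an explicit mapping.

```python
import json
import urllib.request

# Hypothetical placeholders: point these at your Elasticsearch host and the index td-agent writes to.
ES_URL = "http://elasticsearch.example:9200"
INDEX = "fluentd"


def top_urls_for_ip(user_ip, n=10):
    """Search body: requests from one client IP, bucketed by URL (top n buckets)."""
    return {
        "size": 0,  # only the aggregation is needed, not the individual hits
        "query": {"term": {"userIP.keyword": user_ip}},  # exact match on the keyword sub-field
        "aggs": {"top_urls": {"terms": {"field": "URL.keyword", "size": n}}},
    }


def search(body):
    """POST a search body to INDEX and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, search(top_urls_for_ip("192.168.10.111"))["aggregations"]["top_urls"]["buckets"] returns a list of buckets, each with a "key" (the URL) and a "doc_count".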
🙂