TL;DR This is a quick guide to setting up Fluentd with Elasticsearch to analyse Squid Proxy traffic. In the example below, Fluentd (td-agent) is installed on the same host as Squid Proxy, and Elasticsearch is installed on a separate host. The OS is Ubuntu 20.04.
Useful links:
– Fluentd installation: https://docs.fluentd.org/installation/install-by-deb
– Elasticsearch installation: https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html
Squid's log files must be readable by td-agent. One way to do this is to add the td-agent user to the proxy group:
$ sudo usermod --groups proxy -a td-agent
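To confirm the change took effect, a small check such as the one below can help. It is a sketch rather than part of the original setup: `group_can_read` is a hypothetical helper that only looks at the file's group and its group read bit, using the log path and `td-agent` user from this guide. It reads the group database, so an already-running td-agent still needs a restart before it actually gains the new group. It may also need to be run with sudo if the current user cannot list /var/log/squid.

```python
import grp
import os
import pwd
import stat


def group_can_read(path, user):
    """Hypothetical helper: True if `user` can read `path` via its group bits.

    Only the group read bit is checked (not the owner or "other" bits), and the
    answer comes from the group database, not from a running process.
    """
    st = os.stat(path)
    try:
        members = grp.getgrgid(st.st_gid).gr_mem  # supplementary members
    except KeyError:  # the file's gid has no entry in the group database
        members = []
    # `usermod -a --groups proxy` adds the user as a supplementary member;
    # the user's primary group also counts.
    in_group = user in members or pwd.getpwnam(user).pw_gid == st.st_gid
    return in_group and bool(st.st_mode & stat.S_IRGRP)


if __name__ == "__main__":
    log = "/var/log/squid/access.log"
    if os.path.exists(log):
        print("td-agent can read", log, "->", group_can_read(log, "td-agent"))
```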
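The td-agent configuration (/etc/td-agent/td-agent.conf) looks like the following. Restart td-agent after editing it; the restart also lets the service pick up its new proxy group membership.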
<source>
  @type tail
  @id squid_tail
  <parse>
    @type regexp
    expression /^(?<timestamp>[0-9]+)[\.0-9]* +(?<elapsed>[0-9]+) (?<userIP>[0-9\.]+) (?<action>[A-Z_]+)\/(?<statusCode>[0-9]+) (?<size>[0-9]+) (?<method>[A-Z]+) (?<URL>[^ ]+) (?<rfc931>[^ ]+) (?<peerStatus>[^ ]+)/(?<peerIP>[^ ]+) (?<mime>[^ ]+)/
    time_key timestamp
    time_format %s
  </parse>
  path /var/log/squid/access.log
  # Remembers the read position across td-agent restarts (the file name is an example)
  pos_file /var/log/td-agent/squid-access.log.pos
  tag squid.access
</source>
<match squid.access>
  @type elasticsearch
  host <elasticsearch server IP>
  port 9200
  logstash_format true
  flush_interval 10s
  # Not used while logstash_format is true: events go to daily logstash-YYYY.MM.DD indices
  index_name fluentd
  type_name fluentd
  include_tag_key true
  user elastic
  password <elasticsearch password>
</match>
The key is getting the regular expression to match the format of the Squid access log, whose lines look like this:
1598101487.920 240256 192.168.10.111 TCP_TUNNEL/200 1562 CONNECT www.google.com.au:443 - HIER_DIRECT/142.250.66.163 -
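Before restarting td-agent, the expression can be sanity-checked against that sample line. The sketch below uses Python's `re` module rather than Fluentd's Ruby regex engine, so the named groups are rewritten from `(?<name>...)` to Python's `(?P<name>...)` syntax; `parse_squid_line` is a hypothetical helper name. Because the `[\.0-9]*` after the `timestamp` group is not captured, only the whole seconds reach `time_format %s`.

```python
import re

# Same pattern as the td-agent `expression`, with Ruby's (?<name>...) groups
# rewritten as Python's (?P<name>...); nothing else is changed.
SQUID_ACCESS = re.compile(
    r"^(?P<timestamp>[0-9]+)[\.0-9]* +(?P<elapsed>[0-9]+) (?P<userIP>[0-9\.]+) "
    r"(?P<action>[A-Z_]+)\/(?P<statusCode>[0-9]+) (?P<size>[0-9]+) "
    r"(?P<method>[A-Z]+) (?P<URL>[^ ]+) (?P<rfc931>[^ ]+) "
    r"(?P<peerStatus>[^ ]+)/(?P<peerIP>[^ ]+) (?P<mime>[^ ]+)"
)

SAMPLE = ("1598101487.920 240256 192.168.10.111 TCP_TUNNEL/200 1562 CONNECT "
          "www.google.com.au:443 - HIER_DIRECT/142.250.66.163 -")


def parse_squid_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = SQUID_ACCESS.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for field, value in parse_squid_line(SAMPLE).items():
        print(f"{field:>10}: {value}")
```

For the sample line, `userIP` comes out as `192.168.10.111`, and `peerStatus`/`peerIP` as `HIER_DIRECT`/`142.250.66.163`: the greedy `[^ ]+` backtracks to the last `/` in that token.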
Then I can use the fields defined in the regex, such as userIP or URL, in Elasticsearch queries.
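As an illustration, the sketch below builds (but does not send) an Elasticsearch `_search` request for all records from one client IP. It assumes defaults that follow from the config above: with `logstash_format true` the plugin writes to daily `logstash-YYYY.MM.DD` indices (hence the `logstash-*` pattern) and adds an `@timestamp` field, and Elasticsearch's default dynamic mapping stores strings as `text` with a `.keyword` sub-field, which an exact `term` match needs. Plain HTTP on port 9200 is assumed because the config sets no scheme; the host, password and the helper name `build_user_ip_search` are placeholders.

```python
import base64
import json
import urllib.request


def build_user_ip_search(es_host, user_ip, password, user="elastic",
                         index="logstash-*", size=10):
    """Hypothetical helper: build a _search request for one client IP.

    Assumes plain HTTP on port 9200, matching the td-agent <match> block.
    """
    body = {
        "size": size,
        # Exact match on the keyword sub-field created by default dynamic mapping
        "query": {"term": {"userIP.keyword": user_ip}},
        # Newest first, using the @timestamp field added by logstash_format
        "sort": [{"@timestamp": {"order": "desc"}}],
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{es_host}:9200/{index}/_search",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # HTTP basic auth for `user`
        },
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)`, and the matching documents are under `hits.hits` in the JSON response.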
🙂