Ansible, Packer and Configurations

I use Ansible as the provisioner for Packer to build AMIs that serve as the base image of our development environment. When Ansible is invoked by Packer, it’s not obvious whether it picks up the same ansible.cfg that is used when I run the ansible-playbook command in a terminal.

Here’s how to make sure Ansible in a Packer session will use the correct ansible.cfg file.

First, an environment variable is supplied in Packer’s template, because the ANSIBLE_CONFIG environment variable takes precedence over every other location Ansible searches for its configuration:

    {
      "type": "ansible",
      "playbook_file": "../../ansible/apps.yml",
      "ansible_env_vars": ["ANSIBLE_CONFIG=/tmp/ansible.cfg"],
      "extra_arguments": [
        "--skip-tags=packer-skip",
        "-vvv"
      ]
    },

The line with “ANSIBLE_CONFIG=/tmp/ansible.cfg” tells Ansible to use /tmp/ansible.cfg.

With the ansible.cfg file placed at /tmp and the extra debug switch -vvv, I can see in the output whether the config file is picked up.
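
For reference, here’s what a minimal /tmp/ansible.cfg could contain; the settings below are just illustrative assumptions, not from my actual project. Also note that with the ansible provisioner, Ansible runs on the machine that executes Packer, so the file has to exist there:

    # /tmp/ansible.cfg -- hypothetical example settings
    [defaults]
    # resolve roles relative to the playbook location
    roles_path = ../../ansible/roles
    # don't prompt for host keys of freshly booted build instances
    host_key_checking = False

In the -vvv output, Ansible prints which config file it loaded, so a wrong path is easy to spot.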


Install Fluentd with Ansible

Fluentd has been a popular open source log aggregation framework for a while now. I’ll give it a spin with Ansible. There are quite a few existing Ansible playbooks for installing Fluentd out there, but I’d like to do it from scratch, just to understand how it works.

From the installation guide page, I can grab the script and dependencies and then translate them into Ansible tasks:

---
# roles/fluentd-collector/tasks/install-xenial.yml
- name: install os packages
  package:
    name: '{{ item }}'
    state: latest
  with_items:
    - libcurl4-gnutls-dev
    - build-essential

- name: install fluentd on ubuntu xenial
  # the vendor script sets up the Treasure Data apt repository and installs td-agent 2
  raw: "curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent2.sh | sh"

Then it can be included from the main task file:

# roles/fluentd-collector/tasks/main.yml
# (incomplete)
- include: install-xenial.yml
  when: ansible_distribution_release == 'xenial'
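
To apply the role, a minimal playbook sketch could look like the following; the log-clients host group is just a made-up name:

- hosts: log-clients
  become: true
  roles:
    - fluentd-collector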

On the log-collecting end, I need to configure /etc/td-agent/td-agent.conf so that fluentd (the stable release is called td-agent) receives syslog, tails other logs, and then forwards the data to the central collector. Here’s some sample configuration with Jinja2 template placeholders:

<match *.**>
  type forward
  phi_threshold 100
  hard_timeout 60s
  <server>
    name mycollector
    host {{ fluent_server_ip }}
    port {{ fluent_server_port }}
    weight 10
  </server>
</match>
<source>
  type syslog
  port 42185
  tag {{ inventory_hostname }}.system
</source>

{% for tail in fluentd.tails %}
<source>
  type tail
  format {{ tail.format }}
  time_format {{ tail.time_format }}
  path {{ tail.file }}
  pos_file /var/log/td-agent/pos.{{ tail.name }}
  tag {{ inventory_hostname }}.{{ tail.name }}
</source>
{% endfor %}
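
The fluentd.tails list that drives the loop above comes from Ansible variables; a hypothetical group_vars entry could look like this (the nginx entry is made up for illustration):

fluentd:
  tails:
    - name: nginx-access
      file: /var/log/nginx/access.log
      format: nginx
      time_format: '%d/%b/%Y:%H:%M:%S %z'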

At the aggregator’s end, a sample configuration can look like:

<source>
  type forward
  port {{ fluentd_server_port }}
</source>

<match *.**>
  @type elasticsearch
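  # host/port omitted here; fluent-plugin-elasticsearch defaults to localhost:9200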
  logstash_format true
  flush_interval 10s
  index_name fluentd
  type_name fluentd
  include_tag_key true
  user {{ es_user }}
  password {{ es_pass }}
</match>

Then fluentd/td-agent can aggregate all logs from its peers and forward them to Elasticsearch in Logstash format.
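
To tie it together, the rendered templates can be deployed with a template task and a restart handler. A minimal sketch, with assumed file names under the role:

# roles/fluentd-collector/tasks/configure.yml (hypothetical)
- name: render td-agent configuration
  template:
    src: td-agent.conf.j2
    dest: /etc/td-agent/td-agent.conf
  notify: restart td-agent

# roles/fluentd-collector/handlers/main.yml
- name: restart td-agent
  service:
    name: td-agent
    state: restarted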

🙂

Distribute cron jobs to hours/minutes with Ansible

This is a handy trick to run a batch of cron jobs on different hour/minute combinations so they won’t collide with each other and pile load onto the server.

The key is to use `with_indexed_items` and Jinja2 math:

- name: ansible daily cronjob {{ item.1 }}
  cron: user=ansible name=ansible-daily-{{ item.1 }} hour={{ item.0 % 24 }} minute={{ (item.0 * 5) % 60 }} job="/usr/local/run.sh {{ item.1 }}"
  with_indexed_items:
    - job_foo
    - job_bar

So `job_foo` will run at hour 0, minute 0, `job_bar` at hour 1, minute 5, and so on…
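
For illustration, the resulting crontab of the ansible user would look roughly like this (the cron module writes the #Ansible marker comments):

#Ansible: ansible-daily-job_foo
0 0 * * * /usr/local/run.sh job_foo
#Ansible: ansible-daily-job_bar
5 1 * * * /usr/local/run.sh job_bar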

🙂