Ansible, CloudFormation and Jinja2

CloudFormation is the cornerstone of provisioning infrastructure in AWS with code. However, it's not very DRY: modularization is poor, and variables and templates are mostly static. So here comes Ansible.

However, at the moment Ansible's cloudformation module doesn't render Jinja2 in templates the way other modules do. Luckily there's a workaround to get the Ansible-CloudFormation-Jinja2 trio working together.

A simple CloudFormation snippet with Jinja2 variable:

# roles/test-stack/templates/mycf.yaml.j2
Parameters:
  VpcId:   # parameter name is illustrative
    Type: String
    Default: '{{ template_params.vpc_id }}'

I don't put the Jinja2 variable directly in place of the !Ref intrinsic function; filling in the parameter's Default is a softer approach, so if there's any parameter validation in the template it still works.
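For illustration, here is a sketch of how the rest of the template might consume that parameter; the AllowedPattern validation and the AppSecurityGroup resource are hypothetical additions:

# hypothetical continuation of mycf.yaml.j2
# (an AllowedPattern such as '^vpc-[0-9a-f]+$' under the VpcId parameter
#  would still validate the rendered default)
Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: example security group in the templated VPC
      VpcId: !Ref VpcId   # !Ref is untouched by Jinja2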

Then the template can be loaded to Ansible’s cloudformation module like this:

# roles/test-stack/tasks/main.yaml
- name: create a stack
  cloudformation:
    stack_name: '{{ stack_names.test_stack }}'
    state: present
    # the template lookup renders the Jinja2 template on the controller first
    template_body: "{{ lookup('template', 'templates/mycf.yaml.j2') }}"
    tags: '{{ template_tags }}'

The inventory may look like this:

# inventory/local
all:
  hosts:
    dev:
      ansible_connection: local

All parameters can go into the global variable file group_vars/all or the host variable file host_vars/dev, so later I can create a separate set of parameters for production.

# group_vars/all
stack_names:
  test_stack: test1
template_tags:   # referenced by the task above; values illustrative
  app: test1

# host_vars/dev
template_params:
  vpc_id: vpc-xxxxxxxxxxxxxxx

app_env: dev
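For the production set mentioned above, a second host variable file could mirror this structure; host_vars/prod and its values here are hypothetical:

# host_vars/prod (hypothetical)
template_params:
  vpc_id: vpc-yyyyyyyyyyyyyyy

app_env: prod

The prod host would also need its own entry in the inventory next to dev.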

And the playbook can be as simple as:

# deploy.yaml
- name: deploy stacks
  hosts: dev
  gather_facts: false
  tags: dev
  roles:
    - test-stack

Finally, this CloudFormation stack can be deployed by running:

$ ansible-playbook -i inventory/local deploy.yaml --tags dev


Working with a Big Corporation

So it's been a while since I started this job at a big corporation. I always enjoy new challenges, and now my wish has been granted, though not in a very good way.

Things work in quite a different manner here. There are big silos and layers between teams and departments, so the challenges are not really technical in nature. How unexpected.

Still, there are lots of things that can be improved with technology; here's one example. When I was migrating an old web application stack from on-premises infrastructure to AWS, the AWS landing zone had already been provisioned with a dual-VPC setup. I really miss the days of working with Kubernetes clusters, when I could just run kubectl exec -ti ... and get a terminal session quickly.

Now things look like the year 2000 again and I need an SSH ProxyCommand, though without old-school static IP addresses. Ansible's dynamic inventory is quite handy in most cases, but here it failed due to some unknown corporate firewall rules. I still have bash, aws-cli and jq, so below is my handy bash function to connect to one instance of an auto scaling group via a bastion host (both can be rebuilt and change IPs).

function get_stack_ip(){
  # Find instances by the stack-name tag that CloudFormation adds,
  # then print the private IP of the first instance found.
  aws ec2 describe-instances \
    --filters "Name=tag-key,Values=aws:cloudformation:stack-name" "Name=tag-value,Values=$1" \
    | jq '.Reservations[] | select(.Instances[0].PrivateIpAddress != null) | .Instances[0].PrivateIpAddress' \
    | tr -d '"'
}

Then it’s easy to use this function to get IPs of the bastion stack and the target stack, such as:

IP_BASTION=$(get_stack_ip bastion_stack)
IP_TARGET=$(get_stack_ip target_stack)
# user name (ec2-user here) depends on the AMI
ssh -o ProxyCommand="ssh ec2-user@$IP_BASTION nc %h %p" ec2-user@$IP_TARGET
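As a side note, on a reasonably recent OpenSSH (7.3+) the same hop can be written with the -J jump-host flag; ec2-user is again just a stand-in for whatever user the AMI expects:

ssh -J ec2-user@$IP_BASTION ec2-user@$IP_TARGET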


Nicer Deployment with Kubernetes

The default strategy for a rolling update in a Kubernetes Deployment is to reduce the capacity of the current replica set and then add capacity to the new replica set. This means the total processing power of the app can dip a bit during a deployment.

I'm a bit surprised that the default strategy works this way, but luckily it's not hard to fine-tune. According to the Kubernetes documentation on Deployments, only a few lines are needed to change the strategy:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-deploy
  namespace: my-project
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 40%
  revisionHistoryLimit: 3

maxUnavailable: 0 means the total capacity of the Deployment is never reduced, and maxSurge: 40% means the new replica set can temporarily add up to 40% extra capacity on top of the desired count before the current replica set becomes the old one and is drained. For example, with 10 desired replicas a rollout may run up to 14 pods while never dropping below 10 ready ones.

Not a big improvement, but revisionHistoryLimit: 3 will keep only 3 old replica sets for rolling back a deployment. The default is unlimited, which is quite over-provisioned from my point of view.
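Those kept replica sets are what a rollback falls back to, with something like:

kubectl -n my-project rollout undo deployment/my-deploy --to-revision=2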


Ansible, Packer and Configurations

I use Ansible as a provisioner for Packer to build AMIs that serve as base images of our development environment. When Ansible is invoked by Packer, it's not quite obvious whether it uses the same ansible.cfg as when I run the ansible-playbook command in a terminal.

Here's how to make sure Ansible in a Packer session uses the correct ansible.cfg file.

First, an environment variable is supplied in Packer's template, because the ANSIBLE_CONFIG environment variable takes precedence over any other config file location:

      "type": "ansible",
      "playbook_file": "../../ansible/apps.yml",
      "ansible_env_vars": ["ANSIBLE_CONFIG=/tmp/ansible.cfg"],
      "extra_arguments": [

The line with "ANSIBLE_CONFIG=/tmp/ansible.cfg" tells Ansible to use /tmp/ansible.cfg.
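For completeness, the file dropped at /tmp/ansible.cfg might look like the following; the settings shown are just plausible examples, not the actual file:

# /tmp/ansible.cfg (contents illustrative)
[defaults]
roles_path = ../../ansible/roles
host_key_checking = False
retry_files_enabled = False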

With the ansible.cfg at /tmp and the extra debug switch -vvv, I can see in the output whether the config file is picked up.
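If it worked, the verbose output should report the config file near the top, with a line along these lines (exact wording varies by Ansible version):

Using /tmp/ansible.cfg as config file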