Container Clustering with Rancher Server (Part 5) – Automating the deployment of AWS infrastructure and Rancher with Terraform

This post was written by Rich Bosomworth.



This is the fifth post in an ongoing series exploring Rancher Server deployment, configuration and extended use. In the last post I detailed how to deploy and manage containerised GoCD continuous delivery using Rancher. In this post we look at automating the deployment of Rancher and the supporting AWS components using Terraform. Terraform is an orchestration tool developed by HashiCorp that enables you to build, deploy, manage and update your infrastructure.

Experience with AWS and an awareness of Terraform is assumed, along with an understanding of the Rancher-related systems covered in previous posts. Content is best viewed from a desktop or laptop.

Series Links

Container Clustering with Rancher Server (Part 1) – Local Server Installation on Linux using Vagrant Host Nodes

Container Clustering with Rancher Server (Part 2) – Single Node Resilience in AWS

Container Clustering with Rancher Server (Part 3) – AWS EFS mounts using Rancher-NFS

Container Clustering with Rancher Server (Part 4) – Deploying and Maintaining Containerised GoCD Continuous Delivery

Container Clustering with Rancher Server (Part 5) – Automating the deployment of AWS infrastructure and Rancher with Terraform

Container Clustering with Rancher Server (Part 6) – Creating and deploying custom catalog items for GoCD

Container Clustering with Rancher Server (Part 7) – Stack and service build out to create a custom catalog item for Splunk


As stated, Terraform is an orchestration tool. Employing orchestration to deploy our AWS infrastructure and Rancher server provides substantial benefits. Aside from speed and conformity of deployment, infrastructure as code can be versioned in Git. A scripted deployment means that we can deploy a version of the infrastructure elsewhere, and/or easily redeploy, rebuild or modify our running environment.

Skelton Thatcher worked with Automation Logic and Rancher to create our own bespoke Terraform deployment for Rancher server and hosts.

The Terraform plan is available from the Skelton Thatcher Git repository.

Plan design

The Terraform plan is based on the infrastructure design from Part 2, expanded to deliver multi-AZ Rancher hosts via fixed-scale auto-scaling groups, along with options for Terraform remote state.
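
As an aside on the remote state option: with Terraform 0.9 and later this can be expressed as an S3 backend block along the lines below. In this plan the equivalent setup is handled by the init.sh script covered under Stage One, so treat the following as a sketch only, with placeholder bucket, key and region values.

```
# Sketch only: bucket, key and region are placeholders, not values from the repo.
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "rancher/terraform.tfstate"
    region = "eu-west-1"
  }
}
```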

The plan will build out and deploy the following resources (a simplified sketch of a few of them follows the list):

  • x1 VPC
  • x1 Internet gateway
  • x2 Public subnets
  • x2 Private subnets
  • RDS DB subnet group
  • Single-AZ or Multi-AZ RDS MySQL DB instance
  • Application load balancer + listener + target group
  • Launch configuration + fixed Multi-AZ auto-scaling group of x1 instance for the Rancher server
  • Launch configuration + fixed Multi-AZ auto-scaling group of a specified number of instances for the Rancher hosts
  • RancherOS instance with active Docker running a password protected deployment of the latest version of Rancher server
  • RancherOS instances with active Docker running the latest version of the Rancher host agent
  • Route 53 DNS alias record for the ALB
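
Purely as an illustration, a few of these resources expressed in Terraform (names, CIDRs and references are simplified placeholders rather than the repo's actual code):

```
resource "aws_vpc" "rancher" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_alb" "rancher" {
  name            = "rancher-alb"
  subnets         = ["${aws_subnet.public_a.id}", "${aws_subnet.public_b.id}"]
  security_groups = ["${aws_security_group.alb.id}"]
}

resource "aws_autoscaling_group" "rancher_server" {
  min_size             = 1
  max_size             = 1
  desired_capacity     = 1
  launch_configuration = "${aws_launch_configuration.rancher_server.name}"
  vpc_zone_identifier  = ["${aws_subnet.public_a.id}", "${aws_subnet.public_b.id}"]
  target_group_arns    = ["${aws_alb_target_group.rancher.arn}"]
}
```

The real plan wires the subnets, listener, target group, RDS instance and Route 53 record together in the same declarative fashion.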

The plan is designed to be deployed in two stages (Fig:1).

two-stage-rancher-deploy

Fig:1

Why is the deployment a two-stage process?

Rancher server is designed to be secure, and thanks to a clever provisioner script written by George Cairns, our Terraform plan deploys an already password-protected install. Rancher hosts need to register with the server, and to do so require a registration token to be generated. Obtaining the token is straightforward via an API call to an unsecured install, but from a secured install the process is a little more convoluted. As such, we find it easier to deploy via a two-stage method. This is a simple process that adds no more than ten minutes to the total deployment time. It is also reasonable to assume that, once deployed, a Rancher server build will not change frequently.
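
To make the dependency concrete, the registration token is interpolated into the host userdata at plan time, which is why the hosts can only be launched once the token exists. A minimal sketch of that wiring, assuming a reg_token variable and an illustrative rancher_fqdn variable (the repo's actual template and agent version may differ):

```
variable "reg_token" {
  description = "Rancher host registration token, generated in the server UI after stage one"
}

locals {
  # The token-based registration URL is baked into the host userdata,
  # so a host boots, runs the Rancher agent and registers with the server.
  host_user_data = <<EOF
#!/bin/sh
sudo docker run --rm --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.2.2 \
  "https://${var.rancher_fqdn}/v1/scripts/${var.reg_token}"
EOF
}
```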

Deployment process

The estimated deployment time is 20-30 minutes.

Prerequisites

  • AWS account
  • AWS IAM user account with AWS access/secret keys and permission to create specified resources
  • Cygwin (or similar) installed to enable running of .sh scripts if using Windows
  • Git installed and configured
  • Terraform installed and configured

Stage One

  • Clone the repo
  • Create an EC2 keypair in AWS
  • Create an S3 bucket to hold remote state
  • Update init.sh with the S3 bucket name and AWS region
  • Run init.sh to initialise remote state
  • Create a terraform.tfvars file in the root of the cloned folder (see terraform.tfvars.example)
  • Set the hst_max, hst_min and hst_des var entries in terraform.tfvars to zero (Fig:2)
  • Make up a temporary reg_token var entry in terraform.tfvars (Fig:2; a sketch of the file follows below)
hst-001

Fig:2
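
For reference alongside Fig:2, a stage one terraform.tfvars might contain entries like the following. Only hst_max, hst_min, hst_des and reg_token are named in this post; terraform.tfvars.example lists the full set of keys, including the Rancher login name and password used later in this stage.

```
# Stage one: no hosts yet, and a made-up token to be replaced in stage two.
hst_max   = 0
hst_min   = 0
hst_des   = 0
reg_token = "temporary-placeholder-token"
```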

  • Run terraform plan from the root of the folder
  • Run terraform apply from the root of the folder
  • Wait until the installation has completed
  • Access Rancher server at the displayed output URL
  • Log in with the name and password specified in the terraform.tfvars file

Stage Two

  • Enable host registration from within Rancher and copy the token from the registration string. The token will be in a format similar to 6C8B0D1B2E95DD1AA07A:1483142400000:PKQGzShMCv3wtD02DvlU4MkBY0 and is found under section 5 of ‘Add Hosts’ in the Rancher server UI (Fig:3).
hosts-token

Fig:3

  • Update hst_max, hst_min and hst_des in terraform.tfvars with the max, min and desired number of host instances (Fig:4)
  • Update reg_token in terraform.tfvars with the registration token (Fig:4; a sketch of the updated file follows below)
hsts-002

Fig:4
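
Again as a sketch of what Fig:4 shows, the same terraform.tfvars updated for stage two. The host counts below are examples only; the token takes the format shown in the step above.

```
# Stage two: real host counts, plus the token copied from the Rancher UI.
hst_max   = 4
hst_min   = 2
hst_des   = 2
reg_token = "6C8B0D1B2E95DD1AA07A:1483142400000:PKQGzShMCv3wtD02DvlU4MkBY0"
```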

  • Re-run terraform plan
  • Re-run terraform apply
  • The launch configuration will be replaced with a new version and applied to the auto-scaling group
  • The specified number of host instances will launch and after a few minutes will appear as registered within the Rancher server UI

Build notes

No fixed names (tags) are applied to the EC2 launch configuration resources in Terraform. This is because Terraform cannot destroy and recreate a launch configuration that reuses the same name. By letting Terraform allocate names dynamically, it can destroy and recreate the launch configuration resources with no issues.
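
In practice this means omitting the name (or using name_prefix) and letting Terraform replace the launch configuration before destroying the old one. A typical pattern, not necessarily verbatim from the repo:

```
resource "aws_launch_configuration" "rancher_host" {
  # No fixed name: Terraform generates one from the prefix, so the launch
  # configuration can be recreated without a naming clash when it changes.
  name_prefix   = "rancher-host-"
  image_id      = "${var.rancher_os_ami}"      # placeholder variable
  instance_type = "${var.host_instance_type}"  # placeholder variable
  user_data     = "${local.host_user_data}"    # see the earlier sketch

  lifecycle {
    create_before_destroy = true
  }
}
```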

The Terraform plan creates underlying public and private subnets to host all resources, i.e. the ALB, Rancher server and Rancher hosts in public subnets, and the RDS DB in private subnet space. Separation of resources is controlled via EC2 security groups, with RDS creating its own DB subnet group. With a self-contained n-tier stack in AWS there is no reason to create individual subnets for each set of resources.
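
As an illustration of that separation (group names and rules below are not taken from the repo), the DB security group only admits MySQL traffic from the security group attached to the Rancher server and hosts:

```
resource "aws_security_group" "rancher_nodes" {
  name   = "rancher-nodes"
  vpc_id = "${aws_vpc.rancher.id}"
}

resource "aws_security_group" "rancher_db" {
  name   = "rancher-db"
  vpc_id = "${aws_vpc.rancher.id}"

  # MySQL access is limited to the Rancher server and host instances.
  ingress {
    from_port       = 3306
    to_port         = 3306
    protocol        = "tcp"
    security_groups = ["${aws_security_group.rancher_nodes.id}"]
  }
}
```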

We use RancherOS for both the Rancher server and the Rancher hosts. RancherOS is more secure and lightweight than a general-purpose distro such as Ubuntu Linux or CentOS. Our plan creates RancherOS-specific userdata scripts to facilitate deployment as part of the EC2 launch configuration process.
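
As an indication of what the server-side userdata might contain (image tag, flags, variable and resource names here are illustrative, and password protection is applied separately by the provisioner script mentioned earlier), a minimal sketch pointing Rancher server at the external RDS MySQL instance:

```
locals {
  server_user_data = <<EOF
#!/bin/sh
# Launch Rancher server against the external RDS MySQL database.
sudo docker run -d --restart=unless-stopped -p 8080:8080 \
  rancher/server:stable \
  --db-host ${aws_db_instance.rancher.address} \
  --db-port 3306 \
  --db-user ${var.db_user} \
  --db-pass ${var.db_pass} \
  --db-name cattle
EOF
}
```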

Should you require further information or assistance with any aspect of this post, AWS, Terraform, Rancher, or any other services from the website, please feel free to get in touch via the methods detailed on our contact page.

4 thoughts on “Container Clustering with Rancher Server (Part 5) – Automating the deployment of AWS infrastructure and Rancher with Terraform”

  1. Great article, thanks for taking the time to share with the community.

    I’m experiencing a slight issue which I think might be egress related. I can’t access the catalog, getting a “fail” error page.
    When I try to create a service I get “Failed to find network for networkMode managed” when trying to pull an image.


  2. This 5-part series is great – thanks so much for sharing! Looking at your Terraform Plan for Part 5 – am I correct in finding that you did not implement the EFS backing that you describe in Part 3?


    1. Hi Saurin, yes, that is correct. The reason was that we were using the stack to deploy GoCD, and at the time of writing the GoCD docker images still used multiple file locations (this has changed with v17.3.0), and as such would have required multiple mounts to provide HA. However, we found that multiple EFS mounts did not work, whereas multiple EBS mounts were fine. Using EBS did bring into play the single-AZ issue, however for us that was no real problem.
      We also liked the way the Rancher EBS plugin created EBS volumes ‘on the fly’ (after first being specified). With EFS (using the NFS plugin as we did) the EFS volume had to be pre-created in AWS.

      If it is of interest, we have recently put together a revised deployment using SSL enabled ELB. This we produced after discovering that Rancher no longer advise the use of an ALB for Rancher server. The new plan also creates and applies an EC2 IAM policy role for Rancher server & hosts, granting full access to EC2, S3, Route 53, SNS & Cloudwatch.

      The revised plan is here – https://github.com/SkeltonThatcher/aws-terraform-rancher-single-node-ha-elb

      Also, if GoCD is of interest too, we have created our own Rancher catalog service items for GoCD (server & agent) using the official GoCD v17.3.0 docker images from Thoughtworks. As with our Rancher plan the deployment is a two stage process. The process is detailed within the service item description and our catalog is here – https://github.com/SkeltonThatcher/rancher-buildeng-catalog

      We would be more than happy to continue this discussion should you require any further information with regard to our processes and production.

