Container Clustering with Rancher Server (Part 5) – Automating the deployment of AWS infrastructure and Rancher with Terraform

This post was written by Rich Bosomworth.



This is the fifth post in an ongoing series exploring Rancher Server deployment, configuration and extended use. In the last post I detailed how to deploy and manage containerised GoCD continuous delivery using Rancher. In this post we look at automating the deployment of Rancher and the supporting AWS components using Terraform. Terraform is an orchestration tool developed by HashiCorp that enables you to build, deploy, manage and update your infrastructure.

Experience with AWS and an awareness of Terraform is assumed, along with an understanding of the Rancher-related systems covered in previous posts. Content is best viewed from a desktop or laptop.

Series Links

Container Clustering with Rancher Server (Part 1) – Local Server Installation on Linux using Vagrant Host Nodes

Container Clustering with Rancher Server (Part 2) – Single Node Resilience in AWS

Container Clustering with Rancher Server (Part 3) – AWS EFS mounts using Rancher-NFS

Container Clustering with Rancher Server (Part 4) – Deploying and Maintaining Containerised GoCD Continuous Delivery

Container Clustering with Rancher Server (Part 5) – Automating the deployment of AWS infrastructure and Rancher with Terraform

Container Clustering with Rancher Server (Part 6) – Creating and deploying custom catalog items for GoCD

Container Clustering with Rancher Server (Part 7) – Stack and service build out to create a custom catalog item for Splunk


As stated, Terraform is an orchestration tool. Employing orchestration to deploy our AWS infrastructure and Rancher server provides substantial benefits. Aside from speed and conformity of deployment, infrastructure as code can be versioned in Git. A scripted deployment means that we can deploy a version of the infrastructure elsewhere, and/or easily redeploy, rebuild or modify our running environment.

Skelton Thatcher worked with Automation Logic and Rancher to create our own bespoke Terraform deployment for Rancher server and hosts.

The Terraform plan is available from the Skelton Thatcher Git repository.

Plan design

The Terraform plan is based on the infrastructure design from Part 2, expanded to deliver multi-AZ Rancher hosts via fixed-scale auto-scaling groups, along with options for Terraform remote state.
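
As an aside on the remote state option: with Terraform 0.9 and later this can be expressed as an S3 backend block along the lines below. In this plan the equivalent setup is handled by the init.sh script covered under Stage One, so treat the following as a sketch only, with placeholder bucket, key and region values.

```
# Sketch only: bucket, key and region are placeholders, not values from the repo.
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "rancher/terraform.tfstate"
    region = "eu-west-1"
  }
}
```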

The plan will build out and deploy the following resources (a simplified sketch of a few of them follows the list):

  • x1 VPC
  • x1 Internet gateway
  • x2 Public subnets
  • x2 Private subnets
  • RDS DB subnet group
  • Single-AZ or Multi-AZ RDS MySQL DB instance
  • Application load balancer + listener + target group
  • Launch configuration + fixed Multi-AZ auto-scaling group of x1 instance for the Rancher server
  • Launch configuration + fixed Multi-AZ auto-scaling group of a specified number of instances for the Rancher hosts
  • RancherOS instance with active Docker running a password protected deployment of the latest version of Rancher server
  • RancherOS instances with active Docker running the latest version of the Rancher host agent
  • Route 53 DNS alias record for the ALB
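
Purely as an illustration, a few of these resources expressed in Terraform (names, CIDRs and references are simplified placeholders rather than the repo's actual code):

```
resource "aws_vpc" "rancher" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_alb" "rancher" {
  name            = "rancher-alb"
  subnets         = ["${aws_subnet.public_a.id}", "${aws_subnet.public_b.id}"]
  security_groups = ["${aws_security_group.alb.id}"]
}

resource "aws_autoscaling_group" "rancher_server" {
  min_size             = 1
  max_size             = 1
  desired_capacity     = 1
  launch_configuration = "${aws_launch_configuration.rancher_server.name}"
  vpc_zone_identifier  = ["${aws_subnet.public_a.id}", "${aws_subnet.public_b.id}"]
  target_group_arns    = ["${aws_alb_target_group.rancher.arn}"]
}
```

The real plan wires the subnets, listener, target group, RDS instance and Route 53 record together in the same declarative fashion.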

The plan is designed to be deployed in two stages (Fig:1).

two-stage-rancher-deploy

Fig:1

Why is the deployment a two-stage process?

Rancher server is designed to be secure, and thanks to a clever provisioner script written by George Cairns, our Terraform plan deploys an already password-protected install. Rancher hosts need to register with the server, and to do so require a registration token to be generated. Obtaining the token is straightforward via an API call to an unsecured install, but from a secured install the process is a little more convoluted. As such, we find it easier to deploy via a two-stage method. This is a simple process that adds no more than ten minutes to the total deployment time. It is also reasonable to assume that, once deployed, a Rancher server build will not change frequently.
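
To make the dependency concrete, the registration token is interpolated into the host userdata at plan time, which is why the hosts can only be launched once the token exists. A minimal sketch of that wiring, assuming a reg_token variable and an illustrative rancher_fqdn variable (the repo's actual template and agent version may differ):

```
variable "reg_token" {
  description = "Rancher host registration token, generated in the server UI after stage one"
}

locals {
  # The token-based registration URL is baked into the host userdata,
  # so a host boots, runs the Rancher agent and registers with the server.
  host_user_data = <<EOF
#!/bin/sh
sudo docker run --rm --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:v1.2.2 \
  "https://${var.rancher_fqdn}/v1/scripts/${var.reg_token}"
EOF
}
```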

Deployment process

The estimated deployment time is 20-30 minutes.

Prerequisites

  • AWS account
  • AWS IAM user account with AWS access/secret keys and permission to create specified resources
  • Cygwin (or similar) installed to enable running of .sh scripts if using Windows
  • Git installed and configured
  • Terraform installed and configured

Stage One

  • Clone the repo
  • Create an EC2 keypair in AWS
  • Create an S3 bucket to hold remote state
  • Update init.sh with the S3 bucket name and AWS region
  • Run init.sh to initialise remote state
  • Create a terraform.tfvars file in the root of the cloned folder (see terraform.tfvars.example)
  • Set the hst_max, hst_min and hst_des var entries in terraform.tfvars to zero (Fig:2)
  • Make up a temporary reg_token var entry in terraform.tfvars (Fig:2; a sketch of the file follows below)
hst-001

Fig:2
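
For reference alongside Fig:2, a stage one terraform.tfvars might contain entries like the following. Only hst_max, hst_min, hst_des and reg_token are named in this post; terraform.tfvars.example lists the full set of keys, including the Rancher login name and password used later in this stage.

```
# Stage one: no hosts yet, and a made-up token to be replaced in stage two.
hst_max   = 0
hst_min   = 0
hst_des   = 0
reg_token = "temporary-placeholder-token"
```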

  • Run terraform plan from the root of the folder
  • Run terraform apply from the root of the folder
  • Wait until the installation has completed
  • Access Rancher server at the displayed output URL
  • Log in with the name and password specified in the terraform.tfvars file

Stage Two

  • Enable host registration from within Rancher and copy the token from the registration string. The token will be in a format similar to 6C8B0D1B2E95DD1AA07A:1483142400000:PKQGzShMCv3wtD02DvlU4MkBY0 and is found under section 5 of ‘Add Hosts’ in the Rancher server UI (Fig:3).
hosts-token

Fig:3

  • Update hst_max, hst_min and hst_des in terraform.tfvars with the max, min and desired number of host instances (Fig:4)
  • Update reg_token in terraform.tfvars with the registration token (Fig:4; a sketch of the updated file follows below)
hsts-002

Fig:4
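
Again as a sketch of what Fig:4 shows, the same terraform.tfvars updated for stage two. The host counts below are examples only; the token takes the format shown in the step above.

```
# Stage two: real host counts, plus the token copied from the Rancher UI.
hst_max   = 4
hst_min   = 2
hst_des   = 2
reg_token = "6C8B0D1B2E95DD1AA07A:1483142400000:PKQGzShMCv3wtD02DvlU4MkBY0"
```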

  • Re-run terraform plan
  • Re-run terraform apply
  • The launch configuration will be replaced with a new version and applied to the auto-scaling group
  • The specified number of host instances will launch and after a few minutes will appear as registered within the Rancher server UI

Build notes

No fixed names (tags) are applied to the EC2 launch configuration resources in Terraform. This is because Terraform cannot destroy and recreate a launch configuration that reuses the same name. By letting Terraform allocate names dynamically, it can destroy and recreate the launch configuration resources with no issues.
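
In practice this means omitting the name (or using name_prefix) and letting Terraform replace the launch configuration before destroying the old one. A typical pattern, not necessarily verbatim from the repo:

```
resource "aws_launch_configuration" "rancher_host" {
  # No fixed name: Terraform generates one from the prefix, so the launch
  # configuration can be recreated without a naming clash when it changes.
  name_prefix   = "rancher-host-"
  image_id      = "${var.rancher_os_ami}"      # placeholder variable
  instance_type = "${var.host_instance_type}"  # placeholder variable
  user_data     = "${local.host_user_data}"    # see the earlier sketch

  lifecycle {
    create_before_destroy = true
  }
}
```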

The Terraform plan creates underlying public and private subnets to host all resources, i.e. the ALB, Rancher server and Rancher hosts in public subnets, and the RDS DB in private subnet space. Separation of resources is controlled via EC2 security groups, with RDS creating its own DB subnet group. With a self-contained n-tier stack in AWS there is no reason to create individual subnets for each set of resources.
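
As an illustration of that separation (group names and rules below are not taken from the repo), the DB security group only admits MySQL traffic from the security group attached to the Rancher server and hosts:

```
resource "aws_security_group" "rancher_nodes" {
  name   = "rancher-nodes"
  vpc_id = "${aws_vpc.rancher.id}"
}

resource "aws_security_group" "rancher_db" {
  name   = "rancher-db"
  vpc_id = "${aws_vpc.rancher.id}"

  # MySQL access is limited to the Rancher server and host instances.
  ingress {
    from_port       = 3306
    to_port         = 3306
    protocol        = "tcp"
    security_groups = ["${aws_security_group.rancher_nodes.id}"]
  }
}
```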

We use RancherOS for both the Rancher server and the Rancher hosts. RancherOS is more secure and lightweight than a general-purpose distro such as Ubuntu Linux or CentOS. Our plan creates RancherOS-specific userdata scripts to facilitate deployment as part of the EC2 launch configuration process.
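
As an indication of what the server-side userdata might contain (image tag, flags, variable and resource names here are illustrative, and password protection is applied separately by the provisioner script mentioned earlier), a minimal sketch pointing Rancher server at the external RDS MySQL instance:

```
locals {
  server_user_data = <<EOF
#!/bin/sh
# Launch Rancher server against the external RDS MySQL database.
sudo docker run -d --restart=unless-stopped -p 8080:8080 \
  rancher/server:stable \
  --db-host ${aws_db_instance.rancher.address} \
  --db-port 3306 \
  --db-user ${var.db_user} \
  --db-pass ${var.db_pass} \
  --db-name cattle
EOF
}
```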

Should you require further information or assistance with any aspect of this post, AWS, Terraform, Rancher, or any other services from the website, please feel free to get in touch via the methods detailed on our contact page.

4 thoughts on “Container Clustering with Rancher Server (Part 5) – Automating the deployment of AWS infrastructure and Rancher with Terraform”

  1. Great article, thanks for taking the time to share with the community.

    I’m experiencing a slight issue which I think might be egress related. I can’t access the catalog, getting a “fail” error page.
    When I try to create a service I get “Failed to find network for networkMode managed” when trying to pull an image.


  2. This 5-part series is great – thanks so much for sharing! Looking at your Terraform Plan for Part 5 – am I correct in finding that you did not implement the EFS backing that you describe in Part 3?


    1. Hi Saurin, yes, that is correct. The reason was that we were using the stack to deploy GoCD, and at the time of writing the GoCD docker images still used multiple file locations (this has changed with v17.3.0), and as such would have required multiple mounts to provide HA. However, we found that multiple EFS mounts did not work, whereas multiple EBS mounts were fine. Using EBS did bring into play the single-AZ issue, however for us that was no real problem.
      We also liked the way the Rancher EBS plugin created EBS volumes ‘on the fly’ (after first being specified). With EFS (using the NFS plugin as we did) the EFS volume had to be pre-created in AWS.

      If it is of interest, we have recently put together a revised deployment using SSL enabled ELB. This we produced after discovering that Rancher no longer advise the use of an ALB for Rancher server. The new plan also creates and applies an EC2 IAM policy role for Rancher server & hosts, granting full access to EC2, S3, Route 53, SNS & Cloudwatch.

      The revised plan is here – https://github.com/SkeltonThatcher/aws-terraform-rancher-single-node-ha-elb

      Also, if GoCD is of interest too, we have created our own Rancher catalog service items for GoCD (server & agent) using the official GoCD v17.3.0 docker images from Thoughtworks. As with our Rancher plan the deployment is a two stage process. The process is detailed within the service item description and our catalog is here – https://github.com/SkeltonThatcher/rancher-buildeng-catalog

      We would be more than happy to continue this discussion should you require any further information with regard to our processes and production.

