Tuesday 22 March 2016

vCloud Director and Terraform

On 8th January 2016, Terraform v0.6.9 was released and amongst the new features was the addition of a +VMware vCloud Director provider. This was the result of a community contribution to the open source code base from some smart guys at +OpenCredo working for +HMRC. They not only forked VMware's own govcloudair library to create the govcd library and make it work with the vCloud Director API, but also wrote a new Terraform provider that uses the new govcd library to manage and manipulate a number of resources in the vCloud ecosystem.

So following on from my look at Chef Provisioning, I thought I would take Terraform for a spin and see what it could do.

Terraform is purely an infrastructure orchestration tool. Sure, it is cross-platform and multi-cloud-provider aware, but it makes no claims to be a configuration management tool, and in fact actively supports Chef as a provisioning tool, happily handing over control to it once the VMs have booted. Terraform is another tool from +HashiCorp, the same people that brought us Vagrant and Packer, and the quality of documentation on their website is first class. It is supported with downloadable binaries for Linux, Windows and Mac. The download is a ZIP file, so your first task is to unzip the contents somewhere suitable and then make sure the directory is added to your PATH environment variable so that the executables can be found. Terraform is a command-line only tool - there is no UI. It reads all files in the current directory (unless given an alternative directory to scan) with a *.tf file extension for its configuration.

First thing to do then is to connect up to the +Skyscape Cloud Services vCloud Director API. Just like Packer's configuration file, Terraform supports user variables in its configuration files, allowing the values to be passed either on the command line or read from underlying environment variables. So as not to hard-code our user credentials, let's start by defining a couple of variables and then use them in the vCloud Director provider configuration.

variable "vcd_org" {}
variable "vcd_userid" {}
variable "vcd_pass" {}

# Configure the VMware vCloud Director Provider
provider "vcd" {
    user            = "${var.vcd_userid}"
    org             = "${var.vcd_org}"
    password        = "${var.vcd_pass}"
    url             = "https://api.vcd.portal.skyscapecloud.com/api"
}

The values of these three variables can either be set on the Terraform command line, for example --var vcd_org=1-2-33-456789, or you can create an environment variable of the same name prefixed with TF_VAR_ - for example: SET TF_VAR_vcd_org=1-2-33-456789
Note that the environment variable name is case sensitive.
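For example, to set all three from a Windows command prompt before running Terraform (the values here are the same placeholders used elsewhere in these posts, not real credentials):

SET TF_VAR_vcd_org=1-2-33-456789
SET TF_VAR_vcd_userid=1234.5.67890
SET TF_VAR_vcd_pass=Secret

On Linux or a Mac the equivalent would be export TF_VAR_vcd_org=1-2-33-456789 and so on.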

The first infrastructure component we'll want to create is a routed vDC network on which to place our VMs. This is supported using the "vcd_network" resource.

variable "edge_gateway" { default = "Edge Gateway Name" }
variable "mgt_net_cidr" { default = "10.10.0.0/24" }

# Create our networks
resource "vcd_network" "mgt_net" {
    name = "Management Network"
    edge_gateway = "${var.edge_gateway}"
    gateway = "${cidrhost(var.mgt_net_cidr, 1)}"
    static_ip_pool {
        start_address = "${cidrhost(var.mgt_net_cidr, 10)}"
        end_address = "${cidrhost(var.mgt_net_cidr, 200)}"
    }
}
Here we need to specify the name of the vShield Edge allocated to our vDC, giving it an IP address (the gateway) on our new network. We are also specifying an address range to use for the static IP address pool. The resource description also supports setting up a DHCP pool, but more on this in a bit.

We are adding some intelligence into our configuration by making use of one of Terraform's built-in functions, cidrhost(), instead of hard-coding different IP addresses. This allows us to use a variable to pass in the CIDR notation for the subnet to be used on this network, and from that subnet allocate the first address in the range for the gateway on the vShield Edge, and the 10th to 200th addresses as our static pool.
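To make that concrete, with the default mgt_net_cidr of "10.10.0.0/24" the three cidrhost() calls above evaluate to:

cidrhost("10.10.0.0/24", 1)   = 10.10.0.1      (vShield Edge gateway)
cidrhost("10.10.0.0/24", 10)  = 10.10.0.10     (static pool start)
cidrhost("10.10.0.0/24", 200) = 10.10.0.200    (static pool end)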

Seems very straightforward so far. Now how about spinning up a VM on our new subnet? Well, similar to Fog's abstraction that we looked at with Chef Provisioning in my last post, Terraform currently only supports the creation of a single VM per vApp.

variable "catalog"        { default = "DevOps" }
variable "vapp_template"  { default = "centos71" }
variable "jumpbox_int_ip" { default = "10.10.0.100" }

# Jumpbox VM on the Management Network
resource "vcd_vapp" "jumpbox" {
    name          = "jump01"
    catalog_name  = "${var.catalog}"
    template_name = "${var.vapp_template}"
    memory        = 512
    cpus          = 1
    network_name  = "${vcd_network.mgt_net.name}"
    ip            = "${var.jumpbox_int_ip}"
}

Notice here that we can cross-reference the network we have just created using ${resource_type.resource_id.resource_property} - where mgt_net is the Terraform identifier we gave to our network. Doing this adds an implicit dependency into our configuration, ensuring that the network gets created before the VM. More on Terraform's dependencies and how you can represent them graphically later. The IP address assigned to the VM has to be within the range allocated to the network's static pool or you will get an error. For the catalog and vApp Template, we will use the centos71 template in the DevOps catalog that we created in my earlier blog post.

All very painless so far. Now to get access to our jumpbox server from the outside world. We need to open up the firewall rules on the vShield Edge to allow incoming SSH, and set up a DNAT rule to forward the connection to our new VM.

variable "jumpbox_ext_ip" {}

# Inbound SSH to the Jumpbox server
resource "vcd_dnat" "jumpbox-ssh" {
    edge_gateway  = "${var.edge_gateway}"
    external_ip   = "${var.jumpbox_ext_ip}"
    port          = 22
    internal_ip   = "${var.jumpbox_int_ip}"
}

# SNAT Outbound traffic
resource "vcd_snat" "mgt-outbound" {
    edge_gateway  = "${var.edge_gateway}"
    external_ip   = "${var.jumpbox_ext_ip}"
    internal_ip   = "${var.mgt_net_cidr}"
}

resource "vcd_firewall_rules" "mgt-fw" {
    edge_gateway   = "${var.edge_gateway}"
    default_action = "drop"

    rule {
        description      = "allow-jumpbox-ssh"
        policy           = "allow"
        protocol         = "tcp"
        destination_port = "22"
        destination_ip   = "${var.jumpbox_ext_ip}"
        source_port      = "any"
        source_ip        = "any"
    }

    rule {
        description      = "allow-mgt-outbound"
        policy           = "allow"
        protocol         = "any"
        destination_port = "any"
        destination_ip   = "any"
        source_port      = "any"
        source_ip        = "${var.mgt_net_cidr}"
    }
}

And that should be enough to get our first VM up and be able to SSH into it. Now to give it a go. Terraform has a dry-run mode where it evaluates your configuration files, compares them to the current state and shows you what actions would be taken - useful if you need to document the planned changes in a change request.

> terraform plan --var jumpbox_ext_ip=aaa.bbb.ccc.ddd
Refreshing Terraform state prior to plan...


The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

+ vcd_dnat.jumpbox-ssh
    edge_gateway: "" => "Edge Gateway Name"
    external_ip:  "" => "aaa.bbb.ccc.ddd"
    internal_ip:  "" => "10.10.0.100"
    port:         "" => "22"

+ vcd_firewall_rules.website-fw
    default_action:          "" => "drop"
    edge_gateway:            "" => "Edge Gateway Name"
    rule.#:                  "" => "2"
    rule.0.description:      "" => "allow-jumpbox-ssh"
    rule.0.destination_ip:   "" => "aaa.bbb.ccc.ddd"
    rule.0.destination_port: "" => "22"
    rule.0.id:               "" => ""
    rule.0.policy:           "" => "allow"
    rule.0.protocol:         "" => "tcp"
    rule.0.source_ip:        "" => "any"
    rule.0.source_port:      "" => "any"
    rule.1.description:      "" => "allow-outbound"
    rule.1.destination_ip:   "" => "any"
    rule.1.destination_port: "" => "any"
    rule.1.id:               "" => ""
    rule.1.policy:           "" => "allow"
    rule.1.protocol:         "" => "any"
    rule.1.source_ip:        "" => "10.10.0.0/24"
    rule.1.source_port:      "" => "any"

+ vcd_network.mgt_net
    dns1:                                   "" => "8.8.8.8"
    dns2:                                   "" => "8.8.4.4"
    edge_gateway:                           "" => "Edge Gateway Name"
    fence_mode:                             "" => "natRouted"
    gateway:                                "" => "10.10.0.1"
    href:                                   "" => ""
    name:                                   "" => "Management Network"
    netmask:                                "" => "255.255.255.0"
    static_ip_pool.#:                       "" => "1"
    static_ip_pool.241192955.end_address:   "" => "10.10.0.200"
    static_ip_pool.241192955.start_address: "" => "10.10.0.10"

+ vcd_snat.mgt-outbound
    edge_gateway: "" => "Edge Gateway Name"
    external_ip:  "" => "aaa.bbb.ccc.ddd"
    internal_ip:  "" => "10.10.0.0/24"

+ vcd_vapp.jumpbox
    catalog_name:  "" => "DevOps"
    cpus:          "" => "1"
    href:          "" => ""
    ip:            "" => "10.10.0.100"
    memory:        "" => "512"
    name:          "" => "jump01"
    network_name:  "" => "Management Network"
    power_on:      "" => "1"
    template_name: "" => "centos71"


Plan: 5 to add, 0 to change, 0 to destroy.

To go ahead and start the deployment, change the command line to "terraform apply" instead of "terraform plan". At this point, Terraform will go away and start making calls to the vCloud Director API to build out your infrastructure. Since there is an implicit dependency between the jumpbox VM and the mgt_net network it is attached to, Terraform will wait until the network has been created before launching the VM. No other dependencies have been applied at this point, so Terraform will apply the DNAT / SNAT / firewall rules concurrently.
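For example (using the same placeholder external address as before):

> terraform apply --var jumpbox_ext_ip=aaa.bbb.ccc.ddd

If you want to see the dependency ordering for yourself, Terraform can also dump the resource graph in Graphviz DOT format, which you can render to an image if you have Graphviz installed:

> terraform graph > infra.dot
> dot -Tpng infra.dot -o infra.png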

With the "terraform apply" command complete, we now have a new VM running in our VDC on its own network complete with firewall and NAT rules. Lets not stop there..... Although Terraform is not a configuration management tool, it can bootstrap and run a configuration management tool on the new VM after it has booted. It has built in support for bootstrapping Chef, and also a generic file transfer and remote-exec provisioner allowing you to run any script you like to bootstap your tool of choice. Lets take a look at using Chef to configure our new VM.

The Chef provisioner can be added as part of the vcd_vapp resource definition, in which case, before the resource is marked complete, it will run the chef-client command on the VM (over SSH by default, or WinRM if connecting to a Windows server) and wait for the results of the Chef converge.

resource "vcd_vapp" "jumpbox" {
    name          = "jumpbox01"
    catalog_name  = "${var.catalog}"
    template_name = "${var.vapp_template}"
    memory        = 512
    cpus          = 1
    network_name  = "${vcd_network.mgt_net.name}"
    ip            = "${var.jumpbox_int_ip}"

    depends_on    = [ "vcd_dnat.jumpbox-ssh", "vcd_firewall_rules.mgt-fw", "vcd_snat.mgt-outbound" ]

    connection {
        host = "${var.jumpbox_ext_ip}"
        user = "${var.ssh_user}"
        password = "${var.ssh_password}"
    }

    provisioner "chef"  {
        run_list = ["chef-client","chef-client::config","chef-client::delete_validation"]
        node_name = "${vcd_vapp.jumpbox.name}"
        server_url = "https://api.chef.io/organizations/${var.chef_organisation}"
        validation_client_name = "skyscapecloud-validator"
        validation_key = "${file("~/.chef/skyscapecloud-validator.pem")}"
        version = "${var.chef_client_version}"
    }
}

Alternatively, to give you more control over where in the resource dependency tree the provisioning steps are run, there is a special null_resource that can be used with the provisioner configurations.

resource "null_resource" "jumpbox" {
    depends_on = [ "vcd_vapp.jumpbox" ]
    connection {
        host = "${var.jumpbox_ext_ip}"
        user = "${var.ssh_user}"
        password = "${var.ssh_password}"
    }

    provisioner "chef"  {
        run_list = ["chef-client","chef-client::config","chef-client::delete_validation"]
        node_name = "${vcd_vapp.jumpbox.name}"
        server_url = "https://api.chef.io/organizations/${var.chef_organisation}"
        validation_client_name = "skyscapecloud-validator"
        validation_key = "${file("~/.chef/skyscapecloud-validator.pem")}"
        version = "${var.chef_client_version}"
    }
}

Now that we have our jumpbox server built and configured, we can start adding the rest of our infrastructure. One really neat feature of Terraform is the 'count' property on resources. This is useful when building out a cluster of identical servers, such as a web server farm: you define a single vcd_vapp resource and set the count property to the size of your webfarm. Better still, if you use a variable for the value of the count property, you get basic auto-scaling functionality for free. Thanks to Terraform's idempotency, running 'terraform apply --var webserver_count=<bigger number>' creates the additional webserver instances to scale up, and running 'terraform apply --var webserver_count=<smaller number>' scales the webfarm back down.
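As a sketch (the default value here is illustrative; the full configuration on GitHub defines its own), the webfarm size is just another variable, and scaling is just another apply:

variable "webserver_count" { default = 2 }

> terraform apply --var webserver_count=4      (scale the webfarm up)
> terraform apply --var webserver_count=2      (scale it back down)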

Here is how I define my group of webservers that will sit behind the load-balancer.

resource "vcd_vapp" "webservers" {
    name          = "${format("web%02d", count.index + 1)}"
    catalog_name  = "${var.catalog}"
    template_name = "${var.vapp_template}"
    memory        = 1024
    cpus          = 1
    network_name  = "${vcd_network.web_net.name}"
    ip            = "${cidrhost(var.web_net_cidr, count.index + 100)}"

    count         = "${var.webserver_count}"

    depends_on    = [ "vcd_vapp.jumpbox", "vcd_snat.website-outbound" ]

    connection {
        bastion_host = "${var.jumpbox_ext_ip}"
        bastion_user = "${var.ssh_user}"
        bastion_password = "${var.ssh_password}"

        host = "${cidrhost(var.web_net_cidr, count.index + 100)}"
        user = "${var.ssh_user}"
        password = "${var.ssh_password}"
    }

    provisioner "chef"  {
        run_list = [ "chef-client", "chef-client::config", "chef-client::delete_validation", "my_web_app" ]
        node_name = "${format("web%02d", count.index + 1)}"
        server_url = "https://api.chef.io/organizations/${var.chef_organisation}"
        validation_client_name = "${var.chef_organisation}-validator"
        validation_key = "${file("~/.chef/${var.chef_organisation}-validator.pem")}"
        version = "${var.chef_client_version}"
        attributes {
            "tags" = [ "webserver" ]
        }
    }            
}

Note how we can use "${format("web%02d", count.index + 1)}" to dynamically create the VM name based on the incrementing count, as well as making further use of the cidrhost() function to calculate each VM's IP address. The counter starts at zero, so we add 1 to the count index to give us the names 'web01', 'web02', and so on. We can also use the jumpbox server's external address as the bastion_host in the connection details, and Terraform will relay the SSH connection through the jumpbox server to reach the internal addresses of the webservers.

The full configuration for Terraform can be downloaded from Skyscape's GitHub repository. In addition to the jumpbox and webservers covered above, it also includes a database server and a load-balancer, along with sample Chef configuration to register against a Chef Server and deploy a set of cookbooks to configure the servers. I won't go into the Chef Server configuration here, but take a look at the README.md file on GitHub for some pointers.

Gotchas?


The only issue I have with Terraform so far is that it does not make the call to VMware's guest customisation to set the VM hostname to match the vApp name and the VM name displayed in vCloud Director. When the VMs boot up, they have the same hostname as the VM in the vApp Template they were cloned from. This is not a big issue, since it is trivial to use Chef or any other configuration management tool to set the hostname as required.
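As a minimal sketch of that workaround, a Chef recipe such as the following could set the hostname on CentOS 7 (the resource name and the use of hostnamectl are my own illustration, not part of the GitHub configuration):

# Set the guest hostname to match the Chef node name
execute "set-hostname" do
  command "hostnamectl set-hostname #{node.name}"
  not_if { node['hostname'] == node.name }
end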

In Summary

It rocked. Having abandoned plans to use the vShield Edge load-balancer capability in favour of our own Haproxy installation, the Terraform configuration in GitHub will create a complete 3-tier web application from scratch, complete with network isolation across multiple networks, a load-balancer, a webserver farm and a back-end database server. Chef provisions all the software and server configuration needed, so that at the end of the Terraform run I can just point my web browser at the external IP address and have it start serving my web pages.

Let's check off my test criteria:

  • Website hosted in a vDC: Pass. vDC / Networks / NAT / Firewall rules all managed by Terraform.
  • Deploy 2 webservers behind a load-balancer: Pass. The webservers were created and the web app deployed. The Haproxy load-balancer was dynamically configured to use all the webservers in the farm, independent of the webserver_count value used to size the webfarm.
  • Deploy a database server: Pass. The database server was deployed correctly, and the db config was inserted into the webservers allowing them to connect.
  • Deploy a jumpbox server: Pass. The server was deployed correctly and all the NAT rules were managed by Terraform.


Overall then, a resounding success. The Terraform vCloud Director provider is still very young, and is not as feature-rich as some of the providers for other cloud vendors. That said, it is an open source project and the developers from OpenCredo are happy to collaborate on new features or accept pull requests from other contributors. I am already pulling together a list of feature enhancements that I plan to contribute to the code base.

Terraform also has many more features than I have time to mention in this post. In addition to VM provisioning, it has providers supporting resources ranging from DNS records to SQL databases. It also has a concept of code re-use in the form of modules, and can store its state information on remote servers to allow teams of users to collaborate on the same infrastructure.

For end-to-end integration into your deployment pipeline, it also combines with HashiCorp's Atlas cloud service, giving you a complete framework and UI for linking to your GitHub repository: automated 'plans' are run when pull requests are raised to validate configuration changes, the results are fed back to the PR, and an authorisation framework controls the automatic deployment of merged changes.

Tuesday 9 February 2016

NetworkManager and VMware Customisations Conflict With Each Other

In my last post reviewing Chef Provisioning's Fog driver, I mentioned that I experienced a number of intermittent errors where it appeared that the VMware customisation on the VM had failed to set up the network interface correctly. At the time I had not established what the problem was or how to overcome it.

I believe that after spending some time trawling through log messages and getting nowhere, I inadvertently stumbled on the same problem from a different direction. I was manually configuring the networking to use a static IP address on a VM I had created from my centos71 vApp Template. All was well initially, but all of a sudden the network connectivity was lost again. What was going on?

Well, as it turns out, the NetworkManager service was overwriting the ifcfg-ens32 configuration, setting the interface back to DHCP. When using Packer to create the vApp Template, I had forgotten to disable the NetworkManager service, which is enabled by default in CentOS 7. While it is possible to set NM_CONTROLLED=no on a per-interface basis in the respective ifcfg-xxx file, it seems that the VMware customisation process does not set that option, and as a result NetworkManager notices the changed configuration and resets it back to what it believes to be correct.
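The fix in the template build itself is straightforward; a minimal sketch of the relevant provisioning commands (the actual script in the repository may differ slightly) is:

# Disable NetworkManager and fall back to the legacy network service on CentOS 7
systemctl disable NetworkManager
systemctl enable network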

I have updated the packer scripts in https://github.com/skyscape-cloud-services/automation_examples to ensure that the NetworkManager service is disabled in the vApp Template. Having recreated the centos71 vApp Template used in the subsequent blog posts, the intermittent problems I was experiencing seem to have gone away.

Wednesday 13 January 2016

vCloud Director and Chef Provisioning (Fog Driver)

Inspired by one of the presentations at Chef Conf 2015 showcasing the use of Chef Provisioning to provision servers on VMware's vCloud Air service, I thought that I would start my evaluation of vCloud Director API tools by replicating the demonstration on +Skyscape Cloud Services infrastructure. I have a lot of experience using Chef Provisioning with the AWS driver, so how hard could it be...


The first thing I needed to do was download the Chef Development Kit (ChefDK) from https://downloads.chef.io/chef-dk/ - it can be installed on Windows, Mac or Linux platforms.

The Chef Development Kit is a handy bundle of all the software components you need to develop and deploy Chef cookbooks and recipes. It includes a bundled Ruby interpreter, the Chef Provisioning gem and the Fog driver for Chef Provisioning.

> chef --version
Chef Development Kit Version: 0.10.0
chef-client version: 12.5.1
berks version: 4.0.1
kitchen version: 1.4.2

> chef gem list chef-provisioning

*** LOCAL GEMS ***

chef-provisioning (1.5.0)
chef-provisioning-aws (1.6.1)
chef-provisioning-azure (0.4.0)
chef-provisioning-fog (0.15.0)
chef-provisioning-vagrant (0.10.0)

To get started, you will want to clone the Automation Examples GitHub repository to your local workstation. All the files used in this demo are in the chef-provisioning-fog sub-directory, and from this point on, assume all commands are run from within that directory. The Chef tools read their configuration from the knife.rb file in this directory. To keep the file fairly generic, it reads a number of user-specific values from environment variables.

You will need to set up the following environment variables with values for your Skyscape Portal account - these can be found at https://portal.skyscapecloud.com/user/api:

VCAIR_ORG=1-2-33-456789
VCAIR_USERNAME=1234.5.67890
VCAIR_PASSWORD=Secret
JUMPBOX_IPADDRESS=XXX.XXX.XXX.XXX
VCAIR_SSH_USERNAME=vagrant
VCAIR_SSH_PASSWORD=vagrant

The last two variables are the credentials to use when logging into new VMs created from your vApp Template.

Optionally, you could also install the knife-vcair plugin, allowing you to use Chef's standard CLI tool to interact with vCloud Director. I will make use of some of the commands in this post, so to install it, run:
chef gem install knife-vcair

To confirm that your configuration and environment variables are correct, you should be able to run the following command:
knife vcair image list

It should return a long list of available vApp Templates from the Public Catalogs, and if you have followed my previous post Creating vApp Templates for Automation, you should also see in the list the centos71 template we uploaded.

The First Snag

The Chef Provisioning Fog driver has no facility to create or manage any components other than vApps / VMs. It cannot create vDC Networks, manage the static IP pool allocated to a network, or manage the vShield Edge properties to set up SNAT/DNAT rules, firewall rules or load-balancer configurations.

Given these restrictions, out-of-the-box we also cannot determine the Public IP address assigned to the Jumpbox Server, which is why we have to explicitly set it in the environment variables above.

Manual One-Time Setup

Before we can continue using Chef Provisioning to create our VMs, we need to go through a few set-up steps and configure the vDC environment so that it is ready to use.

First of all, we'll need to create a vDC Network for the VMs to use. We'll create a routed network for 10.1.1.0/24 with a gateway address of 10.1.1.1 and a primary DNS of 8.8.8.8. We'll add a static IP pool for that network of 10.1.1.10-10.1.1.100.
The same information can also be seen by running:

$ knife vcair network list
Name             Gateway   IP Range Start  End         Description
Jumpbox Network  10.1.1.1  10.1.1.10       10.1.1.100  Demo network for API automation examples

Since we know that the first VM created on the new network will be allocated the IP address 10.1.1.10, we can pre-create the necessary DNAT rule to allow inbound SSH access to the jumpbox. We'll also add SNAT rules to allow the created VMs to connect out to the internet as needed.
These rules are then complemented by associated firewall rules.
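As a rough sketch (with the public address shown as a placeholder), the manually configured vShield Edge rules end up looking like this:

DNAT      XXX.XXX.XXX.XXX:22  ->  10.1.1.10:22       (inbound SSH to the jumpbox)
SNAT      10.1.1.0/24         ->  XXX.XXX.XXX.XXX    (outbound internet access)
Firewall  allow tcp any -> XXX.XXX.XXX.XXX port 22   (inbound SSH)
Firewall  allow any 10.1.1.0/24 -> any               (outbound traffic)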

On With The Cooking

So with all the pre-reqs and setup done, let's get on and look at the Chef Provisioning recipe we are going to use to create our web application's infrastructure. Chef recipes are Ruby scripts enhanced with Domain Specific Language (DSL) definitions. The Chef DSL allows you to write, in a declarative style, a list of resources that define the desired end state of your infrastructure.

Traditionally these resources would describe the setup of a single server, declaring what packages should be installed, templating the contents of configuration files, or defining local firewall rules and so on. Chef Provisioning extends the DSL to allow you to define servers themselves as a resource. All you need to include in your recipes to make use of these extensions is:

require 'chef/provisioning'

with_driver 'fog:Vcair'

Being a Ruby script, we can define a number of variables in the script for simplicity and re-use, and we can pull in values from environment variables:
num_webservers = 2

vcair_opts = {
  bootstrap_options: {
    image_name: 'centos71',
    net: 'Jumpbox Network',
    memory: '512',
    cpus: '1',
    ssh_options: {
      password: ENV['VCAIR_SSH_PASSWORD'],
      user_known_hosts_file: '/dev/null',
    }
  },
  create_timeout: 600,
  start_timeout: 600,
  ssh_gateway: "#{ENV['VCAIR_SSH_USERNAME']}@#{ENV['JUMPBOX_IPADDRESS']}",
  ssh_options: { 
    :password => ENV['VCAIR_SSH_PASSWORD'],
  }
}

Here we are specifying our uploaded vApp Template name - centos71 - that we will use when creating new VMs, and the name of the vDC Network we created above. We also specify the use of an SSH Gateway, making use of the Jumpbox's Public IP Address as a relay for connecting to the other servers in the vDC that are not directly accessible.

Users of other cloud providers will be familiar with using SSH key-pairs to authenticate connections to cloud-based servers. VMware's vCloud Director does not currently support this, hence the SSH password pulled in from the VCAIR_SSH_PASSWORD environment variable. I will explore how to set up SSH key-pair authentication in a later blog post.

To create a new VM, all you need to add to your recipe now is a machine resource.

machine 'jumpbox01' do
  tag 'jumpbox'
  machine_options vcair_opts
end

Being a Ruby script, we can make use of standard Ruby functionality to implement iterative loops, conditional logic and so on, so to create an arbitrary number of web servers we can easily wrap a machine resource in a loop.

1.upto(num_webservers) do |i|
   machine "linuxweb#{i}" do
      tag 'webserver'
      machine_options vcair_opts.merge({ memory: '2048'})
   end
end

Chef resources are typically processed sequentially; however, we can wrap our machine resource definitions up inside a machine_batch resource, and when the recipe is processed all of those machines will be created in parallel. If you take a look at the skyscapecloud-demo.rb recipe in the Automation Examples repository, you will see that the jumpbox server is created first, then the database server and two web servers are all brought up in parallel.
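As a brief sketch of the pattern (re-using the same names and options as the loop above; the batch name is my own), the webservers could be wrapped up like this:

# Create all the webservers in parallel as a single batch
machine_batch 'webfarm' do
  1.upto(num_webservers) do |i|
    machine "linuxweb#{i}" do
      tag 'webserver'
      machine_options vcair_opts.merge({ memory: '2048' })
    end
  end
end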

Deploying My Simple Web App

For this demo, my web app is a simple one-page PHP script that connects to a back-end database, increments a counter and then displays the current count back to the user. To deploy the PHP script, I have created a simple Chef cookbook that will:

  • Install the Nginx and php-fpm packages and any pre-reqs.
  • Define an Nginx site for the web app.
  • Deploy the index.php and favicon.ico files.
  • Use a Chef search to locate the IP address of the database server and generate a config.php file with the database credentials (see the sketch below).

The cookbook is included in the Automation Examples GitHub repository in the my_web_app_cookbook sub-directory. It makes use of a number of shared community cookbooks downloaded from the Chef Supermarket site to perform common tasks like installing and configuring Nginx.
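The search-driven part is roughly like the following sketch (the 'dbserver' tag, file paths and template variables here are illustrative, not copied from the my_web_app cookbook):

# Find the database server by tag and hand its address to the config.php template
db_node = search(:node, 'tags:dbserver').first

template '/var/www/my_web_app/config.php' do
  source 'config.php.erb'
  variables(db_host: db_node['ipaddress'])
end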

To prepare the cookbook and its dependencies for deployment to our new VMs, we use the Berkshelf tool that is bundled in the Chef Development Kit. To create a local cache of all the pre-req community cookbooks, you first run:

> berks install
Resolving cookbook dependencies...
Fetching 'my_web_app' from source at ../my_web_app_cookbook
Fetching cookbook index from https://supermarket.chef.io...
Installing apt (2.9.2)
Installing bluepill (2.4.1)
Installing build-essential (2.2.4)
Installing chef-sugar (3.2.0)
Installing database (4.0.9)
Installing mariadb (0.3.1)
Using my_web_app (0.3.0) from source at ../my_web_app_cookbook
Installing mysql (6.1.2)
Installing mysql2_chef_gem (1.0.2)
Installing nginx (2.7.6)
Installing ohai (2.0.4)
Installing openssl (4.4.0)
Installing packagecloud (0.1.1)
Installing php-fpm (0.7.5)
Installing postgresql (3.4.24)
Installing rbac (1.0.3)
Installing rsyslog (2.2.0)
Installing runit (1.7.6)
Installing smf (2.2.7)
Installing yum (3.8.2)
Installing yum-epel (0.6.5)
Installing yum-mysql-community (0.1.21)

In order for the Chef Provisioning scripts to deploy our web app using the cookbook, we need to bundle up the my_web_app cookbook and all its dependencies into a central cookbooks sub-directory. To do this, run:
> berks vendor cookbooks
Resolving cookbook dependencies...
Fetching 'my_web_app' from source at ../my_web_app_cookbook
Using apt (2.9.2)
Using bluepill (2.4.1)
Using build-essential (2.2.4)
Using chef-sugar (3.2.0)
Using database (4.0.9)
Using mariadb (0.3.1)
Using my_web_app (0.3.0) from source at ../my_web_app_cookbook
Using mysql (6.1.2)
Using mysql2_chef_gem (1.0.2)
Using nginx (2.7.6)
Using ohai (2.0.4)
Using openssl (4.4.0)
Using packagecloud (0.1.1)
Using php-fpm (0.7.5)
Using postgresql (3.4.24)
Using rbac (1.0.3)
Using rsyslog (2.2.0)
Using runit (1.7.6)
Using smf (2.2.7)
Using yum (3.8.2)
Using yum-epel (0.6.5)
Using yum-mysql-community (0.1.21)
Vendoring apt (2.9.2) to cookbooks/apt
Vendoring bluepill (2.4.1) to cookbooks/bluepill
Vendoring build-essential (2.2.4) to cookbooks/build-essential
Vendoring chef-sugar (3.2.0) to cookbooks/chef-sugar
Vendoring database (4.0.9) to cookbooks/database
Vendoring mariadb (0.3.1) to cookbooks/mariadb
Vendoring my_web_app (0.3.0) to cookbooks/my_web_app
Vendoring mysql (6.1.2) to cookbooks/mysql
Vendoring mysql2_chef_gem (1.0.2) to cookbooks/mysql2_chef_gem
Vendoring nginx (2.7.6) to cookbooks/nginx
Vendoring ohai (2.0.4) to cookbooks/ohai
Vendoring openssl (4.4.0) to cookbooks/openssl
Vendoring packagecloud (0.1.1) to cookbooks/packagecloud
Vendoring php-fpm (0.7.5) to cookbooks/php-fpm
Vendoring postgresql (3.4.24) to cookbooks/postgresql
Vendoring rbac (1.0.3) to cookbooks/rbac
Vendoring rsyslog (2.2.0) to cookbooks/rsyslog
Vendoring runit (1.7.6) to cookbooks/runit
Vendoring smf (2.2.7) to cookbooks/smf
Vendoring yum (3.8.2) to cookbooks/yum
Vendoring yum-epel (0.6.5) to cookbooks/yum-epel
Vendoring yum-mysql-community (0.1.21) to cookbooks/yum-mysql-community

Bringing It All Together

So, we now have a recipe defining our jumpbox, database server and two web servers, and a cookbook to deploy our web app. To get the show on the road, all we need to do now is run chef-client in local mode with the name of our recipe:

chef-client -z skyscapecloud-demo.rb

This will run chef-client in local mode, reading its recipes from the current directory instead of connecting to a Chef Server instance. It will:

  • Connect to your Skyscape Cloud account and create four VMs by cloning the centos71 vApp Template.
  • Wait until each VM is contactable, using the jumpbox's public address as an SSH relay.
  • Upload a Chef configuration file and SSH key to each VM.
  • Download and install the chef-client package on each VM.
  • Run a chef-client converge on each VM using the my_web_app cookbook to configure the new server.

At the end of the chef-client run, you should have 4 VMs running in your Skyscape Cloud account. You can check this by running:
$ knife vcair server list
vAPP       Name      IP         CPU  Memory  OS                       Owner          Status
jumpbox01  centos71  10.1.1.10  1    512     CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
linuxdb01  centos71  10.1.1.13  2    4096    CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
linuxweb1  centos71  10.1.1.12  1    2048    CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
linuxweb2  centos71  10.1.1.11  1    2048    CentOS 4/5/6/7 (64-bit)  1234.5.678901  on

More Gotchas

So we have 4 VMs running. More accurately, we have 4 vApps running, each containing a single VM. Not necessarily a problem, but not the best use of VMware's vApps. It might be better to be able to put all the web servers into a single web server vApp, or perhaps align the vApps to the machine_batch resource definitions in the Chef recipe, allowing vCloud Director to power all the VMs in a vApp on or off in a single operation.

We can SSH to the jumpbox server and subsequently connect to each of the database and web servers. Each server's hostname is set to the name I gave it in the Chef recipe. Great. Now I go and check the VM list again in vCloud Director.
All the VMs have the same name! The only way to identify them is by their vApp name.

And we still haven't completed our web application setup - we now need to manually go back to the vShield Edge configuration to set up the load-balancer across the two web servers.

Also, it appears that the Fog driver's implementation of the machine resource is not 100% idempotent. You'll notice in the screenshot above that there is a "WARN: Machine jumpbox01 (...) no longer exists. Recreating ..." - which is incorrect. The machine still exists, and in the next operation, when it tries to create the machine again, it returns the same machine ID that supposedly didn't previously exist. It's not causing an issue, other than wasting time trying to re-create a VM that still exists, and is possibly the result of permissions being too restrictive on Skyscape's implementation of vCloud Director, preventing a query-by-id lookup on the existing VMs.

Furthermore, digging into the chef-provisioning-fog driver code base, it seems that the IP address allocation mode is currently hard-coded to use the vDC Network's static pool. It is not currently possible to assign static IP addresses to a VM, or to allow it to use DHCP. This adds a reliance on the VMware Tools in the guest running the VM customisation phase after the server has booted in order to configure the correct IP address. In writing this post, I created and destroyed these servers a great many times, and for reasons I have not yet got to the bottom of, there were quite a few occasions where a new VM was created but the customisation phase never ran; the IP address that vCloud Director allocated to it was never configured, leaving the VM with no network configuration and only accessible via the remote console.

In Summary

So, it worked - sort of. Let's check off my test criteria:


  • Website hosted in a vDC: The vDC / networks / vShield Edge cannot be managed and had to be configured manually before continuing.
  • Deploy 2 webservers behind a load-balancer: The webservers were created and the web app deployed, but the load-balancer on the vShield Edge had to be configured manually. It would be possible to deploy an additional VM running Haproxy instead of using the load-balancer feature of the vShield Edge, but that was not attempted in this evaluation.
  • Deploy a database server: The database server was deployed correctly, and the db config was inserted into the webservers allowing them to connect.
  • Deploy a jumpbox server: The server was deployed correctly, but the NAT rules on the vShield Edge had to be configured manually.

Overall then, a success. On larger scale deployments, having all the VMs with the same name in the vCloud Director UI is going to make management painful, and even though this is a 'cloud' deployment, it would be useful to be able to specify static IP addresses on certain VMs when provisioning them.

Chef Provisioning is a very powerful orchestration tool, and with its pluggable driver back-end there is the opportunity to replace the Fog driver with a specific vCloud Director driver that understands vOrgs, vDC Networks and vShield Edge configurations. In the early days of Chef Provisioning, AWS support was initially through the Fog driver as well, but it has subsequently been replaced with a dedicated AWS driver that supports many more features than just the VMs. That is what is required here to improve support for vCloud Director.

Monday 21 December 2015

Creating vApp Templates for Automation

First thing I'll need in order to automate any infrastructure build-out is a vApp Template. These are the basic VM building blocks on vCloud Director, the equivalent of an AWS AMI or an OpenStack machine image. While I could upload an install ISO image to vCloud and step through a manual OS installation, in the spirit of automation, I decided to make use of a tool called Packer to create my vApp Template.

Packer is an open source tool for creating identical machine images for multiple platforms from a single source configuration. Packer is lightweight, runs on every major operating system, and is highly performant, creating machine images for multiple platforms in parallel. Packer does not replace configuration management like Chef or Puppet. In fact, when building images, Packer is able to use tools like Chef or Puppet to install software onto the image.

Packer has out-of-the-box support for VMware, utilising VMware Workstation or a remote ESXi server as a virtualisation platform, and has a post-processor service to upload the resulting machine template to a VMware vCenter server. It does not, however, have built-in support for creating a vApp Template on vCloud Director. Options for uploading vApp Templates are very limited, and usually come down to using VMware's ovftool CLI.

Some members of the open source community have written plugin extensions to Packer, wrapping up the ovftool functionality into another post-processor service; however, for the purpose of this exercise I think it is acceptable to trigger the ovftool CLI as a follow-on step after the Packer run, and I will show you the command-line options required to upload the vApp Template.

So, to get started you will want to clone the +Skyscape Cloud Services Automation Examples GitHub repository to your local workstation. You will also need to download and install the following tools:

  • VMware Workstation - you can use it with a 30 day evaluation license.
  • Packer - download from https://packer.io/downloads.html
  • OvfTool - download from https://www.vmware.com/support/developer/ovf/
Make sure that Packer and OvfTool are installed in a directory that is in your PATH.

TLDR;

If all you want to do is generate your new vApp Template, all you should need to do now is change to the packer sub-directory in the GitHub clone, and run the command:

packer build centos71.json

Once Packer completes, you will have an output-centos71-vmware-iso directory with a centos71.vmx file in it. You will need to remove the reference to "nat" networking (see the README.md file) and use ovftool to upload the VM template to vCloud Director.

Packer In More Detail

The Packer tool works by completely automating the creation of a virtual machine, either from installation ISO media or from an existing machine image, and then applying further customisation to the machine by using provisioning scripts or configuration management tools like Chef and Puppet.

The configuration of the virtual machine is held in a JSON-format file that supports multiple 'input' and 'output' formats, spawning processes in parallel to efficiently create consistent machine images across multiple virtualisation platforms. The JSON file lists one or more 'builder' configurations that define the combinations of input/output formats. For this post, I am only going to use the 'vmware-iso' builder to generate a VMware template from an ISO installation CD-ROM. If multiple builders are specified, they all run in parallel.

The start of the process (at least for Red Hat-based Linux distributions) is the ISO install media and a 'kickstart' script that answers all the questions and Next, Next, Next button clicks you would perform during a manual installation. Packer also supports the creation of Windows servers, using an equivalent Autounattend.xml file for a hands-off installation from a Windows install ISO. Packer will even download the ISO file for you from a specified URL, confirming its validity with a file checksum.

Having downloaded the ISO file, Packer will create a new VM using VMware Workstation using the CPU count, memory size and disk allocation specified in the JSON file, and configure the VM to mount the ISO file as a CD-ROM to boot from when the VM is powered on.

[Screenshot: Start of a Packer run]

The kickstart script is shared with the VM via a simple HTTP server run by the Packer process, and when the VM is powered on, Packer 'types' the necessary Linux kernel parameters to start the installation and retrieve the kickstart script from its temporary HTTP server. Packer then waits in the background for the CentOS installation to complete and SSH to become available.
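To give a feel for the format, here is a trimmed sketch of a vmware-iso builder section; the values are illustrative and are not copied from the centos71.json in the repository:

{
  "builders": [{
    "type": "vmware-iso",
    "vm_name": "centos71",
    "guest_os_type": "centos-64",
    "iso_url": "http://mirror.example.com/CentOS-7-x86_64-Minimal.iso",
    "iso_checksum_type": "sha256",
    "iso_checksum": "<checksum of the ISO>",
    "disk_size": 10240,
    "vmx_data": { "memsize": "512", "numvcpus": "1" },
    "http_directory": "http",
    "boot_command": [
      "<tab> text ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/ks.cfg<enter>"
    ],
    "ssh_username": "vagrant",
    "ssh_password": "vagrant",
    "ssh_wait_timeout": "30m",
    "shutdown_command": "sudo /sbin/halt -p"
  }]
}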

Once Packer can establish an SSH connection to the newly installed VM, it works through each 'provisioner' in the JSON file sequentially to further customise the VM. There is a large number of provisioner types supported by Packer, ranging from uploading a file to the VM, to uploading a script file (or even a script written in-line in the JSON) and executing it.

For this post, I am using a single provisioner step that uploads a number of scripts, sets up some environment variables in the remote shell, and then executes the uploaded scripts in that shell.
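As an illustration of the shape of that step (the script names and environment variable are assumptions, not the actual ones in the repository):

{
  "provisioners": [{
    "type": "shell",
    "environment_vars": [ "SSH_USER=vagrant" ],
    "scripts": [
      "scripts/vmware-tools.sh",
      "scripts/cleanup.sh"
    ]
  }]
}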

[Screenshot: Packer process completing successfully]

After all the provisioner steps have completed, Packer will shut down the new VM, compress the .vmdk files that define the VM's disk, and run any optional 'post-processor' steps that are defined. We are not using any post-processor steps, since uploading to vCloud Director is not supported at this time.

Uploading the vApp Template

There is one manual step to be performed before the Packer-generated VM template can be uploaded. When Packer uses VMware Workstation locally, it uses a 'NAT' network type. The generated centos71.vmx file references the 'nat' network, and if uploaded as-is, vCloud Director will be unable to create new VMs from the template as the 'nat' network does not exist there. Using a text editor, you need to change the referenced 'nat' network to 'none'.
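On Linux or a Mac this can be scripted with sed (the exact key name can vary between VMware Workstation versions, so check your .vmx file first); on Windows just edit the file in a text editor as described in the README:

sed -i 's/ethernet0.connectionType = "nat"/ethernet0.connectionType = "none"/' centos71.vmx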

In vCloud Director, the vApp Templates are held in one of your organisation's catalogs. I have created a catalog in my vCloud account called 'DevOps'. I want to upload my new VM template to the DevOps catalog, giving it the name 'centos71'. After changing to the output-centos71-vmware-iso subdirectory, run the following command to create your new vApp Template:

ovftool --vCloudTemplate --acceptAllEulas --overwrite centos71.vmx "vcloud://%VCAIR_USERNAME%@api.vcd.portal.skyscapecloud.com:443?org=%VCAIR_ORG%&vappTemplate=centos71&catalog=DevOps"

This command pulls your vCloud organisation and username from environment variables and will prompt you for your password. The vApp Template name and catalog name are specified as part of the vcloud:// url, and the --overwrite option allows the replacement of an existing vApp Template with the new one being uploaded.

And that is that. I now have a new vApp Template called 'centos71' all ready to be used with my evaluation of different provisioning tools. It has a minimal OS installation, as defined by the kickstart script, with specific customisations applied to it to ensure that the necessary VMware Tools are installed and ready to be triggered by the VM customisation process when the template is used to launch new VMs.

My next post in this series will start to make use of this template to automatically provision my simple web application.

Monday 14 December 2015

Evaluating vCloud Director API tools

In order to evaluate the different tools available for interacting with the +Skyscape Cloud Services vCloud Director API, I am setting myself the following challenge - to fully automate the creation of a simple web application:

  • The website should be hosted inside a vDC on the Skyscape Cloud Services infrastructure.
  • It will have 2 web servers behind a load-balancer.
  • The web servers will be backed by a single database server.
  • The vDC should have a separate jump-box server accessible via SSH to provide remote access to other servers in the vDC.
Over the next series of posts I shall review a number of options for representing this simple Web Application using Infrastructure-as-Code tools. Any scripts generated as part of the evaluations will be shared on Github for you to download and try out for yourself.

These are some of the tools I shall be taking a look at. It is not exhaustive, so if there are others you think are worth reviewing, please add a comment.

Wednesday 9 December 2015

New Job, New Blog



Well, 4 weeks into the new job at +Skyscape Cloud Services and I'm setting up my first blog in preparation for lots of goodness to come.

Keep an eye on this space for all sorts of updates about the DevOps related activities I get up to on the Skyscape Cloud platform.