Wednesday, 13 January 2016

vCloud Director and Chef Provisioning (Fog Driver)

Inspired by one of the presentations at ChefConf 2015 showcasing the use of Chef Provisioning to provision servers on VMware's vCloud Air service, I thought that I would start my evaluation of vCloud Director API tools by replicating the demonstration on Skyscape Cloud Services infrastructure. I have a lot of experience using Chef Provisioning with the AWS driver, so how hard could it be...


The first thing I needed to do was download the Chef Development Kit (ChefDK) from https://downloads.chef.io/chef-dk/ - it can be installed on Windows, Mac or Linux platforms.

The Chef Development Kit is a handy bundle of all the software components you need to develop and deploy Chef cookbooks and recipes. It includes a bundled Ruby interpreter, the Chef Provisioning gem and the Fog driver for Chef Provisioning.

> chef --version
Chef Development Kit Version: 0.10.0
chef-client version: 12.5.1
berks version: 4.0.1
kitchen version: 1.4.2

> chef gem list chef-provisioning

*** LOCAL GEMS ***

chef-provisioning (1.5.0)
chef-provisioning-aws (1.6.1)
chef-provisioning-azure (0.4.0)
chef-provisioning-fog (0.15.0)
chef-provisioning-vagrant (0.10.0)

To get started you will want to clone the Automation Examples GitHub repository to your local workstation. All the files used in this demo are in the chef-provisioning-fog sub-directory. From this point on, assume all commands are run from within the chef-provisioning-fog sub-directory. The Chef tools read their configuration from the knife.rb file in this directory. To keep the file fairly generic, it reads a number of user-specific values from environment variables.

You will need to set up the following environment variables with values for your Skyscape Portal account - these can be found at https://portal.skyscapecloud.com/user/api :

VCAIR_ORG=1-2-33-456789
VCAIR_USERNAME=1234.5.67890
VCAIR_PASSWORD=Secret
JUMPBOX_IPADDRESS=XXX.XXX.XXX.XXX
VCAIR_SSH_USERNAME=vagrant
VCAIR_SSH_PASSWORD=vagrant

The last two variables are the credentials to use when logging into new VMs created from your vApp Template.
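
A missing variable tends to surface later as an opaque authentication or SSH failure, so it can help to check them up front. The following is a minimal sketch and not part of the Automation Examples repository; the REQUIRED_VARS constant and the missing_vars helper are names made up for illustration.

```ruby
# Names of the environment variables listed above.
REQUIRED_VARS = %w[
  VCAIR_ORG VCAIR_USERNAME VCAIR_PASSWORD
  JUMPBOX_IPADDRESS VCAIR_SSH_USERNAME VCAIR_SSH_PASSWORD
].freeze

# Hypothetical helper: returns the names of required variables that are
# unset or blank. `env` defaults to the real environment; pass a Hash
# to exercise it without touching ENV.
def missing_vars(env = ENV)
  REQUIRED_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_vars
if missing.empty?
  puts 'All vCloud environment variables are set.'
else
  warn "Missing environment variables: #{missing.join(', ')}"
end
```

Running this before any knife or chef-client command gives a clear list of what still needs exporting.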

Optionally, you could also install the knife-vcair plugin, allowing you to use Chef's standard CLI tool to interact with vCloud Director. I will make use of some of the commands in this post, so to install it, run:
chef gem install knife-vcair

To confirm that your configuration and environment variables are correct, you should be able to run the following command:
knife vcair image list

It should return a long list of available vApp Templates from the Public Catalogs, and if you have followed my previous post Creating vApp Templates for Automation, you should also see in the list the centos71 template we uploaded.

The First Snag

The Chef Provisioning Fog driver has no facility to create or manage any components other than vApps / VMs. It cannot create vDC Networks, manage the Static IP Pool allocated to that network, or manage the vShield Edge properties to set up SNAT/DNAT rules, firewall rules or Load-Balancer configurations.

Given these restrictions, out-of-the-box we also cannot determine the Public IP address assigned to the Jumpbox Server, which is why we have to explicitly set it in the environment variables above.

Manual One-Time Setup

Before we can continue using Chef Provisioning to create our VMs, we need to go through a few set-up steps and configure the vDC environment so that it is ready to use.

First of all, we'll need to create a vDC Network for the VMs to use. We'll create a routed network for 10.1.1.0/24 with a Gateway address 10.1.1.1 and use a Primary DNS of 8.8.8.8. We'll add a Static IP pool for that network of 10.1.1.10-10.1.1.100.
The same information can also be seen by running:

$ knife vcair network list
Name             Gateway   IP Range Start  End         Description
Jumpbox Network  10.1.1.1  10.1.1.10       10.1.1.100  Demo network for API automation examples
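
As a quick sanity check of the addressing plan above, Ruby's standard ipaddr library can confirm that the gateway and static pool sit inside the routed network. This is only an illustrative sketch using the values from this post; it does not talk to vCloud Director.

```ruby
require 'ipaddr'

subnet     = IPAddr.new('10.1.1.0/24')
gateway    = IPAddr.new('10.1.1.1')
pool_start = IPAddr.new('10.1.1.10')
pool_end   = IPAddr.new('10.1.1.100')

# Every address we plan to use must sit inside the routed network.
[gateway, pool_start, pool_end].each do |addr|
  raise "#{addr} is outside the network" unless subnet.include?(addr)
end

# The gateway must not be handed out to a VM from the static pool.
pool = (pool_start..pool_end)
raise 'gateway overlaps the static pool' if pool.cover?(gateway)

# Both ends of the pool are inclusive.
pool_size = pool_end.to_i - pool_start.to_i + 1
puts "Static pool holds #{pool_size} addresses, first is #{pool_start}"
```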

Since we know that the first VM created on the new network will be allocated the IP address 10.1.1.10, we can pre-create the necessary DNAT rule to allow inbound SSH access to the jumpbox. We'll also add SNAT rules to allow the created VMs to connect out to the internet as needed.
These rules are then complemented by associated firewall rules.

On With The Cooking

So with all the pre-reqs and setup done, let's get on and look at the Chef Provisioning recipe we are going to use to create our Web Application's infrastructure. Chef's recipes are Ruby scripts enhanced with a Domain Specific Language (DSL). The Chef DSL allows you to write, in a declarative style, a list of resources that define the desired end state of your infrastructure.

Traditionally these resources would describe the setup of a single server, declaring what packages should be installed, templating the contents of configuration files, or defining local firewall rules and so on. Chef Provisioning extends the DSL to allow you to define servers themselves as a resource. All you need to include in your recipes to make use of these extensions is:

require 'chef/provisioning'

with_driver 'fog:Vcair'

Because the recipe is a Ruby script, we can define a number of variables in it for simplicity and re-use, and we can pull in values from environment variables:
num_webservers = 2

vcair_opts = {
  bootstrap_options: {
    image_name: 'centos71',
    net: 'Jumpbox Network',
    memory: '512',
    cpus: '1',
    ssh_options: {
      password: ENV['VCAIR_SSH_PASSWORD'],
      user_known_hosts_file: '/dev/null',
    }
  },
  create_timeout: 600,
  start_timeout: 600,
  ssh_gateway: "#{ENV['VCAIR_SSH_USERNAME']}@#{ENV['JUMPBOX_IPADDRESS']}",
  ssh_options: {
    password: ENV['VCAIR_SSH_PASSWORD']
  }
}

Here we are specifying our uploaded vApp Template name - centos71 - that we will use when creating new VMs, and the name of the vDC Network we created above. We also specify the use of an SSH Gateway, making use of the Jumpbox's Public IP Address as a relay for connecting to the other servers in the vDC that are not directly accessible.

Users of other cloud providers will be familiar with using SSH Key-pairs to authenticate connections to cloud-based servers. VMware's vCloud Director does not currently support this, hence the SSH password pulled in from the VCAIR_SSH_PASSWORD environment variable. I will explore how to set up SSH Key-pair Authentication in a later blog post.

To create a new VM, all you need to add to your recipe now is a machine resource.

machine 'jumpbox01' do
  tag 'jumpbox'
  machine_options vcair_opts
end

Because the recipe is a Ruby script, we can make use of standard Ruby functionality such as loops and conditional logic, so to create an arbitrary number of web servers, we can easily wrap a machine resource in a loop.

1.upto(num_webservers) do |i|
  machine "linuxweb#{i}" do
    tag 'webserver'
    machine_options vcair_opts.merge({ memory: '2048' })
  end
end
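
One detail worth knowing when overriding options this way is that Ruby's Hash#merge is shallow. The sketch below uses a trimmed copy of the options hash to show where the merged key lands in plain Ruby. The server list later in this post reports 2048 MB for the web servers, so the override evidently takes effect in this setup; how the driver resolves the top-level and nested keys is not covered here.

```ruby
# Trimmed copy of the options hash from the recipe above.
vcair_opts = {
  bootstrap_options: { image_name: 'centos71', memory: '512', cpus: '1' },
  create_timeout: 600
}

# Hash#merge is shallow: the new key is added at the top level and the
# nested bootstrap_options hash is left untouched.
shallow = vcair_opts.merge(memory: '2048')
puts shallow[:memory]                      # the new top-level key
puts shallow[:bootstrap_options][:memory]  # the original nested value

# To change the nested value instead, merge the inner hash explicitly.
nested = vcair_opts.merge(
  bootstrap_options: vcair_opts[:bootstrap_options].merge(memory: '2048')
)
puts nested[:bootstrap_options][:memory]
```

Neither form modifies vcair_opts itself, so the jumpbox definition keeps its original settings.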

Chef resources are typically processed sequentially. However, we can wrap our machine resource definitions inside a machine_batch resource, and when the recipe is processed, all the machines in the batch will be created in parallel. If you take a look at the skyscapecloud-demo.rb recipe in the Automation Examples repository, you will see that the jumpbox server is created first, then the database server and two web servers are all brought up in parallel.

Deploying My Simple Web App

For this demo, my web app is a simple one page PHP script that connects to a back-end database, increments a counter and then displays back to the user the current count. To deploy the PHP script, I have created a simple chef cookbook that will:

  • Install the Nginx and php-fpm packages and any pre-reqs.
  • Define an Nginx site for the web app.
  • Deploy the index.php and favicon.ico files.
  • Use a Chef search to locate the IP Address of the database server and generate a config.php file with the database credentials.

The cookbook is included in the Automation Examples GitHub repository in the my_web_app_cookbook sub-directory. It makes use of a number of shared community cookbooks downloaded from the Chef Supermarket site to perform common tasks like installing and configuring Nginx.
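
The config.php step in the list above boils down to rendering a template with the database server's address. Chef templates are ERB files, so the idea can be sketched with Ruby's standard erb library. The template text, the render_config helper and the database name and user below are hypothetical; the real template lives in the my_web_app cookbook, where the host would come from a Chef search result rather than being passed in by hand.

```ruby
require 'erb'

# Hypothetical template; the cookbook's own template may look different.
CONFIG_TEMPLATE = <<~'PHP'
  <?php
  $db_host = '<%= db_host %>';
  $db_name = '<%= db_name %>';
  $db_user = '<%= db_user %>';
PHP

# Hypothetical helper: renders config.php content from the given values.
def render_config(db_host:, db_name:, db_user:)
  ERB.new(CONFIG_TEMPLATE).result_with_hash(
    db_host: db_host, db_name: db_name, db_user: db_user
  )
end

# 10.1.1.13 is the database server's address in this demo environment.
puts render_config(db_host: '10.1.1.13', db_name: 'counter', db_user: 'webapp')
```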

To prepare the cookbook and its dependencies for deployment to our new VMs, we use the Berkshelf tool that is bundled in the Chef Development Kit. To create a local cache of all the pre-req community cookbooks, you first run:

> berks install
Resolving cookbook dependencies...
Fetching 'my_web_app' from source at ../my_web_app_cookbook
Fetching cookbook index from https://supermarket.chef.io...
Installing apt (2.9.2)
Installing bluepill (2.4.1)
Installing build-essential (2.2.4)
Installing chef-sugar (3.2.0)
Installing database (4.0.9)
Installing mariadb (0.3.1)
Using my_web_app (0.3.0) from source at ../my_web_app_cookbook
Installing mysql (6.1.2)
Installing mysql2_chef_gem (1.0.2)
Installing nginx (2.7.6)
Installing ohai (2.0.4)
Installing openssl (4.4.0)
Installing packagecloud (0.1.1)
Installing php-fpm (0.7.5)
Installing postgresql (3.4.24)
Installing rbac (1.0.3)
Installing rsyslog (2.2.0)
Installing runit (1.7.6)
Installing smf (2.2.7)
Installing yum (3.8.2)
Installing yum-epel (0.6.5)
Installing yum-mysql-community (0.1.21)

In order for the Chef Provisioning scripts to deploy our web app using the cookbook, we need to bundle up the my_web_app cookbook and all its dependencies into a central cookbooks sub-directory. To do this, run:
> berks vendor cookbooks
Resolving cookbook dependencies...
Fetching 'my_web_app' from source at ../my_web_app_cookbook
Using apt (2.9.2)
Using bluepill (2.4.1)
Using build-essential (2.2.4)
Using chef-sugar (3.2.0)
Using database (4.0.9)
Using mariadb (0.3.1)
Using my_web_app (0.3.0) from source at ../my_web_app_cookbook
Using mysql (6.1.2)
Using mysql2_chef_gem (1.0.2)
Using nginx (2.7.6)
Using ohai (2.0.4)
Using openssl (4.4.0)
Using packagecloud (0.1.1)
Using php-fpm (0.7.5)
Using postgresql (3.4.24)
Using rbac (1.0.3)
Using rsyslog (2.2.0)
Using runit (1.7.6)
Using smf (2.2.7)
Using yum (3.8.2)
Using yum-epel (0.6.5)
Using yum-mysql-community (0.1.21)
Vendoring apt (2.9.2) to cookbooks/apt
Vendoring bluepill (2.4.1) to cookbooks/bluepill
Vendoring build-essential (2.2.4) to cookbooks/build-essential
Vendoring chef-sugar (3.2.0) to cookbooks/chef-sugar
Vendoring database (4.0.9) to cookbooks/database
Vendoring mariadb (0.3.1) to cookbooks/mariadb
Vendoring my_web_app (0.3.0) to cookbooks/my_web_app
Vendoring mysql (6.1.2) to cookbooks/mysql
Vendoring mysql2_chef_gem (1.0.2) to cookbooks/mysql2_chef_gem
Vendoring nginx (2.7.6) to cookbooks/nginx
Vendoring ohai (2.0.4) to cookbooks/ohai
Vendoring openssl (4.4.0) to cookbooks/openssl
Vendoring packagecloud (0.1.1) to cookbooks/packagecloud
Vendoring php-fpm (0.7.5) to cookbooks/php-fpm
Vendoring postgresql (3.4.24) to cookbooks/postgresql
Vendoring rbac (1.0.3) to cookbooks/rbac
Vendoring rsyslog (2.2.0) to cookbooks/rsyslog
Vendoring runit (1.7.6) to cookbooks/runit
Vendoring smf (2.2.7) to cookbooks/smf
Vendoring yum (3.8.2) to cookbooks/yum
Vendoring yum-epel (0.6.5) to cookbooks/yum-epel
Vendoring yum-mysql-community (0.1.21) to cookbooks/yum-mysql-community

Bringing It All Together

So, we now have a recipe defining our Jumpbox, Database server and two Web servers, and a cookbook to deploy our web app. To get the show on the road, all we need to do now is run chef-client in local-mode with the name of our recipe:

chef-client -z skyscapecloud-demo.rb

This will run chef-client in local-mode, reading its recipes from the current directory instead of connecting to a Chef Server instance. It will:

  • Connect to your Skyscape Cloud account and create four VMs by cloning the centos71 vApp Template.
  • Wait until each VM is contactable, using the jumpbox's public address as an ssh relay.
  • Upload to each VM a chef configuration file and ssh key.
  • Download on each VM the chef-client package.
  • Run a chef-client converge on each VM using the my_web_app cookbook to configure the new server.

At the end of the chef-client run, you should have 4 VMs running in your Skyscape Cloud account. You can check this by running:
$ knife vcair server list
vAPP       Name      IP         CPU  Memory  OS                       Owner          Status
jumpbox01  centos71  10.1.1.10  1    512     CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
linuxdb01  centos71  10.1.1.13  2    4096    CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
linuxweb1  centos71  10.1.1.12  1    2048    CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
linuxweb2  centos71  10.1.1.11  1    2048    CentOS 4/5/6/7 (64-bit)  1234.5.678901  on

More Gotchas

So we have 4 VMs running. More accurately, we have 4 vApps running, each containing a single VM. Not necessarily a problem, but not the best use of VMware's vApps. It might be better to put all the Web servers into a Web Server vApp, or maybe align the vApps to machine_batch resource definitions in the chef recipe, allowing VMware to power on/off all the VMs in the vApp in a single operation.

We can SSH to the Jumpbox server and subsequently connect to each of the database and web servers. Each server's hostname is set to the name I gave it in the chef recipe. Great. Now I go and check the VM list again in vCloud Director.
All the VMs have the same name! The only way to identify them is by their vApp name.
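
Since the vApp name is the only distinguishing column, one workaround when scripting against the listing is to index it by vApp. The following is a rough sketch that parses the knife vcair server list output shown earlier; it assumes columns are separated by at least two spaces, as they are in that output, and index_by_vapp is a helper name made up for illustration.

```ruby
# Sample taken from the `knife vcair server list` output shown earlier.
LISTING = <<~TEXT
  vAPP       Name      IP         CPU  Memory  OS                       Owner          Status
  jumpbox01  centos71  10.1.1.10  1    512     CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
  linuxdb01  centos71  10.1.1.13  2    4096    CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
  linuxweb1  centos71  10.1.1.12  1    2048    CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
  linuxweb2  centos71  10.1.1.11  1    2048    CentOS 4/5/6/7 (64-bit)  1234.5.678901  on
TEXT

# Hypothetical helper: splits each row on runs of two or more spaces
# and returns a Hash of row records keyed by the vAPP column.
def index_by_vapp(text)
  header, *rows = text.lines.map(&:strip).reject(&:empty?)
  columns = header.split(/\s{2,}/)
  rows.each_with_object({}) do |row, index|
    record = columns.zip(row.split(/\s{2,}/)).to_h
    index[record['vAPP']] = record
  end
end

servers = index_by_vapp(LISTING)
servers.each { |vapp, s| puts "#{vapp}: #{s['IP']} (VM name #{s['Name']})" }
```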

And we still haven't completed our Web Application setup - we now need to manually go back to the vShield Edge configuration to set up the load balancer across the two web servers.

Also, it appears that the Fog driver's implementation of the machine resource is not 100% idempotent. The chef-client output includes a "WARN: Machine jumpbox01 (...) no longer exists. Recreating ..." message, which is incorrect. The machine still exists, and in the next operation, when the driver tries to create the machine again, it returns the same machine ID that supposedly didn't exist. It's not causing an issue, other than wasting time trying to re-create a VM that still exists. It is possibly the result of permissions being too restrictive on Skyscape's implementation of vCloud Director, preventing a query-by-id lookup on the existing VMs.

Furthermore, digging into the chef-provisioning-fog driver code-base, it seems that the IP address allocation mode is currently hard-coded to use the vDC Network's static pool. It is not currently possible to assign a specific static IP address to a VM, or to allow it to use DHCP. This adds a reliance on VMware Tools running the VM customisation phase on the guest after the server has booted in order to configure the correct IP address. In writing this post, I created and destroyed these servers a great many times, and for reasons I have not yet got to the bottom of, I had quite a few occasions where a new VM was created, but the customisation phase never ran and the IP address that vCloud Director allocated to it was never configured, leaving the VM with no network configuration and only accessible via remote console.

In Summary

So, it worked - sort of. Let's check off my test criteria:


Criteria | Pass / Fail | Comments
--- | --- | ---
Website hosted in a vDC | | vDC / Networks / vShield Edge cannot be managed and had to be configured manually before continuing.
Deploy 2 Webservers behind a load-balancer | | The webservers were created and web app deployed but the load-balancer on the vShield Edge had to be configured manually. It would be possible to deploy an additional VM running Haproxy instead of using the load-balancer feature of the vShield Edge, but that was not attempted in this evaluation.
Deploy a database server | | The database server was deployed correctly, and the db config was inserted into the webservers allowing them to connect.
Deploy a jumpbox server | | The server was deployed correctly, but the NAT rules on the vShield Edge had to be configured manually.

Overall then, a success. On larger scale deployments, having all the VMs with the same name in the vCloud Director UI is going to make management painful, and even though this is a 'cloud' deployment, it would be useful to be able to specify static IP addresses on certain VMs when provisioning them.

Chef Provisioning is a very powerful orchestration tool, and with its pluggable driver back-end, there is the opportunity to replace the Fog driver with a specific vCloud Director driver that understands vOrgs, vDC Networks and vShield Edge configurations. In the early days of the Chef Provisioning product, AWS support was initially through the Fog driver as well, but it has subsequently been replaced with a specific AWS driver that supports many more features than just the VMs. That is what is required here to improve support for vCloud Director.