jeudi 16 décembre 2010

Automated Delivery of an Identity-enabled Drupal in the AWS Cloud (Part 2)

This the second part of a two-part post that examines how to automatically create an identity-enabled business application in the AWS cloud.

The first part discussed some rationals about the idea that open standards, free and open source software (FOSS), and automated configuration management bring business agility and better integration results than most one-size-fits-all identity integration solutions when it comes to extending the reach of an organization's authentication and access control policies outside of its administrative domain.

The second part intends to show how to effectively achieve this kind of integration with Drupal - the well-known open source content management system (CMS), and OpenAM - the market leading open source authentication, authorization, entitlement and federation product, and how the resulting identity integration can be automatically delivered into Amazon EC2 using the Opscode Chef tools and platform.

Let's try to do some work now...

Install and Configure Chef

Your best bet to get Chef running on your workstation is to follow the Opscode's How To Get Started Tutorial. You only need to complete Step1 and Step2. Creating a Chef client (Step3), at this point, is not necessary as we will do it "automagically" by using the knife bootstrap command.

After completing Step2 you should have a Chef repository that you will need to manage your cookbooks.

Import the required cookbooks
You need to download all the cookbooks that are in the dependency chain of the simplesamlphp cookbook, which I created for this post. The simplesamlphp cookbook has a direct dependency on the drupal cookbook, which in turn has a direct dependency over several other cookbooks including php, apache2  cookbooks. In other words, all the cookbooks required to build a full all-in-one LAMP stack.

You can download all these cookbooks separately from the Opscode's Cookbooks Repository or use knife. The best practice in working with the Opscode Platform is to always keep local copies of the cookbooks you are using in your chef-repo.

To import a cookbook with knife:
$ cd ~/chef-repo
$ knife cookbook site vendor cookbook-name

This command downloads cookbook-name, and creates a 'vendor branch' for you, which enables you to make your own custom changes to the cookbook and keep track of the differences between yours and the upstream. You'll find that there is no simplesamlphp cookbook in the Opscode's Coobooks Repository. That's normal because I didn't put it there since this cookbook is just a proof of concept for the post. You need to download it individually as well as a slightly modified version of the cookbook for drupal from GitHub.

Modify simplesamlphp cookbook
If you intend to run the same setup against your own instance of OpenAM server, you'll need to modify the simplesamlphp cookbook in oder to add the SAML v2 metadata descriptor of your own identity provider. For that you need to edit the file saml20-idp-remote.php under files and change the value of the idpname and idpid attributes within the drupalsaml role accordingly.


Upload all the cookbooks and roles
After you have modified your cookbooks or not, you need to upload them in your organization that resides in the Opscode Platform:

$ cd ~/chef-repo
$ knife cookbook upload --all
$ knife role from file roles/drupalsaml

Bootstrap the Drupal Server Instance in EC2

Chef's knife bootstrap command allows to literally bootstrap a fully functional Drupal server in EC2 out of a vanilla linux distribution. In this demo I use an EBS-backed Ubuntu 10.04 LTS (Lucid Lynx) Server 32-bit distribution with AMI id 'ami-f4340180'.
Don't forget to download the AWS key pair, for the AWS region you want to use, somewhere in your home directory (ex. ~/.ssh/eu-west-1-keypair.pem).

Launch a new EC2 instance from the AWS Management Console. Once the instance is running, copy it's Pulic DNS URL obtained from the dashboard and past it into the knife bootstrap command as shown below. Note the parameter -r 'role[drupalsaml]' that tells Chef to configure the instance as per the recipes and attributes defined in the drupalsaml role. The bootstrap is displayed to the instance's console starting with the installation of Chef itself, followed by the automated installation and configuration of all the required components (i.e. Apache2, PHP,  OpenSSL, MySQL, Drupal, simpleSAMLphp, memcached, and sendmail).

# knife bootstrap ec2-46-51-163-241.eu-west-1.compute.amazonaws.com \
> -r 'role[drupalsaml]' -i ../.ssh/eu-west-keypair.pem -x ubuntu --sudoINFO: Bootstrapping Chef on 
0% [Working]3-241.eu-west-1.compute.amazonaws.com 
Get: 1 http://security.ubuntu.com lucid-security Release.gpg [198B]
Ign http://security.ubuntu.com/ubuntu/ lucid-security/main Translation-en_GB   
96% [Connecting to eu-west-1.ec2.archive.ubuntu.com (10.224.74.112)]
Ign http://security.ubuntu.com/ubuntu/ lucid-security/universe Translation-en_GB
Get: 2 http://security.ubuntu.com lucid-security Release [38.5kB]   
0% [Connecting to eu-west-1.ec2.archive.ubuntu.com (10.224.74.112)] [2 Release 
Hit http://eu-west-1.ec2.archive.ubuntu.com lucid Release.gpg                  

[.....]

ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed mixlib-authentication-1.1.4
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed mime-types-1.16
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed rest-client-1.6.1
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed bunny-0.6.0
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed abstract-1.0.0
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed erubis-2.6.6
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed moneta-0.6.0
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed highline-1.6.1
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed uuidtools-2.1.1
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com Successfully installed chef-0.9.12
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com 17 gems installed
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com [Wed, 15 Dec 2010 12:55:20 +0000] INFO: Client key /etc/chef/client.pem is not present - registering
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com [Wed, 15 Dec 2010 12:55:24 +0000] WARN: HTTP Request Returned 404 Not Found: Cannot load node ip-10-234-178-251.eu-west-1.compute.internal
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com [Wed, 15 Dec 2010 12:55:27 +0000] INFO: Setting the run_list to ["role[drupalsaml]"] from JSON
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com [Wed, 15 Dec 2010 12:55:29 +0000] INFO: Starting Chef Run (Version 0.9.12)

[.....]

ec2-46-51-163-241.eu-west-1.compute.amazonaws.com [Wed, 15 Dec 2010 13:00:31 +0000] INFO: Navigate to 'http://ec2-46-51-163-241.eu-west-1.compute.amazonaws.com/install.php' to complete the drupal installation
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com [Wed, 15 Dec 2010 13:00:43 +0000] INFO: Chef Run complete in 314.902713 seconds
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com [Wed, 15 Dec 2010 13:00:43 +0000] INFO: cleaning the checksum cache
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com [Wed, 15 Dec 2010 13:00:43 +0000] INFO: Running report handlers
ec2-46-51-163-241.eu-west-1.compute.amazonaws.com [Wed, 15 Dec 2010 13:00:43 +0000] INFO: Report handlers complete

A the end of the run, which in my case took about 5 minutes, you get a fully operational identity-enabled Drupal server instance federated with OpenAM. Note however, the message at the end of the configuration sequence:

Navigate to 'http://ec2-46-51-163-241.eu-west-1.compute.amazonaws.com/install.php' to complete the drupal installation.

This is because Drupal 6.x requires some manual configuration at the end of the installation. According to a conversation I had with Marius Ducea, who authored the cookbook for Drupal, that limitation should be removed with Drupal 7.

It is assumed that there is an OpenAM 9.5 server running somewhere in your data center or in the cloud. That OpenAM server must be accessible from the newly created Drupal instance. In this demo, I have created an OpenAM server in EC2 that is attached to an Elastic IP (static IP address). Chef is also used in the OpenAM server to help with automatically importing the metadata descriptor of the SAML2 service providers that are started in the same Opscode's managed organization. A unique recipe for OpenAM is executed on a regular basis, by chef-client running as a daemon, which only task is to search for any node that is assigned with the drupalsaml role, then retrieve the location of the service provider metadata reading the simplesamlphp.metadata attribute, to finally execute ssoadm import-entity locally. The openam recipe is fairly straighforward.

sps = []
url = String.new
hostname = String.new

search(:node, "role:drupalsaml") do |n|
  sps << n
end

sps.each do |n|
  url = n['simplesamlphp']['metadata']
  hostname = n['hostname']

  if !url.nil? && url.match(/^http/)
    bash "download-entity-descriptor" do
      user "root"
      cwd "/tmp"
      code <<-EOH
      wget #{url}
      EOH
      only_if "/usr/bin/test ! -f /var/log/entities#{hostname}"
    end

    execute "move_entity_descriptor" do
      user "root"
      command "mv /tmp/openam-idp /var/log/entities/#{hostname}"
      action :nothing
    end

    execute "ssoadm_import_entity" do
      user "root"
      command "/opt/openam/bin/ssoadm import-entity -u amadmin -f /opt/openam/password -m /tmp/openam-idp -t local"
      notifies :run, resources("execute[move_entity_descriptor]")
    end
  end
end  

Now, you click on the screencast below to see how the identity federation between Drupal and OpenAM works in practice.


video

vendredi 3 décembre 2010

Automated Delivery of an Identity-enabled Drupal in the AWS Cloud (Part I)

This post is composed of two parts that examine how to automatically create an identity-enabled business application to the AWS cloud.

The first part discusses the rationals behind the idea that open standards, free and open source software (FOSS), and automated configuration management bring business agility and better integration results than most one-size-fits-all identity integration solutions when it comes to extending the reach of an organization's authentication and access control policies outside its administrative domain.

To illustrate that claim, the second part will show how to effectively achieve an integration between Drupal-the well-known open source content management system (CMS), and OpenAM-the market leading open source authentication, authorization, entitlement and federation product, and how the resulting identity integration can be dynamically delivered in Amazon EC2 using the Opscode Chef tools and platform.

The problem in a nutshell

One of the main issues with creating new services in the AWS cloud, like any other public cloud, is that IT may loose control over who has access to what resources because the usual authentication and access control schemes that are commonly enforced within an organization's data center security perimeter may not be applicable in a multi-tenant environment. But IT surely needs to enforce authentication, sign-on (SSO), and access controls across the entire organization's IT services, so that, only those users that are properly authorized by IT are granted access, regardless of whether the application is running in-house or in the cloud. While centralized access control policies are relatively simple to support behind the enterprise's firewall, it is more difficult to support "in the wild" because IT has different types of control (or no control at all) over those software-as-a-service (SaaS) applications that are provided. As a result, compliance with regulations that mandate accountability for data security, privacy and auditing becomes challenging to achieve. Another challenge with moving an organization's business applications to a public cloud stands from the fact that IT doesn't want to duplicate identity and access management (IAM) information all over the place, as it may compromise security and the consistency of the organization's identity management system. In general, IT needs to leverage the identity information where they have it, meaning from within the organization's data center. Additional issues may arise from prohibiting the elasticity benefits of a cloud-operated infrastructure through locking users and applications in a more static compute environment that would prevent features like auto-scaling to work properly.

The devil is always in the details

Identity-enabling a business application, whether it is running on-premise or in a public cloud, almost always requires a fair amount of customization to address the nitty gritty details of an identity integration. I think this is because most business applications, regardless of how they are delivered, are almost never deployed out of the box, as is. If we take Drupal for example, the application does not work out-of-the-box in a SAML-based identity federation without requiring some significant customization. Note that I could also have taken the example of WordPress, Joomla, or MediaWiki. The same customization needs would apply.

Why is that? Simply because many of those applications pre-date the standardization and more pervasive use of federated identity technologies such as SAML. Those applications were designed with a built-in user management support in mind that is tightly coupled with the application's core functions. For example, Drupal creates, at install time, a SQL user database that is built out of a proprietary schema. Drupal needs that database to compute and render customized contents when a user logs in. This unfortunate fact of a fractured identity management landscape is responsible for the proliferation of the so called identity silos that are "traditionally" addressed by expensive identity management software, which mitigate the mess through the deployment of synchronization connectors. With that understanding in mind, it is easy to understand why data security and access control issues rank among the top cloud computing's adoption concerns in customer surveys.

No one-size-fits-all solution for the identity integration problem

Some newly entering cloud security vendors promote an architectural approach that can address the identity and access management divide by extending the reach of an organization's authentication and access control schemes to a cloud through the use of a general-purpose HTTP traffic interposition artifact known as HTTP Access Proxy Gateway. The role of an HTTP Access Proxy Gateway is to enforce user authentication and access control at the HTTP protocol level, somewhere in the cloud infrastructure, to protect access to resources. I would qualify this architectural approach as a one-size-fits-all solution, which in my opinion can hardy address the identity integration challenge, for business  applications like Drupal, because it doesn't address the problem deeply enough to be truly usable. As we have seen above, the identity integration challenge needs to be addressed at the application level, as opposed to the network level, so that the core functions of the application can be maintained. Another issue with the proxy approach is that it creates a performance bottleneck (i.e. every HTTP request must be intercepted and checked against a valid session), and a single point of failure, which altogether doesn't play well with high availability and auto-scaling objectives.

A plug-in approach based on open standards and open source software

The starting point of the reflection builds upon the fact that Drupal, for example, can be easily extended. In open source software, modularity is more a rule than an exception. Therefore, thanks to its modular design, Drupal can incorporate plug-ins (called modules) that can modify the behavior of a particular core function. For example, in Drupal 6.x, a module that implements the hook_user API can alter the login / logout logic so that instead of getting a user's identity (i.e. email, first name, last name, ... attributes) from the database, the module can execute an authentication redirect to a trusted Identity Provider (idP) party, which in the end of the browser-based redirections choreography, returns a SAML assertion in the HTTP request. The assertion that is digitally signed is deciphered using the idP's public key to ensure that the user was properly authenticated against the organization's IAM authority. The SAML assertion contains the attributes of the user's identity that can be mapped to create (or update) a Drupal user account accordingly (if that account doesn't exist yet), and derive which role the user belongs to, and so, allows Drupal to enforce any role-based access control policy that may be defined.

All this is possible with using world-class FOSS from the simpleSAMLPHP project, led by UNINETT, and the OpenAM project, led by ForgeRock and the OpenSSO community.

The simpleSAMLphp project provides a native PHP application that supports the functions of a SAML V2 Service Provider (SP). In addition, simpleSAMLphp provides also a Drupal 6.x module that implements the hook_user API that can redirect authentication requests to a SAML V2 compliant idP like OpenAM. OpenAM is the IAM Swiss Army Knife solution in that it can perform authentication protocol conversions between SAML and many other enterprise's authentication standards including, but not limited to, LDAP, Kerberos, X509 certificate, and Radius.

Automated delivery of the identity-enabled application to Amazon EC2

In a series of foundational articles "Cloud computing and the big rethink", James Urquhart argues that with cloud computing, the very form of application delivery will change in that cloud-operated infrastructures and software development will play a major role. This comes from the need for more efficient application delivery and operations to address the accelerated need for new software functionality driven by end users. The most obvious place where this is happening is in the SaaS area. Cloud services that fall under this category are targeted at end users with specific business needs such as content management system (CMS) and customer relationship management (CRM). I think that this concept is particularly relevant to our identity integration case because it helps addressing the challenge of delivering compliant services in the cloud. As such, the creation of identity-enabled applications calls for a high level of automated operations so that the communicating parties can be properly and securely linked with no (or minimal) manual intervention.

This is were the Opscode Chef framework comes into place. Chef allows to bootstrap a virtual machine instance in an infrastructure supported by the Opscode Platform to dynamically install and configure all the software components that are required to run the identity-enabled application. The concept behind Opscode Chef is known as DevOps. This concept contrasts with the static image building process in that software installation and configuration is performed at runtime by program. A software configuration can be tweaked incrementally until a machine's state matches the desired end state without having to re-bundle a new image every time a tiny change is applied. Then, configuration management becomes more like a software development project, and so, breaking the divide between IT operations and software development.

With Opscode Chef, a configuration management task is called a recipe. It is written in a domain specific language (DSL) based on Ruby. Recipes can execute arbitrary Ruby code, but are mainly designed to execute actions on resources that are an abstract representation of system resources like packages, directories and files. For example to install Apache2 on Linux, one would just have to insert this block in a recipe, or event better, execute the default recipe of the Apache2 cookbook.

package "apache2" do
    case node[:platform]
    when "centos","redhat","fedora","suse"
      package_name "httpd"
    when "debian","ubuntu"
      package_name "apache2"
   end
   action :install
end

Once the Apache2 package is installed on the target machine, further invocations of that recipe would not attempt to reinstall the package. This behavior is referred to as being idempotent meaning
unchanged when multiplied against itself (in mathematics). Opscode Chef also uses the concept of node, which can be associated with one or several roles, which themselves encapsulate runtime configuration parameters and a list of recipes to execute on the target machine. Recipes are packaged in a cookbook whose successive versions are typically managed in a source code control system such as Git or Subversion.

Voila, that is all for today. After this rather lengthly, and hopefully not too boring introduction, the second part of this post will discuss the concrete details of how to do the integration and run the automated delivery of the identity-enabled Drupal.