The first part discusses the rationals behind the idea that open standards, free and open source software (FOSS), and automated configuration management bring business agility and better integration results than most one-size-fits-all identity integration solutions when it comes to extending the reach of an organization's authentication and access control policies outside its administrative domain.
To illustrate that claim, the second part will show how to effectively achieve an integration between Drupal-the well-known open source content management system (CMS), and OpenAM-the market leading open source authentication, authorization, entitlement and federation product, and how the resulting identity integration can be dynamically delivered in Amazon EC2 using the Opscode Chef tools and platform.
The problem in a nutshell
One of the main issues with creating new services in the AWS cloud, like any other public cloud, is that IT may loose control over who has access to what resources because the usual authentication and access control schemes that are commonly enforced within an organization's data center security perimeter may not be applicable in a multi-tenant environment. But IT surely needs to enforce authentication, sign-on (SSO), and access controls across the entire organization's IT services, so that, only those users that are properly authorized by IT are granted access, regardless of whether the application is running in-house or in the cloud. While centralized access control policies are relatively simple to support behind the enterprise's firewall, it is more difficult to support "in the wild" because IT has different types of control (or no control at all) over those software-as-a-service (SaaS) applications that are provided. As a result, compliance with regulations that mandate accountability for data security, privacy and auditing becomes challenging to achieve. Another challenge with moving an organization's business applications to a public cloud stands from the fact that IT doesn't want to duplicate identity and access management (IAM) information all over the place, as it may compromise security and the consistency of the organization's identity management system. In general, IT needs to leverage the identity information where they have it, meaning from within the organization's data center. Additional issues may arise from prohibiting the elasticity benefits of a cloud-operated infrastructure through locking users and applications in a more static compute environment that would prevent features like auto-scaling to work properly.
The devil is always in the details
Identity-enabling a business application, whether it is running on-premise or in a public cloud, almost always requires a fair amount of customization to address the nitty gritty details of an identity integration. I think this is because most business applications, regardless of how they are delivered, are almost never deployed out of the box, as is. If we take Drupal for example, the application does not work out-of-the-box in a SAML-based identity federation without requiring some significant customization. Note that I could also have taken the example of WordPress, Joomla, or MediaWiki. The same customization needs would apply.
Why is that? Simply because many of those applications pre-date the standardization and more pervasive use of federated identity technologies such as SAML. Those applications were designed with a built-in user management support in mind that is tightly coupled with the application's core functions. For example, Drupal creates, at install time, a SQL user database that is built out of a proprietary schema. Drupal needs that database to compute and render customized contents when a user logs in. This unfortunate fact of a fractured identity management landscape is responsible for the proliferation of the so called identity silos that are "traditionally" addressed by expensive identity management software, which mitigate the mess through the deployment of synchronization connectors. With that understanding in mind, it is easy to understand why data security and access control issues rank among the top cloud computing's adoption concerns in customer surveys.
No one-size-fits-all solution for the identity integration problem
Some newly entering cloud security vendors promote an architectural approach that can address the identity and access management divide by extending the reach of an organization's authentication and access control schemes to a cloud through the use of a general-purpose HTTP traffic interposition artifact known as HTTP Access Proxy Gateway. The role of an HTTP Access Proxy Gateway is to enforce user authentication and access control at the HTTP protocol level, somewhere in the cloud infrastructure, to protect access to resources. I would qualify this architectural approach as a one-size-fits-all solution, which in my opinion can hardy address the identity integration challenge, for business applications like Drupal, because it doesn't address the problem deeply enough to be truly usable. As we have seen above, the identity integration challenge needs to be addressed at the application level, as opposed to the network level, so that the core functions of the application can be maintained. Another issue with the proxy approach is that it creates a performance bottleneck (i.e. every HTTP request must be intercepted and checked against a valid session), and a single point of failure, which altogether doesn't play well with high availability and auto-scaling objectives.
A plug-in approach based on open standards and open source software
The starting point of the reflection builds upon the fact that Drupal, for example, can be easily extended. In open source software, modularity is more a rule than an exception. Therefore, thanks to its modular design, Drupal can incorporate plug-ins (called modules) that can modify the behavior of a particular core function. For example, in Drupal 6.x, a module that implements the hook_user API can alter the login / logout logic so that instead of getting a user's identity (i.e. email, first name, last name, ... attributes) from the database, the module can execute an authentication redirect to a trusted Identity Provider (idP) party, which in the end of the browser-based redirections choreography, returns a SAML assertion in the HTTP request. The assertion that is digitally signed is deciphered using the idP's public key to ensure that the user was properly authenticated against the organization's IAM authority. The SAML assertion contains the attributes of the user's identity that can be mapped to create (or update) a Drupal user account accordingly (if that account doesn't exist yet), and derive which role the user belongs to, and so, allows Drupal to enforce any role-based access control policy that may be defined.
All this is possible with using world-class FOSS from the simpleSAMLPHP project, led by UNINETT, and the OpenAM project, led by ForgeRock and the OpenSSO community.
The simpleSAMLphp project provides a native PHP application that supports the functions of a SAML V2 Service Provider (SP). In addition, simpleSAMLphp provides also a Drupal 6.x module that implements the hook_user API that can redirect authentication requests to a SAML V2 compliant idP like OpenAM. OpenAM is the IAM Swiss Army Knife solution in that it can perform authentication protocol conversions between SAML and many other enterprise's authentication standards including, but not limited to, LDAP, Kerberos, X509 certificate, and Radius.
Automated delivery of the identity-enabled application to Amazon EC2
In a series of foundational articles "Cloud computing and the big rethink", James Urquhart argues that with cloud computing, the very form of application delivery will change in that cloud-operated infrastructures and software development will play a major role. This comes from the need for more efficient application delivery and operations to address the accelerated need for new software functionality driven by end users. The most obvious place where this is happening is in the SaaS area. Cloud services that fall under this category are targeted at end users with specific business needs such as content management system (CMS) and customer relationship management (CRM). I think that this concept is particularly relevant to our identity integration case because it helps addressing the challenge of delivering compliant services in the cloud. As such, the creation of identity-enabled applications calls for a high level of automated operations so that the communicating parties can be properly and securely linked with no (or minimal) manual intervention.
This is were the Opscode Chef framework comes into place. Chef allows to bootstrap a virtual machine instance in an infrastructure supported by the Opscode Platform to dynamically install and configure all the software components that are required to run the identity-enabled application. The concept behind Opscode Chef is known as DevOps. This concept contrasts with the static image building process in that software installation and configuration is performed at runtime by program. A software configuration can be tweaked incrementally until a machine's state matches the desired end state without having to re-bundle a new image every time a tiny change is applied. Then, configuration management becomes more like a software development project, and so, breaking the divide between IT operations and software development.
With Opscode Chef, a configuration management task is called a recipe. It is written in a domain specific language (DSL) based on Ruby. Recipes can execute arbitrary Ruby code, but are mainly designed to execute actions on resources that are an abstract representation of system resources like packages, directories and files. For example to install Apache2 on Linux, one would just have to insert this block in a recipe, or event better, execute the default recipe of the Apache2 cookbook.
package "apache2" do
Once the Apache2 package is installed on the target machine, further invocations of that recipe would not attempt to reinstall the package. This behavior is referred to as being idempotent meaning unchanged when multiplied against itself (in mathematics). Opscode Chef also uses the concept of node, which can be associated with one or several roles, which themselves encapsulate runtime configuration parameters and a list of recipes to execute on the target machine. Recipes are packaged in a cookbook whose successive versions are typically managed in a source code control system such as Git or Subversion.
Voila, that is all for today. After this rather lengthly, and hopefully not too boring introduction, the second part of this post will discuss the concrete details of how to do the integration and run the automated delivery of the identity-enabled Drupal.