Monthly Archives: January 2011

Just Because the Cloud Offers Scalability, Doesn’t Mean That You Automatically Inherit It

In reading Vivek Kundra’s “25 Point Implementation Plan to Reform Federal Information Technology Management”, I was struck by the anecdote about how a lack of scalability caused outages and, ultimately, delays in processing transactions on the Car Allowance Rebate System (CARS), more commonly known as Cash-for-Clunkers.  According to the document, the overwhelming response swamped the system, leading to outages and service disruptions.  By contrast, a multimedia company offering users the ability to create professional-quality, TV-like videos and share them over the Internet scaled to meet demand that rose from 25,000 to 250,000 users in three days and reached a peak rate of 20,000 new users every hour.

The moral of the story is that the multimedia application was able to scale from 50 to 4,000 virtual machines as needed to meet demand because it was designed on a Cloud architecture.  While true, this anecdote omits a very important piece of information, which could lead some to believe that the Cloud offers inherent scalability.  The missing piece is that the system you design must be built to take advantage of the opportunity to scale, not merely rely on the facilities of the underlying platform to support scaling.

In the comparison offered by Kundra, it’s clear that the multimedia system was appropriately designed to scale with the rapid growth in users.  For example, the operators may have added load balancers to distribute traffic across a growing number of web servers.  If the back end was database-driven, perhaps the database was clustered and more nodes were added to the cluster to support the increased number of transactions.  If it was file-based, perhaps they were using a distributed file system, such as Hadoop, and were able to add new, geographically dispersed nodes to limit latency.  In each of these cases, it was the selection of the components and the manner in which they were integrated that facilitated the ability to scale, not some inherent “magic” of the Cloud Computing platform.
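To make the design point concrete, here is a minimal, purely illustrative sketch (in Python) of the kind of scale-out logic the application itself has to embody.  The `provision_node` and `LoadBalancer` names are hypothetical stand-ins for whatever the underlying platform actually provides, not any vendor’s API.

```python
# Hypothetical sketch: the application, not the Cloud, decides when to scale out.
# provision_node() and LoadBalancer are stand-ins for provider-specific APIs.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str


@dataclass
class LoadBalancer:
    nodes: List[Node] = field(default_factory=list)

    def register(self, node: Node) -> None:
        # Future requests are distributed across all registered web servers.
        self.nodes.append(node)


def provision_node(index: int) -> Node:
    # In a real deployment this would call the provider's provisioning API.
    return Node(name=f"web-{index}")


def scale_out_if_needed(lb: LoadBalancer, requests_per_sec: float,
                        capacity_per_node: float = 500.0) -> None:
    """Add web servers until pooled capacity covers current demand."""
    while len(lb.nodes) * capacity_per_node < requests_per_sec:
        lb.register(provision_node(len(lb.nodes) + 1))


if __name__ == "__main__":
    lb = LoadBalancer()
    scale_out_if_needed(lb, requests_per_sec=20_000)  # a Cash-for-Clunkers-style spike
    print(f"Provisioned {len(lb.nodes)} web servers behind the load balancer")
```

The point of the sketch is that the capacity threshold, the pooling, and the registration step are all design decisions the application owner makes; the Cloud merely makes the extra nodes available.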

It’s great that Kundra is putting forth a goal that the government needs to start seeking lower-cost alternatives to building data centers, but it’s also important to note that, according to this same document, many of today’s government IT application initiatives are behind schedule and fail to deliver promised functionality.  Given these issues, it’s hard to believe that the systems will be appropriately designed to run in a Cloud architecture and scale accordingly.  The key point here is that in addition to recommending a “Cloud First” policy, the government needs to hire contractors and employees who understand the nuances of developing an application that can benefit from Cloud capabilities.  Only then will the real benefit of Cloud be realized and the Cloud First policy achieve its goals.

Resiliency Is A Key Cloud Characteristic

Just finished reviewing the latest draft update to the National Institute of Standards and Technology (NIST) publication 800-145, “The NIST Definition of Cloud Computing”. This publication is rapidly becoming the accepted definition of Cloud Computing within the US Federal government and, for the most part, it’s a rather decent treatise on the topic and a solid capture of the key taxonomy. However, the essential characteristics listed in this document have always seemed lacking to me, and until today I couldn’t quite put my finger on what I believed was missing: resiliency. The NIST publication lists on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service as the five essential characteristics that comprise the Cloud Computing model. Indeed, in the paragraph that precedes this list, they even state, “This cloud model promotes availability…” Yet, how is availability promoted if resiliency is not one of the essential characteristics?

Resiliency is the ability of the Cloud to provide availability in the face of catastrophic failure of individual components and facilities. Without this one defining factor, I would argue, it would be inappropriate for any business or government to choose Cloud Computing as an alternative option. While it’s true that with providers such as Amazon, Rackspace, and Joyent you can model Continuity-of-Operations (COOP) into your systems, these providers leave the design and deployment of this capability as an exercise for the consumer, whereas I argue it should be inherent in the service package.
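As a rough illustration of what “left as an exercise for the consumer” means in practice, here is a minimal sketch of the failover logic a consumer currently has to build for themselves. The region names and health-check URLs are hypothetical placeholders, not any provider’s actual endpoints or API.

```python
# Hypothetical sketch of consumer-built COOP/failover logic.
# The endpoints are placeholders; a real deployment would use the provider's
# own health checks and DNS or traffic-management services.

import urllib.request

# Ordered by preference: primary region first, then standby regions.
REGION_ENDPOINTS = [
    "https://app.primary-region.example.com/health",
    "https://app.standby-region.example.com/health",
]


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the region's health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False


def pick_active_region() -> str:
    """Route traffic to the first healthy region, failing over in order."""
    for endpoint in REGION_ENDPOINTS:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("No healthy region available: COOP options exhausted")


if __name__ == "__main__":
    print("Active region endpoint:", pick_active_region())
```

If resiliency were an essential characteristic of the service itself, this kind of plumbing would be the provider’s default behavior rather than something every consumer re-implements.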

This raises the question for me: is NIST’s definition merely a lowest common denominator of the service provider offerings available on the market today, rather than setting the expectation for what a consumer should expect from a Cloud provider? If so, it would be disappointing to learn that our government thought leaders are simply mouthpieces for the vendors. I believe NIST needs to set the bar high with its expectations of what defines Cloud Computing, and resiliency, failover, and COOP should all be essential characteristics of that definition.

Indeed, I’ll go so far as to say that this should be the default state of the Cloud, with options for consumers to mark things as volatile at a reduced cost, versus the current assumption that everything is volatile and persistence must be specifically designed into the deployed Cloud solution. After all, don’t we already expect this from Cloud providers such as Google and Facebook? We don’t expect to log on to one of these providers’ services one day and receive the message, “sorry, we crashed and all your data is gone,” so why should we believe this is not a tenet worthy of defining the Cloud?

Living La Vida Cloud!

If you’ve been watching broadcast television anytime over the past few months, you’ve most likely come across the Microsoft “To the Cloud” commercial.  Microsoft does their best to bring complex technologies to the mainstream, but sometimes …  For most people, the Cloud is something they will never directly touch; instead, they will be exposed to applications deployed in the Cloud, with the key value being accessibility from multiple platforms, e.g. laptop, mobile phone, iPad, etc.  However, for the small percentage of us engaged in analyzing the requirements for, and designing, the environments in which those multitudes will access their applications, the market is expanding as fast as the galaxy.

In the old days, bringing a product to market was an expensive proposition that required development, packaging, and distribution, and it was typically undertaken at most once a year.  Today, however, a small group of engineers in a backroom at Google or Amazon can spend five months on a novel concept and the next thing you know there’s an entirely new product (service?) that requires, at a minimum, an understanding of the new service’s features and, in the worst case, traversing enough of the learning curve to gain hands-on understanding of how this new service fits into the rest of the existing ecosystem of services.

On top of understanding the platform-as-a-service offerings, variants of Cloud infrastructure options are also emerging at an unbelievable rate.  Moreover, the options here are multiplied by the need to satisfy the public, private, and hybrid models of Cloud Computing.  For example, the number of storage options alone keeps teams of industry analysts’ fingers close to the keyboard.

Clearly, for us, living in the Cloud has a whole different meaning than just accessing our data and applications over the Internet.   The Cloud is driving convergence at an extremely rapid pace.  The last major convergence of this ilk in IT was centered on telecommunications.  The emergence of technologies that delivered voice, video, and data over the same network connections changed the telecommunications industry in a major way: it forced integration, created a need for retraining, drove infrastructure changes to support increased bandwidth requirements, and drove consolidation of vendors.

Cloud is now driving the next phase of convergence, which, interestingly, is also being driven by advancements in networking capabilities.   The Cloud would certainly not be as attractive as it is if network data speeds had not increased to current levels.  However, with 10Gb Ethernet moving quickly toward even faster options, the network is now capable of being the bus for the networked computer.  This means speeds that used to be reserved for cables directly attached to a motherboard can now be achieved over a network.
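To put rough numbers on that claim, here is a back-of-envelope sketch comparing nominal line rates.  The figures assumed below (10 Gb/s for 10GbE, 3 Gb/s for a SATA II direct-attached link) are nominal and ignore protocol overhead and disk limits, so treat the output as illustrative only.

```python
# Back-of-envelope comparison of nominal link speeds.
# Assumed nominal rates; real throughput is lower due to protocol overhead.

GIGA = 1_000_000_000  # 10^9

links_bps = {
    "10GbE network link": 10 * GIGA,           # nominal 10 Gb/s
    "SATA II direct-attached link": 3 * GIGA,  # nominal 3 Gb/s
}

dataset_gb = 100                     # decimal gigabytes to move
dataset_bits = dataset_gb * 8 * GIGA

for name, bps in links_bps.items():
    minutes = dataset_bits / bps / 60
    print(f"{name}: ~{minutes:.1f} minutes to move {dataset_gb} GB")
```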

The faster the network, the more we can do and the faster we can innovate.  Some believe that advancements in virtualization have been the major driver of Cloud Computing.  However, without the ability to reach the Cloud and move data in and out of it at a rate that maintains a reasonable user experience, the utility of virtual machines is greatly diminished.  Once again, it is the advancement in networking speeds that allows pooling of resources under a virtual umbrella and makes virtualization as useful as it has become.

Ultimately, the question comes down to: what’s the value?  To this, we have seen many answers: cost savings, energy efficiency, elasticity (the ability to scale up and down based on demand), redundancy/enhanced availability, speed to market, and many others.  All of these are accurate answers.  So, for each person and each business, the reason to chase Cloud Computing is different; moreover, it could be a combination of these factors.

For me, I liken the power of the Cloud to a familiar old meme from the original Star Trek: “transferring power from engines to life support”.  When I was a kid watching those Star Trek episodes, it was understood that the Enterprise could shift power to wherever it was most needed to support the mission.   With Cloud Computing, we are starting to see the realization of this concept. Using the same infrastructure, we can shift compute cycles to big data analysis, desktop emulation, database management, etc.  Moreover, if we are lucky enough to find more dilithium crystals and we don’t allow the matter and anti-matter to mix, then we can increase our compute cycles and do all these things at once! 😉