Author Archives: jpmorgenthal

The Two Faces of Microservices

Microservices (μServices) are a fascinating evolution of the Distributed Object Computing (DOC) paradigm. DOC originally attempted to simplify the development of complex distributed applications by applying object-oriented design principles to disparate components operating across networked infrastructure. In this model, DOC "hid" the complexity of making this work from the developer, regardless of the deployment architecture, through the use of complex frameworks such as the Common Object Request Broker Architecture (CORBA) and the Distributed Component Object Model (DCOM).

Eventually, these approaches waned in popularity because the distribution frameworks were clumsy and the separation of responsibilities between developers and operations did not deliver on the promised goals. That is, developers still needed to understand too much about how the entire application behaved in distributed mode to troubleshoot problems, and the implementation was too developer-centric for operations to fulfill this role.

One aspect of this early architecture that did succeed, however, was the Remote Procedure Call (RPC). The RPC represented a way to call functionality inside another application across a network using familiar programming-language function-call constructs, such as passing parameters and receiving a result. With the emergence of declarative syntaxes such as XML, and then JSON, marshalling (the packaging of data to and from the RPC) became simpler, and specialized brokers were replaced with generic transports such as HTTP and asynchronous messaging. This gave rise to the era of Web Services and Service Oriented Architecture (SOA).

To make a long story short, Web Services became extremely popular, but SOA required too much investment in software infrastructure to be realized on a massive scale. Web Services were eventually rebranded as the Application Programming Interface (API); there is really no architectural difference between a Web Service and an API. JSON became the primary marshalling scheme for Web-based APIs.

Apologies for the long-winded history lesson, but it is important to understand μServices in context. As this history shows, the further we moved away from object-oriented principles toward a more straightforward client/server paradigm, the greater the adoption. The primary reason is that architecture takes time, serves longer-term goals, and requires skilled individuals who are often expensive. With the growing need for immediacy driven by the expanding digital universe, many business leaders came to view these characteristics as luxuries in a world where speed was essential.

Needless to say, there was an immediate benefit: rapid growth of new business capabilities and insight into petabytes of data that was previously untouchable. Version 1.0 was a smashing success. Then came the need for 2.0. Uh-oh! In the race to deliver something fast, what was ignored was the sustainability of the software. Inherent technical debt quickly became the inhibitor to delivering 2.0 enhancements at the same speed at which 1.0 was developed. For example, instead of recognizing that three applications all implemented similar logic and developing it once as a configurable component, it was developed three times, each version specific to a single application.

Having realized the value of the architecture and object-oriented design that had been dropped in favor of speed, what was needed to fill the vacuum was a way to keep the speedy implementation mechanics while still taking advantage of the object-oriented design paradigm. The answer is μServices.

While Martin Fowler and others have done a great job explaining the "what" and "how" of μServices, for me the big realization was the "why" (described herein). Without the "why" it's too easy to get entangled in the differentiation between μServices and the aforementioned Web Services. For me, the "why" provided ample guidelines for describing the difference between a μService, an API and a SOA service.

For simplicity I’ll review the tenets of OO here and describe their applicability to μServices:

  • Information hiding – the internal representation of data is not exposed externally; it is accessed only through behaviors on the object
  • Polymorphism – a consumer can treat a subtype of an object identically to its parent. In this particular case, any μService that implements a particular interface can be consumed in an identical fashion
  • Inheritance – the ability for one object to inherit from another and override one or more behaviors. In the case of μServices, we can create a new service that delegates some or all behavior to another service (see the sketch after this list).
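To illustrate how these tenets map onto service code, here is a minimal, hypothetical Java sketch: consumers depend only on an interface (information hiding and polymorphism), and a new service is formed by delegating to an existing one rather than by class inheritance. All names and the placeholder rate are invented for the example.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Consumers program against the interface; internal tables and rules stay hidden.
interface TaxCalculator {
    BigDecimal taxFor(String region, BigDecimal amount);
}

// An existing implementation; how it derives the rate is its own business.
class RegionalTaxCalculator implements TaxCalculator {
    public BigDecimal taxFor(String region, BigDecimal amount) {
        return amount.multiply(new BigDecimal("0.07")); // placeholder rate
    }
}

// The μService analogue of inheritance: a new service that overrides one
// behavior (caching) and delegates everything else to an existing service.
class CachingTaxCalculator implements TaxCalculator {
    private final TaxCalculator delegate;
    private final Map<String, BigDecimal> cache = new HashMap<>();

    CachingTaxCalculator(TaxCalculator delegate) { this.delegate = delegate; }

    public BigDecimal taxFor(String region, BigDecimal amount) {
        return cache.computeIfAbsent(region + ":" + amount,
                key -> delegate.taxFor(region, amount));
    }
}
```

Because both classes satisfy the same interface, a consumer can be handed either one (polymorphism) without knowing which implementation, or which delegation chain, sits behind it.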

The interesting thing about these tenets as a basis for μServices, and subsequently the basis for the title of this article, is that satisfying them does not necessitate complete redevelopment. Indeed, in many cases, existing functionality can be refactored out of the 1.0 software and packaged using container technology, delivering the same benefit as developing the 2.0 version from scratch as a μService.

Let's revisit our earlier dilemma: similar logic was developed three different times in three different applications. For purposes of this blog, let's assume it is a tax calculation that was written once each for the US, Canada and Europe, and that each implementation has a table in a database. It would not take much work to take these different implementations and put them into a single μService with a single REST interface, using a GET operation that takes the region and the necessary inputs on the query string. That new μService could then be packaged up inside a Docker container with its own Nginx web server and a MySQL database holding the required tax tables for each region. In fact, this entire process could probably be accomplished, tested and deployed in the span of a week. Now we can create four new applications that all leverage the same tax calculation logic without writing it four more times.
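As a rough sketch of what that single interface might look like, here is a hypothetical JAX-RS resource for the consolidated tax μService (which would build into the .WAR referenced later). The path, parameter names and rates are placeholders; a real implementation would read the regional rates from the MySQL tax tables packaged in the container.

```java
import java.math.BigDecimal;
import java.util.Map;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/tax")
public class TaxResource {

    // Placeholder rates; real values would come from the region-specific
    // tax tables stored in the container's MySQL database.
    private static final Map<String, BigDecimal> RATES = Map.of(
            "US", new BigDecimal("0.07"),
            "CA", new BigDecimal("0.05"),
            "EU", new BigDecimal("0.20"));

    // GET /tax?region=US&amount=100.00
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Response calculate(@QueryParam("region") String region,
                              @QueryParam("amount") BigDecimal amount) {
        BigDecimal rate = (region == null) ? null : RATES.get(region);
        if (rate == null || amount == null) {
            throw new WebApplicationException(Response.Status.BAD_REQUEST);
        }
        BigDecimal tax = amount.multiply(rate);
        return Response.ok("{\"region\":\"" + region + "\",\"tax\":" + tax + "}").build();
    }
}
```

One URL now serves every region, so new applications can call the service instead of re-implementing the calculation.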

This works great as long as the tax tables don’t change or we don’t want to add a new region. In that case, additional development would be required and the container would need to be re-created, tested and re-deployed.

Alternatively, we could develop a reusable tax service and deploy this new μService in a Platform-as-a-Service (PaaS). Presumably, we could then extend the service with new regions and changes to tax tables without impacting any other region, regression testing the entire tax service, or taking the μService out of service during the redeployment period. Moreover, the new region would be available simply by modifying the routing rules for the REST URL to accept the new region.

The diagram below illustrates these two options. The .WAR file represents the deployable tax calculator. As you can see, one or more containers would need to be either patched or re-created to deploy new functionality in the Deployment Architecture model, whereas in the PaaS Architecture we could continue to deploy multiple .WAR files and the platform would handle routing off the same URL-based interface, giving the appearance of a single application.

[Diagram: Deployment Architecture vs. PaaS Architecture for the tax μService]

Thus, the two faces of μServices are those created through deployment and those created through design and development. As a lifelong software architect, I recognize the pragmatism of getting to market faster using the deployment architecture, but I highly recommend redesign and development for greater sustainability and longevity.

If you found this article useful, please leave a comment.

Time for the Tail to Stop Wagging the Dog

Here’s a novel, but controversial statement, “it’s time for the CEO, COO, CIO to start to take joint responsibility for application platform decisions.” For too many years now technical meritocracy has led the decision-making for the business with regard to platform selection. This includes, but is not limited to, servers, operating systems, virtualization, cloud and application platforms. In many of these cases the decision has not worked in favor of the business with regard to agility and costs.

I see it now with clients. Senior technical leadership recommends approaches that are questionable with regard to longevity, maintainability, sustainability and growth of the business. While it's true that we do not have crystal balls and cannot tell what might happen, there seems to be a leaning away from what have been termed "opinionated" platforms. These are platforms that come with some restrictions on implementation, but do so in favor of speed and simplicity. These decisions are increasing the amount of technical debt the organization is accruing and will one day require the business to pay to undo.

Of note, and to be fair, not all these decisions are based on engineering bias. Some relate directly back to the business concern of vendor lock-in. This concern over vendor lock-in, however, is being liberally applied to emerging platforms that are rooted in open-source development and strong support for programming interfaces; things missing from the products of decades ago that locked in your data and offered no means to easily migrate or exit from the platform. Even with today's opinionated approaches this concern is mitigated. While a wholesale shift to a new platform may require some rework, the cost and effort do not warrant choosing a more complex, do-it-yourself platform.

While at EMC, one of my responsibilities was to help sales and our customers understand the value proposition of developing internal cloud capabilities using VCE's Vblock technology. One of the key value propositions, and one I still agree with now, was that it removed the need for data center engineers to spend time building infrastructure out of discrete components. Vblock is in the family of converged architectures, which means it contains network, compute, memory and storage pre-integrated. Vblock was designed to be updated remotely, and the updates were fully tested against the known components prior to being pushed out. Hence, there was a very high success rate for updates and no outages due to conflicts between the various components arising from changes.

The net result of moving to Vblock was less time spent on engineering solutions to work together when using discrete components, fewer outages due to conflicts when one component is updated, and a lower total cost to operate.

Today, businesses can select from Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) solutions that offer the same value proposition as the Vblock. By selecting certain IaaS or PaaS approaches, businesses can save hundreds of hours and significantly simplify their computing environments in areas related to the selection, configuration, and troubleshooting of internally integrated discrete components. This is the right choice for the business.

In cases where senior business-oriented executives have been party to the selection, we have seen them choose the pre-engineered, opinionated approach due to its demonstrated ability to provide simplified operations, speed in time-to-market, and flexibility in deployment platforms. Examples include GE and Ford, which both selected Pivotal Cloud Foundry as a basis for delivering the next-generation systems upon which their businesses will rely. A key reason for these decisions is that the platform takes an opinionated approach. The following is an excerpt of a blog that defines the differences well:

The Cloud Foundry community often proudly proclaims a key part of its current success and future lies in the fact that it is an opinionated platform. But what is an “opinionated” platform? The existential questions of how to deploy to the platform and run apps on the platform are answered in a specific, if not rigid, way. To boost productivity along a prescribed path, fewer options are exposed to developers and operators.

 Less opinionated platforms optimize flexibility in the guts of the system, offering more freedom for “do-it-yourself” implementations that feature custom components. More opinionated platforms provide structure and abstraction—“the platform does it for you.” Users are then free to focus on writing apps on top of said platform. And free to rapidly iterate on their apps, and reducing time-to-value for new features.

 Cloud Foundry is one such highly opinionated platform. Those opinions are formed and codified with the belief that platform decisions should help customers deliver new, high-quality software to production as fast as possible.

Now, some will say that being opinionated is an aspect of vendor lock-in. However, Pivotal's is just one of three implementations now on the market. IBM and SUSE also offer Cloud Foundry implementations based on the open source code base, and all are part of an organization responsible for ensuring the openness of the platform. Moreover, Cloud Foundry can be acquired as-a-Service or implemented on self-managed infrastructure.

Back to my opening salvo. Software is eating the world and technology is touching every aspect of your business. Now, more than ever, it's critical for C-level executives to fully understand the nature of platform decisions and to make sure the decisions made are best for the longevity of the business and support the speed and agility needed to deploy new business capabilities, rather than advance the engineering prowess of its workers.

Technology of the Year for 2016 – The Web Platform

We're starting to see predictions about what's going to be hot in enterprise technology in 2017. Cloud (yet again), Blockchain, Big Data/Analytics, and the Internet of Things (IoT) are all near the top of the list. However, it was Krish Subramanian's humorous tweet that started me thinking about the single most important contribution from the enterprise technology community in 2016. While all the aforementioned technologies will certainly see a lot of interest and growth, they're all enabled by a Web platform.

A platform is an architectural pattern and design that underlies all these capabilities. Its defining attributes are the ability to deliver a focused service and to facilitate interaction through programmatic interfaces. The cloud platform delivers metered software and infrastructure services. The Blockchain platform is formed by a set of loosely coupled nodes that all journal transactions in an identical fashion to create an immutable, non-repudiable record. Big Data platforms deliver the ability to process vast quantities of data very quickly, with the ability to scale out to increase capacity and speed.

Without the platform as an architectural pattern, many of the technologies that will see growth in 2017 really would not exist. Indeed, I would go so far as to say that it is the Web platform that has truly enabled the appearance and power of these other platforms. After all, we had multiple distributed computing platforms prior to the Web: the Distributed Computing Environment (DCE), CORBA, DCOM, and Java RMI to name a few. However, none of these was ever so widely adopted that it spawned equally powerful and widely-adopted platforms.

So, for me, 2016 is the year of the Web platform. With HTTP/2, HTML5, ES6 and CSS2 it is the foundation for API economies, real-time data exchange, platform-independent computing and the broad-spectrum dissemination of information throughout the globe. Without these four key components, much of what we have seen develop over the past five years would most likely never have emerged.

So, in the vein of Time magazine giving David Bowie his due by declaring him Man of the Year in spite of his passing away, I think it’s time to recognize the technology that is fostering massive disruption in century-old industries, giving rise to new ways to connect and communicate, and providing the foundation for the next generation of applications. I declare Technology of the Year for 2016 to be The Web Platform!

The Curious Case of the WordPress Docker Container and the Devious XML-RPC Denial of Service Attack

Republished from 04/29 as it was lost due to a Docker Container crash… Irony!

I have an article in the recently released "DZone Guide to Building and Deploying Applications on the Cloud" entitled "Fullstack Engineering in the Age of Hybrid Cloud". In it I discuss the need for, and skills of, a Fullstack Engineer in relation to troubleshooting and repairing complex, distributed hybrid cloud applications. My recent experience troubleshooting issues with my Docker WordPress container only reinforces the details I wrote about in that piece. Without a comprehensive understanding of both the infrastructure and the application layer, I don't believe I could have achieved resolution (if indeed I have; more on that later).

My Docker WordPress container has always suffered from the "Error Connecting to Database" issue, but initially it would happen once a month and I would just restart the container. I had read that the issue was fixed by moving to WordPress 4.5, so I upgraded, which came with its own challenges given that these containers are supposed to be immutable.

Unfortunately, I designed my container when Docker architecture was in its infancy, so separating out and linking a MySQL container and the WordPress container, as well as storing data on a separate volume, are all features that emerged, or became easier to use, in later versions. Eventually I will need to redesign around 1.11 features, but for now I'm just trying to keep what I currently have running. I did try moving the database files onto permanent storage mapped into the container as a volume, but all I did was fight with file permissions for a day and MySQL never ended up starting.

Recently, it became more and more difficult to keep the container up, so I upgraded to the latest Ubuntu 14.04 kernel, and when that didn't seem to help I upgraded Docker from 1.4 to 1.11. Neither of these corrected the issue. However, Docker 1.11 leverages the new architecture and uses cgroups, which resulted in the cgroup out-of-memory killer posting messages to my console.

[Screenshot: console messages from the cgroup out-of-memory killer]

Now I could see that mysqld was being terminated at some point due to insufficient memory. To solve the memory issue, I tried optimizing the WordPress LAMP stack for low memory and even migrated from a 1GB virtual machine to a 2GB instance. No matter how much memory I threw at the problem, the longest the WordPress site would stay up before the database connection issue appeared was about an hour.

Totally baffled at this point, I started chasing down a lead regarding WordPress issues occurring on my cloud service provider. The issue I was seeing was happening to many others on DigitalOcean; perhaps this was a VPS issue (DO's Droplet architecture is VPS-based) and not a Docker issue. DO responded to the various forum postings stating that running out of memory is a common result of the known XML-RPC denial-of-service attack. XML-RPC is the API interface for WordPress.

Wait! What am I doing? No one’s going to bother attacking my little old blog, it can’t be that. Back to optimizing memory use. Oh crud, this is still not getting me anywhere after two weeks.

Unfortunately, my immutable container architecture again limited my ability to see logs, and SSH connections were often terminated due to low memory as well. If I terminated the container without committing it, the logs were lost. So I modified the current container to write all the log files to an external volume on permanent storage.

Whoa! What did I find in the apache2 access.log the next time the issue occurred? When I tailed the last 200 entries I found my site was being hammered by a Googlebot, and there were many more entries beyond those. In the end, I was the victim of a denial of service attack.

I believe it's important to look at what data I had available and the characteristics identified by the logs and error messages. Nothing screamed DoS attack: requests were consuming a massive number of threads on the Apache server and driving memory usage to zero, so the memory manager was sacrificing processes to keep the OS alive (does that make anyone else think of Kirk screaming to Scotty, "all power to life support"?). When the attack stopped, mysqld_safe restarted mysqld, but it seems the socket or some other interprocess mechanism didn't allow WordPress to communicate with MySQL.

Piecing this together after the fact required a mix of skills. It might have been easier if I had been doing live monitoring, tracking inbound requests while also constantly checking that WordPress could communicate with MySQL, but realistically that is a dramatic step taken only when all else has failed.

Through this I learned a lot about container architecture, but the issue is probably still lingering. For now I'm simply denying all requests to XML-RPC from outside IP addresses, and WordPress has been up for over 24 hours. More importantly, it really reinforces what I wrote about in the article: I don't believe I could have reached this point without a good understanding of the infrastructure, operating system, networking, Docker and the LAMP stack.

The Anonymous Neighbor Problem in IT

COMED, my power company, sends out a monthly report that shows my energy consumption relative to my neighbors. Every month I'm considerably higher than all my neighbors. The report also has a list of things I could do to reduce my energy consumption. The problem with this report is that it doesn't take into account my house size relative to the other houses in my neighborhood. My house is the largest model and accounts for about one-third of the local sample size. If I were to attempt to reduce my energy consumption so that I would be in line with the monthly average of my neighbors, I would probably have to adopt a lifestyle akin to the Amish. Hence, I call this the Anonymous Neighbor Problem, and I believe it is responsible for driving decisions that are thrust upon IT leadership by executive business managers.

Executive business managers receive reports from analyst and management consulting firms that compare their IT expenditures to other "similar" firms. The businesses surveyed are in a similar industry and roughly the same size. These are the anonymous neighbors. The thing these reports do not, and cannot, take into account is the history of your particular business. Perhaps your business is older than all of your neighbors; hence, migrating to lower-cost operational alternatives is a much more complex and expensive process. Perhaps the same executives now asking why it costs so much more than your neighbors to operate IT decided five years ago not to approve the capital expenditures that would have led to a more efficient IT operational footprint. And, of course, there is most likely some waste, as every IT organization has some.

Unfortunately for many CIOs, when these reports are released all heads turn to them with the demand to get in line with their anonymous neighbors. I point back to the drastic measures that would be required of me to bring my energy use in line with my neighbors' average, but many CIOs cannot ignore the mandate. The results have been devastating for many IT shops. Loss of institutional knowledge, deep cuts in labor resulting in an inability to keep up with necessary maintenance cycles, and use of poorly managed outsourced vendors that require additional policing are all examples of what the Anonymous Neighbor Problem has led to in IT. Moreover, it leads to lower service levels from IT, fostering a cycle of questions about cost versus service provided.

Having been on the sales and consulting side of IT for many years, I am not a fan of the "my needs are unique" or "you don't understand our business" answers. I've written before about the cost of uniqueness in IT, and I do believe uniqueness can be designed out in favor of commodity business capabilities, which can significantly lower IT operating costs. However, this does not change the fact that the current-state IT environment of a given business may not be in a position to match the average IT spend of its anonymous neighbors without drastic measures that impair the value IT can deliver. Achieving that goal will require IT transformation, which may demand investment before the business can expect to see reductions in IT operational costs.

Hard Choices Are Required When Adopting a Cloud Operating Model

When was the last time you heard anyone say "IT Applications & Operations"? Frankly, in my 30+ year career in IT, I don't believe I ever have. The typical term we hear is IT Infrastructure & Operations. These two go together like peanut butter and jelly, which tells us a lot about how we view the field of IT.

For those who may not be familiar with the role of IT Operations, Joe Hertvik does a great job here of describing IT Operations Management and the role of providing this service to the business. Interestingly, he depicts the responsibilities of IT Applications and IT Operations as a Venn diagram in which there is no overlap.

However, as we progress from a pre-Cloud Operating Model world to a post-Cloud Operating Model world, this coupling is changing. As we migrate workloads to the cloud, the first shift will be from operating infrastructure to operating Infrastructure-as-a-Service. This shift will leverage the capacity management and monitoring skills used with virtualized environments.

[Diagram: the shift in operational focus under the Cloud Operating Model]

The second shift will come as migrated workloads are refactored into cloud native applications, or as new cloud native applications are developed specifically for the cloud. Here the emphasis of operations will be on the management and monitoring of the application platform. This may be a Platform-as-a-Service offering such as Azure PaaS, AWS, Cloud Foundry or OpenShift; a Software-as-a-Service platform such as Salesforce.com or ServiceNow; or something more conventional based on traditional application servers.

During this second shift, operations will need to focus less on availability and more on consumption patterns. Given that cloud native applications support greater availability and resilience through design (see the Pets & Cattle presentation by Randy Bias), the emphasis will need to shift toward how services are being consumed in order to determine efficiency and manage costs. Greater integration between operations management platforms and the cloud services will be a critical requirement for this shift to occur.

The final shift will happen as businesses move toward Serverless Computing, or Function-as-a-Service. In this model, business logic executes in response to an event occurring in another service. Due to the ephemeral nature of this model, operations management and monitoring will change drastically. The application will only be available to monitor for brief periods, requiring new operational techniques in support of the Serverless Computing model. Failures that occur here may only be recognizable by performing analytics on post-execution logs.
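To make the shift concrete, here is a minimal sketch of what a function in this model might look like, using AWS Lambda's Java handler interface purely as one example of Function-as-a-Service; the event shape and names are invented for illustration. The only operational signal the function leaves behind is what it logs before it disappears.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;

// Hypothetical Function-as-a-Service handler: business logic runs only while
// an event from another service is being processed.
public class OrderEventHandler implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        long start = System.currentTimeMillis();
        try {
            // Business logic triggered by the event (field names are illustrative).
            String orderId = String.valueOf(event.get("orderId"));
            // ... process the order ...
            return "processed " + orderId;
        } finally {
            // The function vanishes after returning, so post-execution log
            // analytics are often the only way to spot failures or latency trends.
            context.getLogger().log("duration_ms=" + (System.currentTimeMillis() - start));
        }
    }
}
```

There is no long-running process for a dashboard to watch here; operations has to reconstruct behavior from the logs and metrics emitted during these brief invocations.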

The Impact of the Cloud Operating Model Shift

Having presented a view of the world post-Cloud Operating Model, you may ask what’s the impact of this roadmap on today’s traditional IT environment?

As operations is designed today, it focuses on running the physical and logical environments, which means a significant number of resources are devoted to running the devices behind the network, compute and storage, in addition to the specialized software for monitoring and managing the various components. It also means the overall budget has had to be divided between managing the physical environment and the applications, with the physical environment usually taking the lion's share. This is just the nature of the beast, as production applications tend to fail less often and require less attention than their physical counterparts. Moreover, the physical environment is constrained by procurement cycles and capital expenditure approvals in a way that applications are not.

As shown in the diagram above, as businesses move away from self-managed infrastructure (not just to cloud, but to all infrastructure provided as-a-Service), IT operations will shift more of its focus to the workloads and applications. Simultaneously, operational focus shifts away from running a physical environment, as that task falls to the infrastructure providers. Hence, what we should expect to see going forward is operations being "unbound" from infrastructure and bonded with applications. As more and more businesses realize the economic benefits of relinquishing ownership of their infrastructure to a provider, IT should reorganize around operations of the applications and workloads.

This unbinding and re-binding process for operations is a key element of successfully implementing a cloud operating model. The result of this activity is that the remaining self-managed infrastructure organization needs to become more encapsulated. It will need fewer resources and should consider combining operational support with infrastructure engineering. This should result in lower operational overhead and a significant reduction in IT costs.

This change will also raise every red flag you can imagine for those who have been in infrastructure and operations for the better part of their careers. The fiefdom holders, the server huggers, the CCIEs, the vExperts, the tinkerers, the storage priests, etc. will demonstrate strong resistance to this change. They will introduce fear, uncertainty and doubt whenever possible. They will regale us with stories of cloud failures and breaches. All of this is an attempt to maintain the status quo for as long as possible in light of this disruptive force. It is here that executives must weigh the pros and cons of the change and be the driving force behind moving to this new IT organizational construct.

Additionally, what we've seen in many early adopters of cloud is that applications move without their operations counterpart. Eventually, someone asks, "where's my dashboard?" or "how is this integrated into our ITIL processes?" Thus, as we unbind operations from infrastructure and bind it to applications, what is being monitored and how it's managed changes, but we carry across, and hopefully correct, the high-latency, low-value processes that support audit, transparency, and corrective action. Some existing operations skills will transfer to this new focus; however, the tooling used for these tasks will change and, likewise, different skills will be required to operate cloud-based and hybrid applications.

The biggest issue for businesses will be how to adopt and implement operational management as their workloads shift away from infrastructure and toward applications. There is some early guidance from the application performance management vendors, such as Dynatrace and AppDynamics, and examples from Web-scale startups, but as a whole this segment of the industry is still unwritten.

We know that the key performance indicators that are important today will be different in this post-Cloud Operating Model world. It is also very likely that the tools needed to monitor and manage applications in that world do not exist or are only now starting to appear in the market. Hence, skilled individuals who understand how to configure those tools most likely don't yet exist either. The likely outcome is that businesses will attempt to manage the post-Cloud Operating Model world with the same knowledge and tools they use to manage the pre-Cloud Operating Model world, which will, unfortunately, fail. It is from this failure, I believe, that the leaders in operating a post-Cloud Operating Model world will emerge.

 

Special thanks to @cpswan and @glenprobinson for their assistance in helping me shape my ideas for this blog.

A Reality Check on “Everyone’s Moving Everything To The Cloud”

A recent CIO editorial by Bernard Golden regarding the future of private cloud spurred some interesting commentary in my network. The pushback seemed to focus on the viability of the term "private cloud". These individuals are well-respected thought leaders in cloud with significant experience guiding senior IT executives through the transition to modern architectures, so I decided to engage them in a discussion about the future of self-managed infrastructure as a whole.

There's a class of individuals, myself included, whose job it is to advise and recommend direction in Enterprise IT. In this role, we live in a bubble of sorts. I believe most of us understand the realities of Enterprise IT very well, but we also have a good handle on next-generation architectures, their value, and how to migrate current IT environments to them. We understand the constraints of public cloud environments as well as other as-a-Service offerings. However, being outside the day-to-day operational demands of the business, it's easier for us to "see the light" than it is for someone who is confronted daily with typical Enterprise IT demands.

This bubble also means that we need to check ourselves against reality from time to time and question what we consider to be the optimal solution for businesses. For me, this meant engaging some of the most intelligent and knowledgeable individuals on cloud computing and Enterprise IT, individuals who also meet with key senior IT executives on a daily basis to discuss their goals, in a discussion about balancing vision and reality.

What ensued was a lively discussion that uncovered realities to balance the hype for businesses migrating to the cloud.

The Participants

The list of individuals I asked to participate is a real Who’s Who in cloud computing: Randy Bias (@randybias), Reuven Cohen (@Ruv), Tim Crawford (@tcrawford), Bernard Golden (@bernardgolden), Sam Johnston (@samj), Tom Lounibos (@Lounibos), George Reese (@georgereese), Christian Reilly (@reillyusa), Glen Robinson (@GlenPRobinson), Chris Swan (@cpswan),  Mark Thiele (@mthiele10).

The Discussion

The initial ask was to provide two pros and two cons for businesses continuing to invest in their own managed infrastructure (data centers or co-location). As the conversation ramped up, the group was also asked to provide its insights on businesses moving "whole hog" to the public cloud.

I’ll summarize the responses to the initial request first.

With regard to support for businesses continuing to manage and invest in their own infrastructure:

  1. Data Gravity (data too voluminous to easily migrate the apps and data to the cloud)
  2. Security
  3. Emerging scalable solutions (e.g. Hyperconverged)
  4. Lack of equivalent SaaS offerings
  5. Significant integration requirements
  6. Lack of ability to support migration to cloud
  7. Vendor licensing
  8. Network latency
  9. Transparency
  10. Avoid Lock-In

With regard to support for businesses exiting infrastructure/data center management activities:

  1. Operational costs
  2. Availability of skilled workforce
  3. Better capital management
  4. Difficulty in providing elastic scalability
  5. Agility
  6. Perception of being considered pro-data center
  7. Limiting innovation
  8. Poor imitation of public cloud experience
  9. Poor capacity management / resource utilization
  10. Avoid extremism

Interesting, huh? Across a large enough population of really talented and experienced individuals emerged ten reasons for each position. You can argue the viability of certain ones, but all-in-all it’s an interesting outcome given the hype about companies moving to public cloud and CIOs all claiming they’re looking to exit the data center business.

I reiterate one of Tim's messages: "IT is complex!" We should know by now that there are no silver-bullet solutions, yet attempts to buy them have ended in failure time and time again. And, yes, the message that the future will be a hybrid one is starting to become more prevalent. Yet there's still a growing perception that self-managed data centers and infrastructure are a bad thing and that businesses should have a strategy for exiting their current investments. Of note, the answer doesn't have to be "public cloud"; a reasonable alternative is to offload this to a managed services provider.

Notables

If you’re looking to learn from the best, here’s a set of quotes taken from the discussion thread. It’s an incredible resource chock full of knowledge and experience from some of the world’s most knowledgeable IT experts.

  1. Sadly, I've run across two business leaders recently who believe that the cloud is fragile and are concerned about security and the grid going down (yikes!).
  2. I’m not of the camp that believes all roads lead to Public Cloud. It is one of many tools that an organization can leverage in their arsenal. If we do move everything to these three (from the myriad of existing data centers…or internal providers), we will in-fact be building a house of cards. A perfect target for malicious and disruptive activities too
  3. I suspect we’re picking on some philosophical differences in our believe of where things end. If you believe that all roads end with Public Cloud, you will have one perspective. If you believe in a more complicated (and I hazard to use the word hybrid) reality, then you will have a different response. There is both an opportunity component and a time component.
  4. I’ve heard:
    1. Come hell or high water (heard it’s a great movie) we’re just going to move everything regardless of what it takes
    2. We’re going to keep primary applications in house and secondary ones in public cloud
    3. We’re going to use a retirement approach as apps are grandfathered out their replacement is “cloud first”.
    4. Little to no public cloud
    5. All on AWS
    6. And so on. I've heard each of the above from good sources, along with the indication that this wasn't a one-off sentiment.
  5. There has never been a good “monopoly”
  6. Moving “all” to public cloud is a tough nut to swallow for most. There are a significant number of hurdles that the average IT organization must overcome before they can safely and successfully make that kind of move; Is there a well-defined value to making an immediate move? Do we have the internal resources to accomplish the move while not significantly disrupting on-going operations and development? Can we justify having all our eggs in one basket? If we don’t put all our eggs in an AWS or Azure basket, how do we safely, securely and efficiency manage a multi-cloud portfolio?
  7. There are modern organizations that have already spent over three years in attempting their move of “all” to a specific cloud provider and they aren’t done.   Where there is less understanding among customers is what their future looks like when they no longer own any of these traditional infrastructure based services. Will they be beholden in a bad way to a less than benevolent dictator? Will changes in service quality affect their ability to effectively support their customers? Can they have confidence in their ability to manage costs? In other words, “we’ve moved the brain out of the body, do we really know everything we need to know about how that will work long term?”
  8. I expect that the long term is higher order abstractions that run 90/10 on public/private systems and are relatively standardized such that businesses can focus on where business value is in their IT supply chain.  Meanwhile, some folks will always need to go their own way because they are pushing the envelope (Hi DropBox).
  9. In regards to ability to deal with scale IE keep it staffed, the hyper scale guys will suck up all the talent, think data gravity but more like “skills gravity”. Can they handle the scale IE number of machines/facilities, these guys are good at what they do because of awesome amounts of focus on the supply chain
  10. I’ll put this out there……. Who wants to be the CIO who signs off the next PO for a new DC?? Isn’t this just career suicide?
  11. Here’s my issue with modern DC & enterprise-focused software today. Excluding Salesforce and Amazon, most of the major enterprise players today built their software and systems 25+ years ago in an age where security was an afterthought (at best), the internet didn’t exist within the context DC operations and the majority of employees didn’t have or need access to company resources 24/7.
  12. For mission-critical systems that are built on “iterated upon” architectures and technologies – and require extreme integrations with existing internal and external systems, investing in modernizing or adding HA/DR capabilities (yes, I’m looking at you Aviation & Retail Banking) is also (IMHO) a perfectly valid reason to spend $$. In many cases and many industries, there are simply no SaaS solutions to purchase, and moving to public cloud IaaS could arguably make systems even more complex to operate. The risks are simply not worth it.
  13. Think about all the compute power across just the Fortune 500. Now take all of that and dump it on Amazon, Microsoft & Google. Are we expecting too much from them? Will they be able to handle this? Can they staff enough people to run data centers large enough to continually deliver elastic resources to this audience?
  14. You have to hire high-skilled IT talent in a low-value activity and are prevented from spending that money on talent in other areas that could improve your business outcomes, e.g., application developers
  15. Most CIO’s (I know) are pragmatists, while they have a view on the “horizon” they are tactically mixing and matching their DC strategy “service by service”.
  16. Like any real estate, it’s location, location, location for datacentres. Think of them like gated communities; you want to have great neighbours you can interconnect with, not ones you want nothing to do with, nor an industrial estate with a small number of large wholesale footprints, and certainly not an empty estate. You should look for carrier-neutral facilities with plenty of telcos, as well as internet, cloud, financial, and other exchanges.
  17. You can hug your servers. Regulators like that, but change is the stroke of a pen away (and coming soon to a jurisdiction near you). Gartner reckon “no cloud” policies will be gone within a few years.

Summary

I have to admit I was completely surprised by the level of participation in this thread. This particular group is very busy with their daily business activities, and I was thoroughly pleased with what we were able to create. I am grateful they took the time to add their knowledge and experience to this query. Their Twitter handles are all listed above and I highly recommend you follow them.

At a time when the media presents sensationalistic views that skew reality, it's critical to stop and assess what is being presented. Unicorns are great, but they're mythological beasts. Can you learn from them? Sure. Can you be them? Perhaps a better question is, "should you be them?" The following comment, received in response to releasing the draft of this entry for review, sums it up best:

“What we’ve done here really demonstrate the “It’s complicated” statement, which is the truth of the matter. The “Netflix poster child” for cloud adoption is not representative of most others cloud journey. The low hanging fruit are on the cloud, now the real work begins. (not to undermine the great work the early adopters did)”

What Will You Do With What You Learn From IoT?

When the Internet of Things (IoT) started to emerge as a popular topic, I had to stop and ask myself if I was once again going to provide commentary on an emerging field. I enjoy exploring new technology shifts and illustrating how they can benefit various industries and businesses; it's what I've done for the past 20 years through Java, XML, Web Services, SOA, Cloud and DevOps. However, every time I started writing on IoT I seemed to run into the same conundrum: am I commenting on this to jump on the hype bandwagon, or because I see a need to represent the pragmatics of implementing and adopting this technology?

There is no question that more sensory inputs can lead to greater understanding. The ability to monitor a thing with fine granularity facilitates greater learning; it's why we have research studies that extend over multiple decades, and it is the basis of the scientific method, enabling systematic observation and measurement of phenomena. We have probably only begun to explore the boundaries of what we might learn through these observations and this growing sensor network. Today IoT mostly focuses on observation of a single "thing", but ultimately we will learn of patterns that occur when one "thing" impacts another "thing". These chains of events have the capability to drive innovation that until now has only been described in science fiction. It's Schroedinger's experiment applied on an infinite scale.

There are pragmatic applications of this technology across a whole host of industries: healthcare, manufacturing, energy, civil planning, physical security, etc. In fact, many of these applications have been in production for many years. The military has been using sensors in battlefield scenarios since 2003. The energy sector has been using sensors to monitor oil pipelines since 2005. Most of these used proprietary protocols operating over low-bandwidth transmission mediums, but they effectively delivered the value that we expect from IoT today.

So, part one of my conundrum is the question, "what's new?" Systems that provide value typically improve over time. Today, the hardware is less expensive, it's easier and cheaper to develop handlers for the data, and we are leveraging higher-bandwidth wireless mediums that allow us to use standardized, TCP/IP-based protocols. But isn't this really just an improvement on existing implementations? Is it worthy of the hype? Perhaps it's the fact that, due to these factors, we can see a "social" effect. That is, just like Facebook and LinkedIn, as more sensors join the network, the value of the network increases exponentially. This is certainly worthy of the hype, but the hype has centered on the advancements in the technology underpinnings rather than the value of the network being created. Even where there has been focus on new applications for this technology, it's largely stuff that could have been produced over the past two decades.

Part two of my conundrum is even more difficult to tackle: what are we going to do with what we learn? Today, many applications of IoT deal with what we'll call "event processing", for lack of a more encompassing term. That is, an event occurs and we can capture it much closer to the event horizon because a sensor provides real-time information. The event also generates data that may be discarded after the event has been processed and handled, or may be stored and cataloged to be analyzed at a future time. Assuming the latter, we can start to identify patterns of behavior associated with the event, such as situational awareness (what is happening in the surrounding environment that causes the event to occur) and the factors indicative of the event itself (useful in root-cause analysis). Where an identified pattern of behavior leads to behavioral changes, we find ourselves in a precarious position.
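For readers who prefer code to prose, here is an illustrative-only sketch of that event-processing split (all class and field names are invented): handle the event close to its event horizon, then store and catalog it so patterns can be mined later.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// A raw reading from a single "thing".
record SensorEvent(String sensorId, String type, double value, Instant at) {}

class EventProcessor {
    private final List<SensorEvent> archive = new ArrayList<>();

    void onEvent(SensorEvent event) {
        // 1. Immediate handling, close to the event horizon.
        if ("temperature".equals(event.type()) && event.value() > 90.0) {
            System.out.println("ALERT: " + event.sensorId() + " reading " + event.value());
        }
        // 2. Store and catalog the event so behavioral patterns (situational
        //    awareness, root-cause indicators) can be identified later instead
        //    of being discarded along with the event.
        archive.add(event);
    }
}
```

The precarious part is not capturing or archiving these events; it is feeding the learned behavioral changes back into layers of existing systems.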

Figure 1 below illustrates the issue.

[Figure 1 (IoTBehaviors): assimilating learned behavioral changes in pre- and post-IT Transformation environments]

Since we're living with layers of aging and deteriorating systems, implementing learned behavioral changes becomes difficult: we have to either apply our learning to the existing systems or create new systems through modernization, refactoring or new development. As the illustration shows, assimilating these changes becomes easier in a post-IT Transformation environment, but it will still fall far short of the overall number of learned patterns, which will grow exponentially as the sensor network grows.

This brings me back to my conundrum: if we're not able to readily assimilate the newly learned behaviors from IoT, and the value of IoT is just that it's gotten cheaper and easier to deliver event processing, is it worthy of the hype or is it on par with advancements in CPU processing? Please don't get me wrong, I'm not saying that IoT does not have significant intrinsic value, because it does. Businesses that can apply this technology to better their products and services will see revenue benefits and operating cost reductions. Moreover, these sensors can be aggregated to drive that benefit across the supply chain. For example, an airline that recognizes an engine is at risk of breakdown can pull it out of service before it causes multiple delays, by leveraging the sensors provided by the engine manufacturer and aggregating them into its sensor grid for the entire aircraft. In the biz, we call this a solution. If the airlines had asked, we could have delivered this solution sooner, but perhaps they didn't care for the associated price point. Hence, IoT has business value at a given price point. This is economics 101, but again, is it hype-worthy?

To summarize, I do believe that if IoT advances to the point where it's essentially an intelligent network generating and exposing new behavioral patterns that we cannot easily discern today because the data is absent, then IoT will have earned its hype-worthiness. Much as we have learned about human behavioral patterns through social networking, "things" impact each other, and the opportunity to observe that phenomenon will pretty much reshape the entire planet. But as an event processing engine that helps us park our car better?

Modernization or Transformation: Which Is The Right IT Fix For Your Business?

Much of IT terminology is misused and misapplied. Modernization and transformation are two such terms. They are often used interchangeably even though they mean different things and carry very different connotations. Indeed, it is somewhat safe to assume that in IT any transformative effort is likely to also have a modernizing effect, and thus we can see these as levels of improvement effort. However, many businesses are being led to believe that if they don't transform now they risk becoming irrelevant, when they would be equally well off simply modernizing existing IT services, saving millions over transformation.

I often witness solution architects using the term transformation with regard to their designs, and upon review I see that the solution doesn't affect how the IT organization operates. Ultimately, they have new tools and modern capabilities, but the same "towers" that exist now remain intact in the to-be architecture. In these cases, I question the architect: "where is the transformation?" A transformation should result in a fundamentally different working organization than the one that started it; otherwise, it's simply a modernization effort.

To be clear, there's nothing wrong with a modernization approach if the business is achieving its goals with the current IT systems and processes. I'm working with one such client right now that is struggling with service management quality issues. We don't need to transform the client's data center environment to succeed; we simply need to modernize and consolidate some of the tools and approaches in order to incorporate more automation and more predictive monitoring. That said, I asked Tim Crawford, CEO of AVOA and a well-respected CIO advisor, to provide an early review of this blog, and he pointed out that it's equally important not to adopt a mantra of modernization when transformation is required, or to use it to avoid the complexity that transformation entails. All too often, IT organizations use modernization as a substitute for undergoing a longer and more expensive transformational effort while the cycle of deterioration of their systems continues.

A great analogy that captures this is home renovation. If the structure of the house is sound but the style of the home is dated, then modernization efforts will update the look of the home without replacing critical infrastructure: replacing carpets with laminates, updating the kitchen and bath with new appliances and fixtures, and applying a new coat of paint. The home may become more maintainable and support a simplified lifestyle, but the flow and layout remain the same.

In contrast, transformation involves changing the physical structure of the home: tearing out a wall to enlarge a living space, adding an extension, or finishing an unfinished space. In each of these examples you are changing how the house is used and lived in.

There is extreme variation in cost between these two approaches. Modernizing a home can be done for a fraction of the cost of a transformative effort while still greatly enhancing the living situation. The same is mostly true of delivering IT services. For example, if you provide compute infrastructure on an application-by-application basis and you switch to cloud computing, you are undergoing a transformation; however, if you are already using virtualization software and you move to cloud, the way you think about resource management, governance, deployment, and so on remains fairly similar, though you may need to learn new tools to support these processes.

So, when is transformation required over modernization? When change becomes too high a risk, modernization is no longer an option. Many businesses are caught in this pattern right now: they would like to be more agile and responsive to business requirements but are limited by a multitude of factors that unknowingly work together to lock IT into a state of immovability. Examples of these factors include:

  • A culture of fear of contributing (or keeping your head down)
  • Failure results in severe penalization
  • Loss of too much tribal knowledge with no documentation
  • Year-over-year reductions in IT budgets
  • The business has stopped paying annual maintenance on software and/or is running software versions no longer supported by the vendor
  • Hardware forced to remain in service beyond reasonable end-of-life limits
  • Little to no investment in training and/or individuals with modern IT skills

While many talk academically about undergoing IT transformation, or are among the small percentage of inspirational success stories, there are tens, if not hundreds, of thousands of businesses that live this existence every day. To make matters worse, the executives of these businesses are now being told by the market that they need to "digitally transform" or they may be gone in ten years. In turn, this pressure to transform is cast upon the already immovable IT organization.

In these cases, modernization should be considered “putting lipstick on a pig.” The key business processes and underlying systems of record have become so complex and difficult to change that changes are enacted usually only once annually and the period for actual development on these systems is shortened significantly due to the need to ensure thorough testing.

One may wonder, couldn't these businesses institute a parallel transformation effort? Sure; this is where recommendations such as Gartner's Bi-modal strategy come from. However, I view Bi-modal strategies as an approach to take when caught between the proverbial rock and a hard place. They are required because nothing else will work given the current operating environment. Moreover, they require that someone be empowered to lead a parallel effort, which is unlikely given the current conditions. Unfortunately, many of these businesses will need a "rock bottom" moment as the motivating event to invest in transformation.

If Everything is DevOps, Then Nothing is DevOps

Tom Healy of Jama Software published a blog entry entitled "DevOps is Dead, Long Live DevOps", as did Andrey Akselrod and Nir Cohen. Interestingly, I find these pieces to have related concerns but different reasoning. Clearly, there are naysayers within the IT community who don't buy into the DevOps fanboi messaging, and that's okay!

Personally, I believe the IT industry is notorious for taking a good concept, like DevOps, and twisting and contorting it until it fits a consistent model that highlights tooling (product) over outcomes, specialization over generalization, and engineering prowess over simplification. We saw this same pattern with Java, Service Oriented Architecture (SOA), and Cloud Computing, and now we see it with DevOps. Ultimately, the negative impact is that we lose the support and interest of the business as these efforts become more and more technical in nature. Moreover, it feeds an ever-increasing cost of IT operations and shrinks the available pool of job candidates.

DevOps is already well along the path of being molded to the model outlined above. From the outset it has been plagued by dissension that embodies the characteristics I just described. The first major point of dissension was the viability of Enterprise DevOps: were the needs of DevOps in the enterprise different from those of startups? Debate is good when it helps to hone a new concept for the benefit of all. But this was not a healthy debate, with much of it devolving into personal attacks on individuals' credibility as a means of getting one perspective perceived as "the right way!"

As Nir points out in his presentation, linked above, the DevOps label is being attached arbitrarily to roles, jobs, tools, etc. Again, if everything is DevOps, then nothing is DevOps. Each day DevOps becomes more strongly molded to the aforementioned model and moves further from the reason the DevOps conversation arose in the first place. That is, we are losing sight of the value that could be derived from recognizing and correcting the hurdles and problems that exist today in delivering and running the systems that run a business.

What are those problems again that are limiting our abilities to be perfect (as a goal, not a requirement) in our delivery of IT services to the business?

  • Lack of automation means more human intervention and greater chance of error and outages
  • Lack of collaboration between those who build and those who operate leads to greater opportunities for inconsistencies between environments and outages
  • Lack of automation and collaboration, in addition to out-of-date policies, limits IT's ability to respond to business needs faster

There are some learned practices that seem to greatly help some IT departments respond to these problems. There are some tools that improve configuration management and allow automation of complex, distributed environments. There are tools, policies and procedures that help to improve collaboration and communication between those who build and those who operate. There’s still a need to manage up to help senior management understand that this is not solely a “below the waterline” problem and needs executive sponsorship to succeed.

Let's not allow DevOps to become so mired in the need to be molded into some geek fantasy project that we lose sight of the good that comes from its origins. Let the knowledge and experience of those who have mastered moving their business to a more agile, higher-quality environment guide your own efforts, but don't get caught up in whether or not it's "DevOps". Are you addressing the key problem areas identified above? If so, then your business will benefit from the improvements in IT service delivery.