Monthly Archives: October 2016

The Anonymous Neighbor Problem in IT

COMED, my power company, sends out a monthly report that shows my energy consumption relative to my neighbors. Every month I’m considerably higher than all of them. The report also lists things I could do to reduce my energy consumption. The problem with this report is that it doesn’t take into account my house size relative to the other houses in my neighborhood. My house is the largest model and accounts for about one-third of the local sample size. If I were to attempt to reduce my energy consumption to be in line with the monthly average of my neighbors, I would probably have to adopt a lifestyle akin to the Amish. Hence, I call this the Anonymous Neighbor Problem, and I believe it is responsible for driving decisions that are thrust upon IT leadership by executive business managers.

Executive business managers receive reports from analyst and management consulting firms that compare their business expenditures on IT to those of other “similar” firms. The businesses surveyed in these reports are in a similar industry and roughly the same size. These are the anonymous neighbors. What such a report does not, and cannot, take into account is the history of your particular business. Perhaps your business is older than all your neighbors and, hence, migrating to lower-cost operational alternatives is a much more complex and expensive process. Perhaps the same executives now asking why it costs so much more than your neighbors to operate IT decided five years ago not to approve the capital expenditures that would have led to a more efficient IT operational footprint. Finally, there is most likely some waste, as every IT organization has some inherent waste.

Unfortunately, for many CIOs, when these reports are released all heads turn to them with a demand to get in line with their anonymous neighbors. I can point to the drastic measures I would need to take to cut my energy use to my neighbors’ average and simply ignore the report, but many CIOs cannot ignore the mandate. The results have been devastating for many IT shops. Loss of institutional knowledge, deep labor cuts that leave teams unable to keep up with necessary maintenance cycles, and outsourced vendors that are not properly managed and require additional policing are all examples of what the Anonymous Neighbor Problem has led to in IT. Moreover, it leads to lower service levels from IT, fostering a cycle of questions about costs versus service provided.

Having been on the side of sales and consulting to IT for many years, I am not a fan of the “my needs are unique” or “you don’t understand our business” answers. I’ve written before about the cost of uniqueness in IT, and I do believe uniqueness can be designed out in favor of commodity business capabilities, which can significantly lower IT operating costs. However, this does not change the fact that the current-state IT environment for a given business may not be in a position to match the average IT spend of its anonymous neighbors without drastic measures that impair the value IT can deliver to the business. Achieving this goal will require IT transformation, which may require investment before the business can expect to see reductions in IT operational costs.

Hard Choices Are Required When Adopting a Cloud Operating Model

When was the last time you heard anyone say “IT Applications & Operations”? Frankly, in my 30+ year career in IT, I don’t believe I’ve ever heard anyone use this term. The typical term we hear is IT Infrastructure & Operations. These two go together like peanut butter and jelly, which tells us a lot about how we view the field of IT.

For those who may not be familiar with the role of IT Operations, Joe Hertvik does a great job here of describing IT Operations Management from the perspective of someone engaged in providing this service to the business. As you can see, it’s very interesting how he specifically depicts the gap between responsibilities for IT Applications and IT Operations as a Venn diagram in which there is no overlap.

However, as we progress from a pre-Cloud Operating Model world to a post-Cloud Operating Model world, this coupling is changing. As we migrate workloads to the cloud, the first shift will be from operating infrastructure to operating Infrastructure-as-a-Service. This shift will leverage the capacity management and monitoring skills already used with virtualized environments.

[Figure: COMShift]

The second shift will come as migrated workloads are refactored into cloud native applications or new cloud native applications are developed specifically for cloud. Here the emphasis of operations will be on the management and monitoring of the application platform. This may be done using a Platform-as-a-Service offering, such as Azure PaaS, AWS, Cloud Foundry or OpenShift; a Software-as-a-Service platform, such as Salesforce.com or ServiceNow; or even something more traditional based on conventional application servers.

During this second shift, operations will need to focus less on availability and more on consumption patterns. Because cloud native applications are designed for greater availability and resilience (see the Pets & Cattle presentation by Randy Bias), emphasis will need to shift toward how services are being consumed in order to determine efficiency and manage costs. Greater integration between the operations management platforms and the cloud services will be a critical requirement for this shift to occur.
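To make the idea of watching consumption rather than availability a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `USAGE` records, the service names, and the field names are hypothetical and do not reflect any provider's billing or metering format. It simply rolls up usage per service and computes cost per thousand requests as one possible efficiency measure.

```python
from collections import defaultdict

# Hypothetical usage records; service names and fields are illustrative
# assumptions, not a real provider's metering schema.
USAGE = [
    {"service": "orders-api", "requests": 1000, "cost_usd": 2.0},
    {"service": "orders-api", "requests": 3000, "cost_usd": 4.0},
    {"service": "reports", "requests": 50, "cost_usd": 5.0},
]


def cost_per_thousand_requests(usage):
    """Aggregate usage by service and return cost per 1,000 requests."""
    requests = defaultdict(int)
    cost = defaultdict(float)
    for row in usage:
        requests[row["service"]] += row["requests"]
        cost[row["service"]] += row["cost_usd"]
    # Skip services with no recorded requests to avoid dividing by zero.
    return {
        service: 1000 * cost[service] / requests[service]
        for service in requests
        if requests[service] > 0
    }


if __name__ == "__main__":
    for service, unit_cost in cost_per_thousand_requests(USAGE).items():
        print(f"{service}: ${unit_cost:.2f} per 1,000 requests")
```

In this made-up data the lightly used service is far more expensive per request than the busy one, which is the kind of question a consumption-focused operations team would want surfaced.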

The final shift will happen as businesses move toward Serverless Computing or Function-as-a-Service. In this model, business logic executes in response to an event occurring in another service. Due to the transient nature of this model, operations management and monitoring will change drastically. The application will only be available to monitor for brief periods, requiring new operations techniques in support of the Serverless Computing model. Failures that occur here may only be recognizable by performing analytics on post-execution logs.
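As a rough illustration of what analytics on post-execution logs might look like, the sketch below computes a failure rate per function from a list of invocation records and flags the functions that exceed a threshold. The record layout, the function names, and the 25% threshold are all assumptions made for this example; real serverless platforms each have their own log formats.

```python
from collections import defaultdict

# Hypothetical post-execution log records; field names are assumptions,
# not any provider's actual log schema.
LOG_RECORDS = [
    {"function": "resize_image", "status": "ok", "duration_ms": 120},
    {"function": "resize_image", "status": "error", "duration_ms": 3000},
    {"function": "resize_image", "status": "ok", "duration_ms": 110},
    {"function": "send_receipt", "status": "ok", "duration_ms": 45},
    {"function": "send_receipt", "status": "ok", "duration_ms": 50},
]


def failure_rates(records):
    """Return the fraction of failed invocations for each function."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for record in records:
        totals[record["function"]] += 1
        if record["status"] != "ok":
            failures[record["function"]] += 1
    return {name: failures[name] / totals[name] for name in totals}


def flag_unhealthy(records, threshold=0.25):
    """List functions whose failure rate exceeds the (assumed) threshold."""
    rates = failure_rates(records)
    return sorted(name for name, rate in rates.items() if rate > threshold)


if __name__ == "__main__":
    print(flag_unhealthy(LOG_RECORDS))
```

The point of the sketch is that nothing is watched while it runs; the function instances are already gone, and the health picture is reconstructed afterward from what they left behind.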

The Impact of the Cloud Operating Model Shift

Having presented a view of the world post-Cloud Operating Model, you may ask what’s the impact of this roadmap on today’s traditional IT environment?

As operations is designed today, it focuses on running the physical and logical environments, which means a significant number of resources are dedicated to running the devices that provide network, compute and storage, in addition to the specialized software for monitoring and managing the various components. It also means that the overall budget has had to be divided between managing the physical environment and managing the applications, with the physical environment usually taking the lion’s share. This is just the nature of the beast, as production applications tend to demonstrate a higher mean-time-to-failure and require less attention than their physical counterparts. Moreover, the physical environment is constrained by procurement cycles and capital expenditure approvals that are not characteristic of the applications.

As shown in the diagram above, as businesses move away from self-managed infrastructure (not just to cloud, but to all infrastructure provided as-a-Service), IT operations will shift more of its focus to the workloads and the applications. Simultaneously, operational focus shifts away from running a physical environment, as this task falls to the infrastructure providers. Hence, what we should expect to see going forward is operations being “unbound” from infrastructure and bonded with applications. As more and more businesses realize the economic benefits of relinquishing ownership of their infrastructure to a provider, IT should reorganize around operations of the applications and workloads.

This unbinding and re-binding process for operations is a key element of successfully implementing a cloud operating model. The result of this activity is that the remaining self-managed infrastructure organization needs to become more encapsulated. It will need fewer resources and should consider combining operational support with infrastructure engineering. This should result in lower operational overhead and a significant reduction in IT costs.

This change will also raise every red flag you can imagine for those who have been in infrastructure and operations for the better part of their careers. The fiefdom holders, the server huggers, the CCIEs, vExperts, the tinkerers, the storage priests, etc. will demonstrate strong resistance to this change. They will introduce fear, uncertainty and doubt whenever possible. They will regale us with stories of cloud failures and breaches. All of this is an attempt to maintain the status quo for as long as possible in the face of this disruptive force. It is here that executives must weigh the pros and cons of this change and be the driving force behind moving to this new IT organizational construct.

Additionally, what we’ve seen with many early adopters of cloud is that the applications move without their operations counterpart. Eventually, someone asks the question, “where’s my dashboard?” or “how is this integrated into our ITIL processes?” Thus, as we unbind operations from infrastructure and bind it with applications, what is being monitored and how it’s managed changes, but we carry across (and, hopefully, correct) the high-latency, low-value processes that support audit, transparency, and corrective action. Some existing operations skills will transfer to this new focus; however, there will be a change in the tooling used for these tasks and, likewise, different skills will be required to operate cloud-based and hybrid applications.

The biggest issue for businesses will be how to adopt and implement operational management as their workloads shift away from infrastructure and toward applications. There is some early guidance from the application performance management vendors, such as Dynatrace and AppDynamics, and examples from Webscale startups, but as a whole, the playbook for this segment of the industry is unwritten.

We know that the key performance indicators that are important today will be different in this post-Cloud Operating Model world. Also, it is very likely that the tools needed to monitor and manage applications in this world do not exist or are only now starting to appear in the market. Hence, skilled individuals who understand how to configure these tools most likely don’t yet exist. Thus, the likely outcome is that businesses will attempt to manage the post-Cloud Operating Model world with the same knowledge and tools they use to manage the pre-Cloud Operating Model world, which will, unfortunately, fail. It is from this failure that I believe the leaders in operating a post-Cloud Operating Model world will emerge.

 

Special thanks to @cpswan and @glenprobinson for their assistance in helping me shape my ideas for this blog