An interesting facet of advancement is that it tends to limit know-how over time. At one time, the mechanic in your local garage could take your entire engine apart, fix any problem, and put it back into running order. Today, they plug in a computer, read a code, and hope it provides some understanding of the problem. Similarly, the application server revolution has unfortunately produced a generation of software developers who believe that the container can do all your instrumentation and metering for you, removing any impetus to build this into the application. These developers lose sight of the fact that the container can only tell you part of the story: the part it sees as an observer. If you don't make more of your application's internals observable, the most the container can see is how often it hands off control to your application, what services your application uses from the container, and when your application returns control. Useful information, but not enough to develop large-scale, real-world services used by hundreds of thousands, or even millions, of users.
While doing some research the other day, I ran across a product that I found very interesting: a service bus with built-in metering.
Putting the metering into the container is one way around the problem of every application metering for itself. However, how many people are going to trust their services to this one vendor? It would also be very difficult to migrate such a service in the future unless the metering interface were standardized or replicated in another hosting container. So here is a place where standardization of service offerings in service containers would be a very helpful feature, and something that needs to be a focus of Cloud interoperability.
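To make the idea concrete, a standardized metering contract could be as simple as an interface the container exposes to every hosted service. The sketch below is purely illustrative; the `MeteringService` name, its methods, and the in-memory implementation are my own invention, as no such standard exists.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standardized metering contract a service container could
// expose to hosted services. The name and shape are invented for
// illustration; no such standard exists today.
public interface MeteringService {
    // Record one metered event on behalf of the calling service.
    void record(String serviceId, String operation, long units);
}

// Trivial in-memory implementation, useful only for demonstration.
class InMemoryMetering implements MeteringService {
    private final Map<String, Long> totals = new HashMap<>();

    @Override
    public void record(String serviceId, String operation, long units) {
        totals.merge(serviceId + "/" + operation, units, Long::sum);
    }

    // Total units recorded for one service operation.
    public long total(String serviceId, String operation) {
        return totals.getOrDefault(serviceId + "/" + operation, 0L);
    }
}
```

The point is the contract, not the implementation: if every container honored the same interface, a metered service could move between hosts without losing its billing data stream.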
Additionally, proper inclusion of metering and instrumentation interfaces in your Cloud service means that you have greater control over the granularity of the data produced through these interfaces. Most metering solutions I've seen today are very coarse-grained and often focused on infrastructure usage, such as cost per CPU cycle or per gigabyte. What if you wanted to offer a service with tiered pricing? That would require each service operation to report the details of every transaction to a metering service.
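As a sketch of why tiered pricing demands per-transaction metering, consider billing usage against a tier table. The tier boundaries and rates below are invented for illustration; only fine-grained metering of each transaction's units makes this calculation possible.

```java
import java.util.List;

// Illustrative tiered-pricing calculator fed by per-transaction metering.
// Tier limits and rates (cents per unit) are invented for this example.
public class TieredPricing {
    // Units up to 'limit' (cumulative) are billed at 'rate'.
    record Tier(long limit, double rate) {}

    static final List<Tier> TIERS = List.of(
        new Tier(1_000, 0.10),          // first 1,000 units
        new Tier(10_000, 0.05),         // next 9,000 units
        new Tier(Long.MAX_VALUE, 0.01)  // everything beyond
    );

    // Walk the tier table, billing each slice of usage at its tier's rate.
    static double price(long units) {
        double cost = 0;
        long prev = 0;
        for (Tier t : TIERS) {
            long inTier = Math.min(units, t.limit) - prev;
            if (inTier <= 0) break;
            cost += inTier * t.rate;
            prev = t.limit;
        }
        return cost;
    }
}
```

A coarse-grained infrastructure meter (CPU cycles, gigabytes) cannot feed this model; only the service itself knows how many billable units each transaction represents.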
Instrumentation is even more complex to manage than metering because it can introduce processing latency. Capture too much instrumentation data and the service will not be responsive; capture too little and you may not be able to detect a pending outage before it occurs. I have seen first-hand the problems of poorly designed instrumentation in a high-volume service.
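One common way to manage that latency trade-off is to sample: record detailed timing for only a fraction of requests so the hot path stays cheap. This is a minimal sketch of that technique, not from the original article; the class name and the sampling rate are assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

// Sketch of sampled instrumentation: only a configurable fraction of
// requests pay the cost of detailed timing capture.
public class SampledTimer {
    private final double sampleRate;                        // e.g. 0.01 = 1% of calls
    private final LongAdder sampledCalls = new LongAdder(); // thread-safe counters
    private final LongAdder totalNanos = new LongAdder();

    public SampledTimer(double sampleRate) { this.sampleRate = sampleRate; }

    // Cheap per-request decision: should this call be timed at all?
    public boolean shouldSample() {
        return ThreadLocalRandom.current().nextDouble() < sampleRate;
    }

    // Record the elapsed time of one sampled call.
    public void record(long nanos) {
        sampledCalls.increment();
        totalNanos.add(nanos);
    }

    // Average latency over the sampled calls only.
    public double avgNanos() {
        long n = sampledCalls.sum();
        return n == 0 ? 0 : (double) totalNanos.sum() / n;
    }
}
```

Tuning the sample rate is exactly the balancing act described above: too high and instrumentation slows the service, too low and the signal disappears into noise.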
In my first-hand experience, traditional systems and network management tools were used to watch application performance, including CPU, memory, and storage, but were unable to determine that a runaway process was eating up storage space quickly enough to shut it down before it brought down all the services sharing that volume. In an elastic Cloud environment, where dynamic storage is configured to come online to minimize outages, a process like that could consume a large percentage of your storage array before you had a chance to unwind it. The problem was made even more complex by the fact that the service provider did not know the typical size of a transaction from each of its customers, so it could not assume a particular quota without risking the rejection of a legitimate request.
These are real problems that could have been mitigated by an appropriate level of metering and instrumentation built into the services themselves. Such sub-services could have inferred average transaction sizes and average storage usage, and identified that a particular process was out of range early enough for a human to get involved and stop the runaway process.
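The kind of inference described above can be sketched with a running mean and variance (Welford's algorithm) over observed transaction sizes, flagging values far outside the norm. The class name and the 3-sigma threshold are my assumptions, not details from the incident.

```java
// Sketch of in-service self-monitoring: maintain a running mean/variance
// over transaction sizes (Welford's algorithm) and flag outliers. The
// 3-standard-deviation threshold is a common convention, chosen here
// for illustration.
public class TransactionSizeMonitor {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    // Fold one observed transaction size into the running statistics.
    public void observe(double size) {
        count++;
        double delta = size - mean;
        mean += delta / count;
        m2 += delta * (size - mean);
    }

    // True when a size is more than 3 standard deviations above the mean.
    public boolean isOutlier(double size) {
        if (count < 2) return false; // not enough data to judge
        double stddev = Math.sqrt(m2 / (count - 1));
        return size > mean + 3 * stddev;
    }
}
```

A monitor like this, embedded in the service, could have raised an alarm on the runaway process while a human still had time to intervene, without the provider having to guess a fixed quota in advance.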
It’s interesting to see so much attention being paid to security in Cloud Computing, since security was one of the most often ignored components among those implementing SOA-based designs. It seems that the immaturity of a particular technological direction is easily identified by which aspects of a robust, mature solution are ignored in emerging implementations. To ignore metering and instrumentation in the design of your Cloud services, or to believe the infrastructure will handle this for you, illustrates to me the immaturity of Cloud Computing and a naïveté on the part of those implementing Cloud-based services.
2 thoughts on “Metering and Instrumentation: Two Critical and Oft Forgotten Features of Cloud Services”
You are describing quite advanced automatic alarm generation. Who is supposed to configure and tweak all of these monitoring rules? Would it be the developer (who knows the system design) or operations (who gets a flood of false alarms)?
Let’s take an online email service. It is normal for it to take more time to process emails with large attachments, right? How would this be configured? Does operations need to learn how to configure each application separately?
This has been solved to some degree for some time in Java runtimes via JXInsight’s Probes technology, which is a dynamic instrumentation and resource metering solution that can be used in or outside of the cloud and across the complete application/service lifecycle.