Much has been said and written recently about observability. Sometimes the term is used interchangeably (and incorrectly) with visibility. Many software vendors are using the term observability, but there is little consensus on the definition. Let’s review exactly what observability means.
What exactly is observability?
Observability is a term from control theory that has been borrowed by vendors selling software for IT Ops. From Wikipedia: "In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs."
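To make the control-theory definition concrete, here is a minimal sketch of the standard Kalman rank test for a linear system with state matrix A and output matrix C: the system is observable exactly when the stacked matrix of C, CA, ..., CA^(n-1) has rank n. The function names and the two-state example are illustrative assumptions, not something taken from the sources quoted here.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(matrix):
    """Rank via Gaussian elimination on exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_count = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank_count, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / rows[rank_count][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count


def is_observable(a, c):
    """Kalman rank test: stack C, CA, ..., CA^(n-1) and check for rank n."""
    n = len(a)
    blocks, current = [], c
    for _ in range(n):
        blocks.extend(current)
        current = mat_mul(current, a)
    return rank(blocks) == n


# Hypothetical two-state system: x1 accumulates x2, and x2 stays constant.
A = [[1, 1], [0, 1]]

print(is_observable(A, [[1, 0]]))  # output exposes x1, which also reveals x2 over time
print(is_observable(A, [[0, 1]]))  # output exposes only x2, so x1 stays hidden
```

In the second case the output never carries any information about x1, so no amount of watching the outside of the system recovers it.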
What does observability mean in the context of IT Ops (and DevOps)? According to Gartner: “Observability is the evolution of monitoring into a process that offers insight into digital business applications, speeds innovation and enhances customer experience. I&O leaders should use observability to extend current monitoring capabilities, processes and culture to deliver these benefits.” [1]
Some software vendors will try to convince you that observability means you can add a magical layer to your ITOM tools that will enable you to understand what's happening on the inside of the system by simply observing the outside of the system. Why do some software vendors want you to believe this? Because collecting machine data is hard, and it has become much harder with the complexity and dynamic nature of modern IT systems. Collecting events, as Generation 1 AIOps vendors do, is easy. So in an ideal world, a software tool just waits for systems to send events, then analyzes them, then magically spits out insights that resolve problems. This is what those vendors set out to do. The desire to have such an easy solution somehow allows us to look past the absurdity of it.
This is akin to saying you don't need gauges or diagnostic codes from your car — you'll know there’s a problem when it stops running. You might see the check engine light is on, and you'll magically know what the problem is because you can observe the "outside of the system." Apply this to software applications — when the application stops working, you just examine the events and the previous times the application stopped working, and you're supposed to be able to infer exactly what the problem is. The bottom line is that those solutions have unsurprisingly failed to produce results.
Machine learning has come a long way in recent years. But with the vast amounts of data being collected, simply relying on algorithms to identify anomalies isn’t enough. With typical enterprises collecting billions of events per day, Generation 1 AIOps tools often flag thousands of “anomalies” per day, far more than anyone can act on. The algorithms need more context to deliver true insights. Deploying the right AIOps technology with a service-centric monitoring platform can provide this context, which enables inference capabilities such as true anomaly detection, root-cause analysis and intelligent dashboards.
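As a rough illustration of what service context adds, the sketch below groups per-component anomalies under the digital service each component supports, so several separate alerts collapse into one service-level finding. The component names, the SERVICE_OF mapping and the group_by_service helper are all hypothetical; this is a toy under those assumptions, not how any particular AIOps product works.

```python
from collections import defaultdict

# Hypothetical mapping from infrastructure component to the digital service it supports.
SERVICE_OF = {
    "db-01": "checkout",
    "app-07": "checkout",
    "lb-02": "checkout",
    "cache-03": "search",
}


def group_by_service(anomalies, service_of):
    """Collapse per-component anomalies into one entry per affected service."""
    grouped = defaultdict(list)
    for anomaly in anomalies:
        # Components missing from the map are kept visible rather than dropped.
        service = service_of.get(anomaly["component"], "unmapped")
        grouped[service].append(anomaly)
    # Services with the most anomalous signals come first.
    return sorted(grouped.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical anomalies, as an anomaly detector without context might emit them.
raw_anomalies = [
    {"component": "db-01", "signal": "query latency"},
    {"component": "app-07", "signal": "error rate"},
    {"component": "lb-02", "signal": "5xx responses"},
    {"component": "cache-03", "signal": "eviction rate"},
]

for service, items in group_by_service(raw_anomalies, SERVICE_OF):
    print(service, [a["signal"] for a in items])
```

Without the mapping these are four unrelated alerts. With it, three of them point at a single service, which is a far more useful starting point for root-cause analysis.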
Why do we need observability?
The answer is simple: Modern architectures have become so complex and dynamic that without observability, it is extremely difficult to identify (let alone prevent) the cause of digital service issues. Just as external data alone is not sufficient, neither is internal data from system components. For modern digital services, observability requires the combination of internal and external data. It means visibility across the full gamut of metrics, events, logs, change records, ITSM data and more. The sheer amount of data being collected requires a modern AIOps solution to derive the critical insights that accelerate problem resolution. What does this mean for your business? It means reduced risk, faster deployment of new technologies, and, in the end, better end-user experiences.
Modern Monitoring + AIOps
Modern monitoring platforms have responded to this challenge. They have evolved to collect all types of data from all systems, ascertain digital service structures and dependencies, and then feed that rich data to machine learning algorithms to derive insights that were previously out of reach. This is true observability.
Check out this white paper to learn more about modern monitoring and AIOps with Zenoss Cloud.
[1] Innovation Insight for Observability, Gartner, 28 September 2020.