Monitoring vs. Observability

Cindy Sridharan (@copyconstruct on Twitter) recently published an article that’s really worth taking a look at. In it, she goes into some detail on the conversation around “monitoring” and “observability”. While to some it may just seem like a debate over semantics, Cindy brings up some interesting points about the marketing of the term “monitoring” and how, in the age of “DevOps” and other contested terminology, “monitoring” just isn’t sexy anymore.

There may be a case for the argument over terms, but part of the problem is the gap between traditional blackbox monitoring and whitebox monitoring. The primary difference between the two is that the former only monitors from the outside, while the latter also monitors from the inside to pull in system-level metrics. The way we see it, it’s the difference between a reactive monitoring model that only responds to failing or down systems and a proactive model that offers greater insight into potential failures so that they can be mitigated before they affect the end-user experience. The old way of monitoring only uptime just isn’t enough anymore.
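To make the blackbox/whitebox distinction concrete, here is a minimal Python sketch using only the standard library. Every name in it (blackbox_check, whitebox_metrics, disk_warning, the 0.9 threshold) is hypothetical and chosen for illustration. It is not taken from Cindy's article or from any particular monitoring product.

```python
import os
import shutil
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BlackboxResult:
    up: bool
    latency_s: float


def blackbox_check(probe: Callable[[], bool]) -> BlackboxResult:
    """Blackbox: observe the system only from the outside.

    `probe` is a hypothetical callable that pokes the service (for
    example an HTTP request) and returns True if it answered correctly.
    """
    start = time.monotonic()
    try:
        up = bool(probe())
    except Exception:
        up = False  # any failure to respond is treated as "down"
    return BlackboxResult(up=up, latency_s=time.monotonic() - start)


def whitebox_metrics(path: str = ".") -> Dict[str, float]:
    """Whitebox: read system-level metrics from inside the host."""
    usage = shutil.disk_usage(path)
    metrics = {"disk_used_fraction": usage.used / usage.total if usage.total else 0.0}
    try:
        metrics["load_1m"] = os.getloadavg()[0]  # not available on every platform
    except (AttributeError, OSError):
        pass
    return metrics


def disk_warning(metrics: Dict[str, float], threshold: float = 0.9) -> bool:
    """Proactive signal: warn before the disk is actually full.

    The 0.9 threshold is an arbitrary illustrative value.
    """
    return metrics["disk_used_fraction"] >= threshold


if __name__ == "__main__":
    print(blackbox_check(lambda: True))  # reactive: only knows up or down
    print(disk_warning(whitebox_metrics()))  # proactive: sees trouble coming
```

The blackbox check can only report that something is already broken or slow, while the whitebox metrics give a chance to act earlier, for instance when a disk is filling up but the service still answers.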

Regardless of terms though, Cindy calls for something that I think everyone can get behind:

As strange as it might sound, I’m beginning to think one of the design goals while building systems should be to make it as monitorable as possible — which means minimizing the number of unknown-unknowns.

Building with this in mind doesn’t mean that you’re off the hook for monitoring certain metrics. It means that you need a monitoring system that can ingest as much as possible without becoming fatiguing, that provides the right context and, ultimately, that monitors not just for failure but also for performance.