We think it’s important to call attention to great DevOps and monitoring content from around the web. We recently came across “What’s not Actionable & Business Critical Shouldn’t Ring: Building the Right Alerting System” by Fred de Villamil over at t37.net, and we wanted to add some of our own thoughts.
Although this article is from a few months back, it raises something our customers bring up all the time: too many alerts! Monitoring systems are mission critical for any infrastructure and DevOps team today, but their implementation often leaves a lot to be desired. When alerts are set up to be too sensitive, or your monitoring solution gives false alerts, the only thing that follows is alert fatigue. It’s the classic “Boy Who Cried Wolf” problem: after a time, people stop trusting the alerts, and your monitoring solution becomes your monitoring problem. We’re proud to use a few different solutions internally, and we’re constantly looking at ways to improve Panopta so that we can stop false alerts and eliminate alert fatigue, both internally and for our customers.
Why is this so important? Just read Fred’s article or take our word for it: it affects your bottom line! The method that Fred and his team used to sort out which parts of their infrastructure needed higher alert thresholds and which needed lower ones is dead simple, and it worked. They slashed their after-hours on-call time and got that time back.
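To make the rule in the title of Fred’s article concrete, here is a minimal Python sketch of our own. It is not Fred’s actual method or any Panopta feature: the `Alert` class, its `actionable` and `business_critical` fields, and the sample alerts are all hypothetical, chosen only to show the idea that an alert should page someone only when it is both actionable and business critical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; the field names are illustrative assumptions."""
    name: str
    actionable: bool          # can the on-call engineer do something about it right now?
    business_critical: bool   # does it affect customers or revenue?


def should_page(alert: Alert) -> bool:
    """Page only when the alert is both actionable and business critical."""
    return alert.actionable and alert.business_critical


# Made-up sample alerts, for illustration only.
alerts = [
    Alert("checkout API down", actionable=True, business_critical=True),
    Alert("staging disk 80% full", actionable=True, business_critical=False),
    Alert("upstream provider outage", actionable=False, business_critical=True),
]

# Alerts that should wake someone up; everything else can wait for working hours.
pages = [a.name for a in alerts if should_page(a)]
print(pages)
```

Everything that fails the check would still be recorded, just routed to a dashboard or a ticket queue instead of a pager.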
Need help figuring out how to set intelligent thresholds or need a monitoring solution that doesn’t give you false alerts? Consider this an invite to give Panopta a try.