It’s that time of year again: winter’s cold is over, and it’s time to do some spring/summer cleaning across many facets of your life! If you’re responsible for keeping your infrastructure online, I recommend you do the same for your monitoring environment.
We recently did an infrastructure audit for one of our clients and found several areas that could use improvement. With a few simple changes, we were able to streamline their setup and improve overall operations.
Based on the audit, here are our top five recommendations for where to focus.
1. Review your notification and contact settings.
Make sure that you have the right people configured for alerts, and have their correct contact information. As phone numbers change or people switch smartphone platforms, notification configurations need to be updated or you will end up missing alerts. People also switch roles and responsibilities through the year – make sure that escalation paths are updated to reflect this.
If you are a managed service provider, make use of auxiliary notification schedules to send alerts directly to your customers as well as your internal staff. You should also set up time to review contact information with your customers – it’s even more likely that you’ll have missed updates there, and your proactive focus could win you props.
2. Look for “extra” servers.
A company’s monitoring setup serves as a master list of its infrastructure. In reviewing yours, you might find several servers that are no longer in active use but could still be online. Not only do these add to the complexity of your monitoring, but they also cost resources to maintain.
Clean up and shut down any infrastructure you no longer need. If you find servers that are still in production use but could be consolidated, queue up tasks to clean things up – you’ll not only save yourself money but will cut your overall energy use as well.
3. Review your infrastructure for unmonitored servers.
After you’ve removed any unused servers, do the reverse and make sure that any servers and other devices that have been added in the last year are set up in monitoring. Even with the best deployment checklists you can miss a step – check now to make sure that you don’t miss a problem down the road because some servers weren’t properly set up.
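Steps 2 and 3 boil down to comparing two lists: the hosts you actually run and the hosts your monitoring knows about. Here is a minimal sketch of that comparison, assuming you can export both lists as plain host names. The function name `audit_coverage` and the example host names are hypothetical and not part of any particular monitoring product.

```python
def audit_coverage(inventory, monitored):
    """Compare the hosts you run against the hosts you monitor."""
    # Normalize names so "Web1.example.com " and "web1.example.com" match.
    inventory = {host.strip().lower() for host in inventory}
    monitored = {host.strip().lower() for host in monitored}
    return {
        # In your inventory but missing from monitoring (step 3).
        "unmonitored": sorted(inventory - monitored),
        # Still monitored but no longer in your inventory (step 2).
        "stale": sorted(monitored - inventory),
    }


# Hypothetical exports from an asset list and a monitoring system.
inventory = ["web1.example.com", "web2.example.com", "db1.example.com"]
monitored = ["web1.example.com", "db1.example.com", "old-mail.example.com"]

report = audit_coverage(inventory, monitored)
print("Unmonitored:", report["unmonitored"])
print("Stale:", report["stale"])
```

Anything in the "stale" list is a candidate for shutdown or removal from monitoring, and anything in the "unmonitored" list needs to be added.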
4. Set up maintenance schedules for planned downtime.
In any complex environment you’ll likely have regular maintenance work, done either through automatic processes or by hand. If this work triggers alerts that go to your team but don’t require any action, you should set up maintenance schedules to silence them.
For example, the client’s database replicas had been triggering daily alerts when their offsite backups ran and temporarily delayed database replication. This was a known issue that their operations team expected and ignored because it wasn’t a “real” problem and would resolve itself as soon as the backups completed. To avoid these false alarms, they put in place a maintenance schedule to turn off notifications during the backup window. If problems persist after the window ends, they will get notified and can take action.
Remember, alerts should only be sent when there’s an actual problem that needs to be addressed. Sending alerts that people know to ignore is the first step to missing a critical outage and suffering from real downtime!
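The backup-window example above can be sketched in a few lines. This is only an illustration of the logic, assuming a fixed daily window. The window times and the names `in_window` and `should_notify` are made up for the example and do not reflect how any specific monitoring system implements maintenance schedules.

```python
from datetime import datetime, time


def in_window(now, start, end):
    """Return True if the time `now` falls inside the daily window."""
    if start <= end:
        return start <= now < end
    # The window crosses midnight, e.g. 23:00 to 01:00.
    return now >= start or now < end


def should_notify(check_failed, when, start, end):
    """Notify only for failures that occur outside the maintenance window."""
    return check_failed and not in_window(when.time(), start, end)


# Hypothetical backup window: 01:00 to 03:00 every day.
BACKUP_START = time(1, 0)
BACKUP_END = time(3, 0)

during_backup = datetime(2024, 5, 1, 2, 15)
after_backup = datetime(2024, 5, 1, 3, 30)

print(should_notify(True, during_backup, BACKUP_START, BACKUP_END))  # False
print(should_notify(True, after_backup, BACKUP_START, BACKUP_END))   # True
```

A replication delay at 02:15 stays quiet, but the same failure at 03:30 is outside the window and raises a notification, which is the behavior the client wanted.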
5. Enhance your setup with new monitoring features.
We continually add features to our monitoring system in order to provide more complete monitoring and better alerting and reporting functionality. We showed the client several new tools that they were not taking advantage of. Alerts for their core infrastructure are now sent to their operations staff through Google Chat messages as well as into their internal IRC chatroom. They also added rotating contacts to properly route after-hours emergency alerts to our on-call staff, rather than waking everyone for every problem.
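To make the idea of rotating contacts concrete, here is a rough sketch of a weekly rotation, assuming each person is on call for seven days in a fixed order. The contact names, the start date, and the function `on_call_contact` are hypothetical, and a real rotating-contact feature may work differently.

```python
from datetime import date


def on_call_contact(contacts, day, rotation_start):
    """Pick the contact on call for `day`, rotating once per week."""
    weeks_elapsed = (day - rotation_start).days // 7
    return contacts[weeks_elapsed % len(contacts)]


# Hypothetical rotation order and start date (a Monday).
contacts = ["alice", "bob", "carol"]
rotation_start = date(2024, 1, 1)

print(on_call_contact(contacts, date(2024, 1, 3), rotation_start))   # alice
print(on_call_contact(contacts, date(2024, 1, 10), rotation_start))  # bob
print(on_call_contact(contacts, date(2024, 1, 22), rotation_start))  # alice
```

Only the person returned for the current date receives the after-hours alert, so the rest of the team can sleep.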
Take a look through our previous blog posts and announcements in the control panel for a summary of what’s been added, and feel free to contact our Support Team if you have questions or need assistance setting up something new.
The time is now!