Best Practices for DNS Monitoring
In our last post, we discussed some of the best practices for setting up DNS and making sure that it is working correctly. This week, we’re going to talk about long-term maintenance. DNS monitoring makes it far easier to know when you have an issue and where it occurred, so you can solve it more quickly.
If you’re interested in learning not only how to monitor your DNS, but also how to avoid common mistakes and take advantage of new DNS features, you can subscribe to this series in the sidebar.
I’m looking into DNS monitoring services. How can I find the best one?
Like picking out an authoritative DNS provider, choosing a monitoring service requires a good deal of research before you start building a short list. If you’re unfamiliar with monitoring, we suggest getting to know the industry. Monitoring Weekly is a great place to start, since it’s monitoring-focused and updates regularly. You’ll even begin to see some of the products you might ultimately choose to work with.
If you feel you’re familiar enough with the monitoring industry, then we suggest starting with the key questions we’ve listed below. These are obviously not all-inclusive, and you might need to add to them for your individual use case, but they’re a place to start.
Do they monitor both uptime and downtime?
This might seem like a fairly basic question, but we wanted to touch on it because it’s so important. Nowadays, monitoring whether or not your DNS is up isn’t enough. If you only watch for downtime, you might not know you are the victim of a DDoS attack until it has brought your site down and your DNS is returning errors to end users.
A service that lets you monitor uptime will often let you set alerts for sudden spikes in use, such as a large number of IP addresses sending requests to your hostname in an attempt to bring it down. There’s obvious value in knowing about an attack ahead of time, but monitoring your uptime will also allow you to make your DNS more agile. If latency is spiking in certain parts of the world or at certain times of day, your monitoring system will help you determine why it’s happening and how to fix it.
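To make the idea of a spike alert concrete, here is a minimal sketch in Python. It assumes you already have per-minute query counts from your provider or monitoring service; the function name `is_spike`, the sample numbers, and the three-standard-deviation threshold are our own illustrative choices, not a feature of any particular product.

```python
from statistics import mean, stdev

def is_spike(history, current, threshold=3.0):
    """Return True if `current` sits more than `threshold` standard
    deviations above the mean of the recent `history` of counts."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase counts as a spike.
        return current > baseline
    return (current - baseline) / spread > threshold

# Hypothetical queries-per-minute samples for one hostname.
queries_per_minute = [980, 1010, 995, 1020, 1005, 990]

print(is_spike(queries_per_minute, 1030))   # False: within normal variation
print(is_spike(queries_per_minute, 25000))  # True: far above the baseline
```

A real service will likely use a more sophisticated baseline (for example, one that accounts for time of day), but the principle is the same: compare current traffic against what is normal for you.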
How many probes does the DNS monitoring service have, and where are they located?
When looking at DNS monitoring services, you want to make sure that they have at least a few probes that can reach each of your Points of Presence (POPs). This is especially important when you’re working with a Content Delivery Network (CDN), as we’ve discussed previously in our post about monitoring an Anycast service. If you’re only monitoring POPs in one area, you might not know about an outage elsewhere. If only one probe monitors a given POP, you might receive alerts for false outages because of an error with that probe. That brings us to our next important question:
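If a service publishes which probes reach which POPs, a coverage check like the one described above takes only a few lines. This is a sketch under our own assumptions: the POP names, probe names, and the minimum of two probes per POP are hypothetical, and you would build the mapping from the service’s own probe list.

```python
def undercovered_pops(probes_by_pop, minimum=2):
    """Return the POPs that fewer than `minimum` probes can reach."""
    return sorted(
        pop for pop, probes in probes_by_pop.items() if len(probes) < minimum
    )

# Hypothetical mapping of POPs to the probes that can reach them.
probes_by_pop = {
    "ams": ["probe-eu-1", "probe-eu-2"],
    "nyc": ["probe-us-1"],
    "sin": [],
}

print(undercovered_pops(probes_by_pop))  # ['nyc', 'sin']
```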
Does the DNS monitoring service use “safe” outage confirmations?
Safe outage confirmations prevent false positives and help ease alert fatigue by using multiple probes to verify an outage before an alert is sent out. They also help in the reverse case, where a single malfunctioning probe could otherwise let a real outage go unnoticed.
A DNS monitoring service that fine-tunes this process, and ensures an outage is real before an alert goes out, is essential. Look at how the service maps its probes and handles outage verification to confirm that it will work well for your use case.
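One simple way to think about outage confirmation is as a vote among probes. The sketch below is an assumption about how such a check could work, not a description of any specific service: the probe names are made up, and the rule (alert only when a strict majority of probes report a failure) is just one possible policy.

```python
def confirm_outage(probe_results, quorum=0.5):
    """Return True only if more than `quorum` of the probes saw a failure.

    `probe_results` maps a probe name to True when that probe's check failed.
    """
    if not probe_results:
        return False  # no probes reported, so nothing can be confirmed
    failures = sum(1 for failed in probe_results.values() if failed)
    return failures / len(probe_results) > quorum

# One probe failing on its own is treated as a probe problem, not an outage.
print(confirm_outage({"probe-a": True, "probe-b": False, "probe-c": False}))  # False

# Two of three probes agreeing is enough to send the alert.
print(confirm_outage({"probe-a": True, "probe-b": True, "probe-c": False}))   # True
```

Raising `quorum` trades fewer false alarms for slower confirmation, which is the kind of tuning worth asking a provider about.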
Does the DNS monitoring service offer countermeasures?
As more and more of operations becomes automated, countermeasures, also known as automated remediation, are a newer offering that could play a significant role in who you choose to provide your DNS monitoring. Countermeasures allow you to set up automated systems that react to certain kinds of alerts. This could be anything from automatically restarting a system when a certain type of error occurs to automatically gathering data about an issue, so that if a system administrator needs to get involved, they will have everything they need to solve the problem quickly.
Automated remediation gives you a big head start towards solving the issues that monitoring identifies.
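At its core, a countermeasure is a mapping from an alert type to one or more automated actions. The following sketch shows that shape only. The alert types, the hostname, and both handler functions are hypothetical, and the handlers return a description instead of touching a real system; in practice you would call your own tooling inside them.

```python
def collect_diagnostics(alert):
    # Placeholder: a real handler might save logs or query output for an admin.
    return f"diagnostics collected for {alert['host']}"

def restart_resolver(alert):
    # Placeholder: a real handler would call your own service manager here.
    return f"restart requested for {alert['host']}"

# Hypothetical alert types mapped to the actions to run, in order.
COUNTERMEASURES = {
    "resolver_unresponsive": [collect_diagnostics, restart_resolver],
    "high_latency": [collect_diagnostics],
}

def handle(alert):
    """Run every countermeasure registered for the alert's type."""
    return [action(alert) for action in COUNTERMEASURES.get(alert["type"], [])]

alert = {"type": "resolver_unresponsive", "host": "ns1.example.com"}
for outcome in handle(alert):
    print(outcome)
```

Gathering diagnostics before restarting is deliberate in this sketch: a restart can wipe out the evidence an administrator would need later.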
Does the DNS monitoring service answer the big three questions?
As we said in our last article, knowing how or why an error is happening is a key component of monitoring. When you’re looking into a DNS monitoring service, it’s important to think about whether its metrics, alert functions, and other features will answer the three most important questions: Is it up? Is it performing? Is it correct?
Your DNS, or any infrastructure you might be monitoring, isn’t useful if it’s up but so slow that users won’t wait to find out whether it’s working. Performance metrics are a necessity today, especially as users increasingly expect lower latency and higher availability. With something as important as DNS, you need to know right away when something has gone wrong.
Not monitoring your DNS is one of the biggest errors you can make when you’re setting up authoritative DNS, but it’s surprisingly common for people to put DNS on the back burner or even forget about it once they’ve done the initial setup. Next week, we’re going to talk about other common problems and mistakes related to DNS and discuss the best ways to resolve those issues.
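The three questions map neatly onto a single check. Here is a rough sketch using only the Python standard library. The function names, the half-second latency budget, and the example hostname and address are our own assumptions. Note also that the default resolver goes through your operating system’s resolver, so it tests the whole lookup path and not one specific authoritative nameserver, which is what a dedicated monitoring service would query.

```python
import socket
import time

def system_resolve(hostname):
    """Resolve through the operating system's resolver; return a set of IPs."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[4][0] for info in infos}

def check_dns(hostname, expected_ips, max_seconds=0.5, resolve=system_resolve):
    """Answer: is it up, is it performing, and is it correct?"""
    start = time.monotonic()
    try:
        answers = set(resolve(hostname))
    except OSError:  # socket.gaierror is a subclass of OSError
        answers = None
    elapsed = time.monotonic() - start
    up = answers is not None
    return {
        "up": up,
        "performing": up and elapsed <= max_seconds,
        "correct": up and answers == set(expected_ips),
        "seconds": elapsed,
    }

# A stand-in resolver keeps this example independent of the network.
def fake_resolve(hostname):
    return {"192.0.2.10"}

result = check_dns("www.example.com", {"192.0.2.10"}, resolve=fake_resolve)
print(result["up"], result["performing"], result["correct"])
```

Leaving out the `resolve` argument makes the same check run against a live hostname through `system_resolve`.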
Next in the Series: Common DNS problems, mistakes, and their solutions