This post is part of our six-part series on DNS. The complete list is here: Part 1: DNS Basics, Part 2: DNS and Performance, Part 3: Common Problems and Solutions, Part 4: Best Practices for Setup, Part 5: Monitoring an Anycast Service, Part 6: The Importance of Highly Available DNS.
Over the last two weeks, we covered a DNS primer and how DNS impacts the performance of your services. If you’re well versed in these topics, read on! If not, you might want to start with those articles first (Part 1: A Primer, Part 2: DNS and Performance).
This week we’re jumping back in with a quick survey of some common DNS problems and their solutions. As we’ve established in our previous posts, DNS often does not get the operational attention it requires, despite its critical role in keeping your infrastructure available. And because DNS relies on caching and delegation, fixes don’t always take effect in real time.
Let’s get right into it.
Problem #1: Relying on a Single Authoritative DNS Provider
“Don’t put all your eggs in one basket.” This saying holds true in many areas, and it also applies to how you’ve configured your authoritative DNS. If you’re using only one provider, your entire infrastructure can become unreachable when that provider goes down. In fact, this happened not long ago, when an attack on DynDNS took a large part of the internet offline.
Wondering how much of the internet is still susceptible? We ran the numbers, and they’re not great. Out of the top 1 million domains on the Alexa list, only 35% use multiple providers, which leaves 65% running a huge risk. Companies have assumed that a single DNS provider is reliable enough for too long, and it shows in our data.
This simple chart illustrates the problem:
This puts the problem in even starker terms: of the top 1 million domains, almost 72% use only one provider.
If you’re in this category, research other providers and set one up as a secondary provider. A good place to start is CloudHarmony’s availability report on DNS providers.
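To see where a zone stands, a rough first pass is to look at its NS records (for example, the hostnames returned by `dig NS example.com +short`) and count how many distinct providers they belong to. The Python sketch below assumes a provider can be approximated by the last two labels of each nameserver hostname; that heuristic is ours, not a standard, and it will misgroup names under suffixes like `.co.uk`. The provider hostnames in the usage lines are hypothetical.

```python
from typing import Iterable, Set


# Heuristic (an assumption, not a standard): treat the last two labels of a
# nameserver hostname as its provider, e.g. "ns1.provider-a.net" -> "provider-a.net".
def provider_of(ns_host: str) -> str:
    """Approximate the provider domain behind one nameserver hostname."""
    labels = ns_host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def providers_for_zone(ns_hosts: Iterable[str]) -> Set[str]:
    """Return the set of approximate providers serving a zone's NS records."""
    return {provider_of(host) for host in ns_hosts}


def has_single_provider(ns_hosts: Iterable[str]) -> bool:
    """True if every nameserver appears to belong to the same provider."""
    return len(providers_for_zone(ns_hosts)) < 2


# Hypothetical NS sets for two zones.
print(has_single_provider(["ns1.provider-a.net.", "ns2.provider-a.net."]))   # True
print(has_single_provider(["ns1.provider-a.net.", "dns1.provider-b.com."]))  # False
```

A zone that prints `True` here is a candidate for adding a secondary provider.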
Problem #2: Assuming Performance Doesn’t Matter
As we discussed in the first two DNS posts, DNS plays a critical role in nearly every round-trip request to your servers. So even though a lookup takes only milliseconds, its performance really matters. Besides using multiple providers, it’s important to know how your providers have performed historically and how they’re performing now. Be sure to monitor them!
We’ve been lucky enough to provide CloudHarmony with their monitoring data for quite some time, and their report on DNS provider performance (the same one mentioned above) is worth a look before you decide which provider is the right fit for your company.
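Monitoring data is most useful when you look at the tail as well as the typical case. As a minimal sketch, assuming you already collect response-time samples in milliseconds for each provider, this Python snippet computes a median and a nearest-rank 95th percentile and ranks providers by the latter. The provider names and sample values are invented for illustration.

```python
import math
import statistics
from typing import Dict, List, Tuple


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Median and nearest-rank 95th-percentile latency for one provider."""
    ordered = sorted(samples_ms)
    # Nearest-rank method: the smallest sample with at least 95% of all
    # samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {"p50": statistics.median(ordered), "p95": ordered[rank - 1]}


def rank_by_p95(providers: Dict[str, List[float]]) -> List[Tuple[str, Dict[str, float]]]:
    """Sort providers by tail latency, lowest (best) first."""
    rows = [(name, summarize(samples)) for name, samples in providers.items()]
    return sorted(rows, key=lambda row: row[1]["p95"])


# Invented sample data (milliseconds), not real measurements.
samples = {
    "provider-a": [12, 14, 13, 15, 80, 13, 12, 14, 13, 12],
    "provider-b": [18, 19, 20, 18, 21, 19, 18, 20, 19, 22],
}
for name, stats in rank_by_p95(samples):
    print(name, stats)
```

In this made-up data, provider-a has the lower median (13 ms vs. 19 ms), but one slow outlier pushes its p95 to 80 ms, so provider-b ranks first. Ranking on p95 rather than the average is a design choice: it penalizes the occasional slow answers that users actually notice.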
Problem #3: Not Paying Attention to Security – DNSSEC
DNS spoofing is a common attack in which forged responses poison a caching nameserver, redirecting its users to incorrect IP addresses in an attempt to capture their information.
For many years, while encryption schemes like SSL/TLS have progressed, DNS has not. Over the last five years, a standard called DNSSEC has been widely adopted to address this critical security gap. DNSSEC uses cryptographic digital signatures to verify the authenticity of a response along the entire chain of trust, from the root zone down to your zone. Be sure to check with your authoritative DNS provider that it is enabled for your zone.
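One way to spot-check DNSSEC from the outside is to query your zone through a validating recursive resolver with the EDNS0 "DNSSEC OK" (DO) bit set and see whether the resolver sets the AD (Authentic Data) flag in its reply; `dig +dnssec` shows the same flag as `ad` in its header line. The Python sketch below builds that query by hand with the standard library. It assumes the resolver at `resolver_ip` actually performs validation and that you trust the network path to it: the AD bit is the resolver's verdict, not a validation you performed yourself. The helper names are hypothetical.

```python
import random
import socket
import struct

RD_FLAG = 0x0100  # Recursion Desired: we are asking a recursive resolver
AD_FLAG = 0x0020  # Authentic Data: set by a validating resolver in its reply
DO_BIT = 0x8000   # DNSSEC OK: carried in the flags part of the OPT record's TTL field


def build_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """DNS query for `name` (type A by default) with an EDNS0 OPT record and DO set."""
    header = struct.pack("!HHHHHH", qid, RD_FLAG, 1, 0, 0, 1)  # 1 question, 1 additional
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    # OPT pseudo-record: root owner name, TYPE 41, CLASS = advertised UDP
    # payload size (1232 bytes here), TTL = ext. RCODE/version/flags, no RDATA.
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, DO_BIT, 0)
    return header + question + opt


def is_authenticated(response: bytes) -> bool:
    """True if the AD flag is set in the response header."""
    _qid, flags = struct.unpack("!HH", response[:4])
    return bool(flags & AD_FLAG)


def ask_resolver(resolver_ip: str, name: str, timeout: float = 2.0) -> bool:
    """Query a (presumed validating) resolver over UDP/IPv4 and report the AD flag."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (resolver_ip, 53))
        response, _ = sock.recvfrom(4096)
    if struct.unpack("!H", response[:2])[0] != qid:
        raise ValueError("response ID does not match the query")
    return is_authenticated(response)


# Example (placeholder resolver address; needs network access):
# print(ask_resolver("192.0.2.1", "example.com"))
```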
Problem #4: Using the Wrong TTL Settings
TTL (time-to-live) values are really important for each record in your DNS zone file. They tell downstream caching servers how long to keep a record in memory before going back to the authoritative nameservers to check for updates.
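To make the caching behavior concrete, here is a toy Python model of a resolver cache (not how any particular resolver is implemented): an answer is stored with an expiry of now + TTL and reused until then. The injectable `clock` parameter is an assumption added so the example runs without waiting; the hostname and address are placeholders.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Toy model of a resolver cache that honors record TTLs."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # (name, record type) -> (expiry time, cached values)
        self._entries: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}

    def put(self, name: str, rtype: str, values: List[str], ttl: int) -> None:
        """Cache an answer received from the authoritative servers."""
        self._entries[(name, rtype)] = (self._clock() + ttl, values)

    def get(self, name: str, rtype: str) -> Optional[List[str]]:
        """Return the cached answer, or None if absent or expired."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        expires_at, values = entry
        if self._clock() >= expires_at:
            # Expired: the next lookup has to go back to the authoritative servers.
            del self._entries[(name, rtype)]
            return None
        return values


# Simulated clock so the example runs instantly.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("www.example.com", "A", ["192.0.2.10"], ttl=300)
now[0] = 299.0
print(cache.get("www.example.com", "A"))  # ['192.0.2.10'] (still served from cache)
now[0] = 300.0
print(cache.get("www.example.com", "A"))  # None (expired)
```

Until the TTL runs out, the cache keeps serving the old answer even if you have already changed the record at your authoritative servers.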
The wrong TTL can either keep customers from reaching updated services promptly or give them a slower experience. A TTL that is set too high limits your agility to move services when needed. A TTL that is set too low makes caching nameservers refresh more often, so more lookups have to wait on a round trip to your authoritative servers and performance suffers. Each record type (A, AAAA, CNAME, MX, TXT) deserves separate consideration when you decide how long to set its TTL.
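The trade-off can be put in rough numbers. For a record with a TTL of t seconds, a resolver that cached the old value may keep serving it for up to t seconds after a change, and a resolver with steady demand re-queries your authoritative servers about 86400 / t times per day. The Python sketch below computes both for a few TTLs; the per-type values are hypothetical and purely illustrative, not recommendations. It also encodes a common planning rule: to move a service quickly, publish a lowered TTL at least one old TTL before the change.

```python
import math

SECONDS_PER_DAY = 86_400


def ttl_tradeoff(ttl_seconds: int) -> dict:
    """Rough cost of a TTL choice for a single record."""
    return {
        # A resolver that cached the old value may serve it this long after a change.
        "max_stale_minutes": ttl_seconds / 60,
        # A resolver with steady demand refetches about once per TTL.
        "approx_refreshes_per_day": math.ceil(SECONDS_PER_DAY / ttl_seconds),
    }


def lower_ttl_deadline(change_at: float, current_ttl: int) -> float:
    """Latest time (in seconds) to publish a shorter TTL before a planned change.

    Copies cached just before the new TTL is published can live for the old
    TTL, so the shorter TTL must go out at least `current_ttl` seconds early.
    """
    return change_at - current_ttl


# Hypothetical per-type TTLs, for illustration only (not recommendations).
example_ttls = {"A": 300, "AAAA": 300, "CNAME": 3600, "MX": 3600, "TXT": 86400}
for rtype, ttl in example_ttls.items():
    print(rtype, ttl, ttl_tradeoff(ttl))
```

With these illustrative values, a 300-second TTL means at most 5 minutes of staleness but roughly 288 refreshes per busy resolver per day, while an 86400-second TTL cuts that to one refresh at the cost of up to a day of staleness.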
For some more detail on common settings for each of these record types, check out this post from the blog over at Dyn.
Problem #5: Lack of Visibility into DNS Problems
You’d be hard-pressed to find a company these days with no monitoring in place at all; it’s a standard IT practice. What is common, though, is having gaps in your monitoring strategy. Teams usually monitor the obvious assets and endpoints, like their websites, servers, and network edges. Once again, DNS often gets left behind. For each of your corporate zones, be sure you’re monitoring the following:
- Internet latency to your authoritative DNS servers
- Ensure those DNS servers are accepting connections and responding to DNS queries
- Measure the round-trip DNS response time regularly and at a high frequency
- Ensure the responses to DNS queries are accurate (the right IPs are being returned)
- Make sure your monitoring tests IPv6 connectivity and uses DNSSEC verification
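Several of the checks above can be sketched with nothing but the Python standard library: send a non-recursive A query over UDP to one authoritative server, time the round trip, and compare the returned addresses with what you expect. This is a minimal sketch, assuming IPv4 transport and A records only; the server address, hostname, and expected IP in the example are placeholders, and a production check would also cover TCP, IPv6, AAAA answers, and DNSSEC validation. A `socket.timeout` (an alias of `TimeoutError` on Python 3.10+) signals that the server did not answer in time.

```python
import random
import socket
import struct
import time
from typing import Set


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) name in a message."""
    while True:
        length = msg[offset]
        if (length & 0xC0) == 0xC0:  # compression pointer: two bytes, ends the name
            return offset + 2
        offset += 1
        if length == 0:  # zero-length root label ends the name
            return offset
        offset += length


def parse_a_records(msg: bytes) -> Set[str]:
    """Collect IPv4 addresses from the answer section of a DNS response."""
    _qid, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    addresses = set()
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:  # A record, class IN
            addresses.add(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlength  # skip CNAMEs and any other record types
    return addresses


def probe(server: str, name: str, expected: Set[str], timeout: float = 2.0) -> dict:
    """One monitoring check against one authoritative server (UDP/IPv4, A only)."""
    qid = random.randint(0, 0xFFFF)
    header = struct.pack("!HHHHHH", qid, 0, 1, 0, 0, 0)  # flags 0: non-recursive query
    query = header + encode_name(name) + struct.pack("!HH", 1, 1)  # type A, class IN
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # socket.timeout here means no response
        start = time.perf_counter()
        sock.sendto(query, (server, 53))
        response, _ = sock.recvfrom(4096)
        rtt_ms = (time.perf_counter() - start) * 1000
    if struct.unpack("!H", response[:2])[0] != qid:
        raise ValueError("response ID does not match the query")
    answers = parse_a_records(response)
    return {"rtt_ms": rtt_ms, "answers": answers, "correct": answers == expected}


# Example (placeholder server, name, and address; needs network access):
# print(probe("192.0.2.53", "www.example.com", {"192.0.2.10"}))
```

Running `probe` on a schedule against each authoritative server, and alerting on timeouts, slow `rtt_ms` values, or `correct` being false, covers the availability, response-time, and accuracy items on the list.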
Remember, if they go down, you go down.
When it comes to monitoring, of course, we’re partial to Panopta.