Over the past week, DNS-related outages have affected a number of sites. LinkedIn, the third largest social network in the world, suffered several hours of downtime during which visitors were directed to a domain sales page instead of the site. After investigating, LinkedIn determined that the cause was a domain hijacking incident at its DNS provider, Network Solutions. Network Solutions claims that “a small number” of its clients were affected, including LinkedIn. Cisco offers contrary evidence that up to 5,000 webpages were affected by the Network Solutions outage, including USPS.com, Subaru, Mazda USA, US Airways, Craigslist, and Weather.com.
Following the Network Solutions problem, hosting providers Zerigo and GoDaddy have had extended DNS outages from what appear to be DDoS (Distributed Denial of Service) attacks that have taken many of the websites that they host and manage offline.
DNS is a key component of how the internet works, yet it normally plays a behind-the-scenes role and gets little attention. However, as the last week has shown, DNS problems can cause serious trouble for online businesses. What should you know about DNS, and how can you minimize the impact of DNS problems on your site?
Today, we will start by stepping back and explaining the functional parts of the internet and how your webpages, web applications, and all that other code fit into the Domain Name System.
What is the internet
“[The internet] is a series of tubes.” – Alaskan Senator Ted Stevens in 2006
This famous quote by Senator Ted Stevens has been memed and mocked by the media as a sign of how disconnected lawmakers are from the way things actually work. But, honestly, do you know how the internet is organized at its most macroscopic level? Who or what organizes these so-called “tubes”?
Today, we will look at one of the basic cornerstones of the internet: the Domain Name System (DNS for short). It underlies almost every interaction you have had with the internet this morning and yesterday, and likely your entire web presence. And, as in the case of Brookstone (noted below), a DNS problem can mean your website is offline while your own servers hum along perfectly.
Fundamentals of the internet
The fundamental organizing structure of the internet is the Internet Protocol (IP) address. An IP address is a label for any device connected to a network, whether it is a printer, computer, smartphone, or server. For simplicity’s sake, we will discuss IP addresses as they relate to the public internet, where their main function is to give a label and identity to servers and end-user computers. Each public IPv4 address is a unique 32-bit number between 0 and 4,294,967,295. To make it easier to read, it is written as four separate numbers from 0 to 255 joined by dots, like 220.127.116.11.
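To make the dotted notation concrete, here is a minimal Python sketch that converts an IPv4 address to the single 32-bit number it represents and back again. The function names ipv4_to_int and int_to_ipv4 are our own, chosen for illustration; the standard library’s ipaddress module is used only to cross-check the result.

```python
import ipaddress


def ipv4_to_int(dotted: str) -> int:
    """Convert dotted-quad notation, e.g. '220.127.116.11', to a 32-bit integer."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {dotted!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part  # each of the four numbers supplies 8 bits
    return value


def int_to_ipv4(value: int) -> str:
    """Convert a 32-bit integer back to dotted-quad notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"outside the IPv4 range: {value}")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


n = ipv4_to_int("220.127.116.11")
print(n)               # 3699340299
print(int_to_ipv4(n))  # 220.127.116.11
# Cross-check against the standard library's parser.
print(n == int(ipaddress.IPv4Address("220.127.116.11")))  # True
```

The smallest address, 0.0.0.0, maps to 0 and the largest, 255.255.255.255, maps to 4,294,967,295, which is where the range above comes from.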
But as you know from experience, you rarely type digits into your web browser to bring up a webpage. Instead, you use a domain name like www.panopta.com.
These domain names are paired with IP addresses so that we humans can remember sites simply by name. When you visit a site, your web browser relies on DNS to “translate” the domain name back into an IP address.
This “translation” is not a decoding of numbers into letters; rather, it is a lookup in which a series of servers report which IP address corresponds to a given name.
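Applications rarely perform this lookup themselves; they hand the name to the operating system, which typically asks a caching DNS server. A minimal Python sketch using the standard library’s socket.getaddrinfo might look like this. resolve_ipv4 is a hypothetical helper name, and the operating system may also consult local sources such as a hosts file before DNS.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Ask the operating system's resolver for the IPv4 addresses of hostname."""
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                                 type=socket.SOCK_STREAM)
    addresses: list[str] = []
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) tuple.
    for *_, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:  # drop duplicates, keep resolver order
            addresses.append(ip)
    return addresses
```

Calling resolve_ipv4("www.panopta.com") returns whatever IPv4 addresses your resolver currently associates with that name; the result depends on your network, so it is not shown here.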
Roots and Authorities
At a high level, this lookup begins with the Internet Corporation for Assigned Names and Numbers (ICANN) and the root name servers.
The root name servers are a set of servers, known by the letters A through M, with copies located across the world for redundancy. They are the starting point of the DNS, but they do not hold an archive of every registered domain. Instead, they store the root zone, which lists the name servers responsible for each top-level domain, such as .com or .org. Those top-level domain servers in turn delegate the work of knowing the exact IP address to each domain’s registered authoritative name servers.
Authoritative name servers are registered, through your domain registrar, as the official source that the rest of the DNS must consult to find the IP address for a specific domain. However, they are maintained by private individuals and companies, not by ICANN.
An authoritative name server is typically a service you pay for that stores your domain name and its current IP address. Because the information stored for each domain is so simple, many domain name and IP address pairings are stored together on a single authoritative name server.
The authoritative name server is the pivotal part of the DNS: it gives authoritative answers (the correct IP address) to queries from any browser in the world looking for your domain. These answers are given for Fully Qualified Domain Names (FQDNs). An FQDN is a complete, exact domain name, one that distinguishes www.panopta.com from my.panopta.com. Only the authoritative name servers know all of the FQDNs within a domain; the servers above them in the hierarchy only know where to find the name servers for the base domain, like panopta.com.
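The chain of referrals from the root down to the authoritative server can be modeled in a few lines. The sketch below is a deliberately simplified, in-memory illustration: the server names (root, tld-com, ns1.example-dns.test) and the 192.0.2.x addresses are hypothetical, and a real resolver exchanges NS and A records over the network rather than reading a Python dictionary.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of the hierarchy: each server either refers the
# resolver onward (by domain suffix) or answers for exact FQDNs.
ZONES: Dict[str, dict] = {
    "root": {"referrals": {"com.": "tld-com"}, "answers": {}},
    "tld-com": {"referrals": {"example.com.": "ns1.example-dns.test"}, "answers": {}},
    "ns1.example-dns.test": {
        "referrals": {},
        "answers": {"www.example.com.": "192.0.2.10", "my.example.com.": "192.0.2.20"},
    },
}


def query(server: str, fqdn: str) -> Tuple[str, Optional[str]]:
    """Ask one server about fqdn: ('answer', ip), ('referral', server) or ('nxdomain', None)."""
    zone = ZONES[server]
    if fqdn in zone["answers"]:
        return "answer", zone["answers"][fqdn]
    for suffix, next_server in zone["referrals"].items():
        if fqdn == suffix or fqdn.endswith("." + suffix):
            return "referral", next_server
    return "nxdomain", None


def resolve(fqdn: str, max_steps: int = 10) -> Tuple[Optional[str], List[str]]:
    """Follow referrals starting at the root; return (ip or None, servers asked)."""
    if not fqdn.endswith("."):
        fqdn += "."  # an FQDN is rooted at the trailing dot
    server, path = "root", ["root"]
    for _ in range(max_steps):
        kind, value = query(server, fqdn)
        if kind == "answer":
            return value, path
        if kind == "nxdomain":
            return None, path
        server = value
        path.append(server)
    raise RuntimeError("too many referrals")
```

Here resolve("www.example.com") returns ("192.0.2.10", ["root", "tld-com", "ns1.example-dns.test"]): the root only knows where .com is served, the .com server only knows which name server handles example.com, and only that authoritative server knows the exact FQDN.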
As a testament to their importance, a domain’s authoritative name servers are almost always deployed as several redundant servers containing the same information, so that if one name server goes down, the others can back it up. This redundancy matters because every lookup for your domain, from every internet user, ultimately depends on these servers.
This chain of root and authoritative name servers, however, is not how most of your internet interactions are resolved. Instead, most lookups are handled much more efficiently by a caching server.
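Redundancy works because a resolver that gets no reply from one authoritative server can simply ask another. The following sketch assumes each server is represented by a plain Python callable that raises OSError when unreachable; real resolvers send DNS queries and may choose servers by response time rather than strictly in order.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in: a "server" is any callable that takes an FQDN and
# returns an IP address, or raises OSError when it cannot be reached.
Server = Callable[[str], str]


def resolve_with_failover(fqdn: str, servers: Sequence[Server]) -> Optional[str]:
    """Try each redundant authoritative server in turn; return the first answer."""
    for server in servers:
        try:
            return server(fqdn)
        except OSError:
            continue  # this copy is down; fall through to the next one
    return None  # every redundant server failed: the name cannot be resolved


def down(fqdn: str) -> str:
    raise TimeoutError("no response")  # TimeoutError is a subclass of OSError


def up(fqdn: str) -> str:
    return "192.0.2.10"  # hypothetical address


print(resolve_with_failover("www.example.com", [down, up]))    # 192.0.2.10
print(resolve_with_failover("www.example.com", [down, down]))  # None
```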
What is Caching
A caching server stores a small dictionary (think: abridged dictionary) of domain name and IP address pairings, whose contents are determined by what local users look up and how often. For example, Twitter is banned in China, where users’ microblogging needs are filled by a service called Sina Weibo. Although Sina Weibo has over 368 million users, it is unlikely to be found on a caching server used by web users in Honduras.
DNS caching servers act as local servers that hold a small but efficient collection of the FQDNs most frequently requested in their region. They exist so that you, the end user, can reach information with minimal delay, by avoiding round-trip requests to the root and authoritative name servers.
These caching servers are run by a variety of groups, including Google, local internet providers, and hosting providers. They are operated to reduce lag time for users, based on the general assumption that people in a given physical area tend to browse similar webpages.
Another critical element of DNS caching servers is that their contents are constantly refreshed: entries for sites that are no longer operating (think pets.com) drop out, and sites that are getting more attention are added. Each cached record carries a time to live (TTL), set by the domain’s authoritative name servers, that says how long it may be kept. When the TTL expires, the record is discarded, and the next user query for that name causes the caching server’s own resolver to look it up again, starting from the root name servers if necessary.
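The expire-and-refresh behavior described above can be sketched as a small cache keyed by FQDN. This is an illustration under simplifying assumptions, not how any particular caching server is implemented: TTLCache, lookup, and the injectable clock are hypothetical names, and real caches also handle negative answers, multiple record types, and memory limits.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal DNS-style cache: each entry expires after its own TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # fqdn -> (ip, expires_at)

    def put(self, fqdn: str, ip: str, ttl_seconds: float) -> None:
        self._entries[fqdn] = (ip, self._clock() + ttl_seconds)

    def get(self, fqdn: str) -> Optional[str]:
        entry = self._entries.get(fqdn)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[fqdn]  # expired: the next lookup must go upstream
            return None
        return ip


def lookup(cache: TTLCache, fqdn: str,
           resolve_upstream: Callable[[str], Tuple[str, float]]) -> str:
    """Answer from the cache when possible; otherwise resolve and cache."""
    ip = cache.get(fqdn)
    if ip is None:
        ip, ttl = resolve_upstream(fqdn)  # stands in for a full lookup from the root
        cache.put(fqdn, ip, ttl)
    return ip


# Demo with a fake clock and a fake upstream that always answers with a
# hypothetical address and a 300-second TTL.
now = [0.0]
upstream_calls = []


def fake_upstream(fqdn: str) -> Tuple[str, float]:
    upstream_calls.append(fqdn)
    return "192.0.2.10", 300.0


cache = TTLCache(clock=lambda: now[0])
for t in (0.0, 100.0, 301.0):
    now[0] = t
    lookup(cache, "www.example.com.", fake_upstream)
print(len(upstream_calls))  # 2: a miss at t=0, a hit at t=100, a refresh at t=301
```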
When DNS goes wrong
Given all the pieces that make up the Domain Name System, it should be clear that there are various ways that things could go wrong. If you want to keep your website available, it is important to prepare for these problems. Fortunately, there are a few ways monitoring can save you some future headaches.
To get started, let’s walk through the different problems that could occur. There are two main DNS problems that are beyond your control, but you should know about them so that you understand what can go wrong.
At the most catastrophic level, the root name servers could go down. This would be an internet-wide problem that would disable most functional uses of the internet. Such an event is extremely unlikely, because there are copies of the root name servers worldwide, many of them reachable through anycast routing for additional redundancy.
Second, a local caching server could break, which would prevent the users who rely on it from reaching your site even though your own DNS is working.
Beyond those two problems, there are other problems you can detect and prepare for. An outage at your authoritative name servers can be very frustrating because it cuts off access for your customers and can send you on a wild goose chase through your other servers. Caching makes these outages harder to diagnose, because your view through your local caches will differ from what other visitors see. Directly monitoring your authoritative name servers gives you better insight into their state and allows infrastructure decisions to be made quickly and accurately.
DNS outages are relatively frequent: based on our aggregate monitoring data, roughly 5% of all outages are DNS-related. Just the other day, Brookstone, a publicly traded gift retailer, had a prominent DNS outage. For over an hour, neither of Brookstone’s two authoritative DNS servers (NS97.WORLDNIC.COM and NS98.WORLDNIC.COM, both provided by Network Solutions) returned IP addresses. This took their entire site, as well as email and any other services that depend on the domain, offline for that entire hour.
A more troubling problem is that the configuration of your authoritative name servers could be changed without your knowledge. An authoritative name server can be targeted by an attacker (read: hacker), and your FQDN could be pointed at a different IP address, much as LinkedIn’s visitors were sent to a domain sales page.
What you should be monitoring
Fortunately, both of these situations can be detected and handled with appropriate monitoring of your authoritative DNS servers. Our DNS checks support full queries to resolve a given FQDN and compare the returned IP address against one or more correct addresses. By setting up checks against each authoritative server to ensure that your domain name resolves and returns the correct IP address, you can be alerted whenever there is a problem and then work with your DNS provider to fix it.
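For readers curious about what such a check involves, here is a minimal Python sketch that sends a single DNS query for an A record directly to one authoritative server over UDP (port 53) and compares the addresses in the answer with a set of expected addresses. It is not our product’s implementation; the helper names and the status strings (OK, DOWN, NO_ANSWER, WRONG_ADDRESS, BAD_RESPONSE) are hypothetical, and it ignores details such as TCP fallback, CNAME chains, and IPv6 (AAAA) records.

```python
import random
import socket
import struct


def build_query(fqdn: str, query_id: int) -> bytes:
    """Build a DNS query for the A (IPv4) record of fqdn in RFC 1035 wire format."""
    # Header: ID, flags = 0 (standard query; recursion not requested, since we
    # ask the authoritative server directly), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in fqdn.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # compression pointer: two bytes, name ends
            return offset + 2
        if length == 0:            # root label: end of name
            return offset + 1
        offset += 1 + length


def parse_a_records(msg: bytes, query_id: int) -> list[str]:
    """Extract the IPv4 addresses from the answer section of a DNS response."""
    rid, flags, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", msg[:12])
    if rid != query_id or not flags & 0x8000:  # 0x8000 = QR bit (response)
        raise ValueError("not a response to our query")
    if flags & 0x000F:  # non-zero RCODE, e.g. 3 = NXDOMAIN
        return []
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = _skip_name(msg, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlength  # skip records of other types, e.g. CNAME
    return addresses


def check_authoritative(server_ip: str, fqdn: str, expected: set[str],
                        timeout: float = 3.0) -> str:
    """Query one authoritative server over UDP and compare its answer."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(fqdn, query_id), (server_ip, 53))
            response, _ = sock.recvfrom(4096)
        except OSError:  # timeouts and unreachable hosts
            return "DOWN"
    try:
        addresses = parse_a_records(response, query_id)
    except (ValueError, struct.error, IndexError, OSError):
        return "BAD_RESPONSE"
    if not addresses:
        return "NO_ANSWER"
    if not set(addresses) <= expected:
        return "WRONG_ADDRESS"  # possible hijack or misconfiguration
    return "OK"
```

In this sketch, check_authoritative(ns_ip, "www.example.com", {"192.0.2.10"}) would be run once per authoritative server, where ns_ip is that server’s address. A DOWN result corresponds to the outage case and WRONG_ADDRESS to the hijacking case described above.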
A DNS lookup is one step of every HTTP check, so if your DNS servers are having problems, they will eventually show up as failed HTTP checks. Be aware, however, that most monitoring systems use local caching servers when performing HTTP checks. Because of this, HTTP checks may not detect an authoritative DNS outage for some time, depending on the caching settings (TTLs) for your domain. Running separate DNS-specific checks against your authoritative servers is critical to maximizing your site’s availability.
Because the authoritative name servers run redundantly, each of those servers should be checked separately so that you can detect problems with any individual server. However, because of this redundancy, it’s not necessary to trigger immediate, urgent alerts when one authoritative server goes down. This is an important problem that should be addressed, but it doesn’t necessarily warrant waking someone up in the middle of the night.
However, a simultaneous outage of all of your authoritative servers is a critical problem that needs an immediate response. Our compound services, which generate a separate set of alerts when outages occur across multiple servers, let you handle this situation as well.
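The alerting policy from the last two paragraphs can be summarized as a small function. In this sketch, the Severity levels and the classify function are hypothetical names, not part of any product API: each authoritative server’s check result is reduced to pass or fail, a partial failure produces a warning, and failure of every server produces a critical alert.

```python
from enum import Enum
from typing import Mapping


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"    # fix during working hours; redundancy is covering
    CRITICAL = "critical"  # page someone now; the domain cannot be resolved


def classify(results: Mapping[str, bool]) -> Severity:
    """Reduce per-server check results (True = answered correctly) to one alert level."""
    if not results:
        raise ValueError("no authoritative servers were checked")
    failing = [server for server, ok in results.items() if not ok]
    if not failing:
        return Severity.OK
    if len(failing) == len(results):
        return Severity.CRITICAL  # compound condition: every server is failing
    return Severity.WARNING       # at least one redundant server still answers


print(classify({"NS97.WORLDNIC.COM": False, "NS98.WORLDNIC.COM": True}).name)   # WARNING
print(classify({"NS97.WORLDNIC.COM": False, "NS98.WORLDNIC.COM": False}).name)  # CRITICAL
```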
If you aren’t actively monitoring your DNS servers, we recommend setting up checks now to avoid future problems. If you have questions on how best to configure monitoring for your DNS servers, or need any assistance setting things up, feel free to comment below or email us.