Updated June 19, 2017
This post is part of our six-part series on DNS. The complete list is here: Part 1: DNS Basics, Part 2: DNS and Performance, Part 3: Common Problems and Solutions, Part 4: Best Practices for Setup, Part 5: Monitoring an Anycast Service, Part 6: The Importance of Highly Available DNS.
In our DNS Series we’ve covered the basics of DNS, why DNS is important to performance, common problems and solutions, and best practices for setup. For the last two bonus posts, we’re getting more detailed in two specific areas. For this post we’ve updated one of our more popular posts on how to effectively monitor an Anycast service. Thanks for reading!
Website developers and admins in today’s ever-expanding web have a number of solutions for handling high availability, failover, and performance. One of those solutions is Anycast, a routing scheme you can use to deal with the challenges of serving a global audience. With a routing scheme like Anycast, users are generally routed to the node closest to them in network terms. It also provides fault tolerance in the event that one of your POPs (points of presence) is unavailable.
In the diagram above, you can see how an Anycast scheme works in the context of a CDN (Content Delivery Network). In a CDN, website visitors in different parts of the world have their requests routed to the server nearest them, though the content they receive is the same. This helps with page load time and also takes much of the burden off of your origin servers. Anycast is what enables routing requests to the nearest datacenter/network.
However, the benefits of Anycast significantly complicate monitoring. The location of your monitoring probes determines which Anycast POP you are actually monitoring. Running the test from a single probe leaves you with a sizable blind spot: you won’t know about problems with the other Anycast POPs because you’ll always be routed to the same location. In addition, any outage confirmation performed by other nodes will likely test a different location, causing a real outage to be incorrectly dismissed as a false alarm.
So how do you embrace Anycast in your architecture but still effectively monitor your resources?
First, determine which POPs your sites are being served from with your Anycast/CDN provider. You will then need to work with your monitoring provider to understand which of their probes terminate at each of those locations. This is where using a monitoring provider with a wide network footprint and diverse carrier backbone really helps. Upstream providers/carriers are important to mention here because which probes terminate at which Anycast POPs is determined by more than just geographic location. It’s determined by routing and the networks that the requests travel through.
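Once you have that information, it can help to write the probe-to-POP mapping down and check it for gaps. The following is a minimal Python sketch of that bookkeeping, not a Panopta API. The probe names, POP names, and the mapping itself are hypothetical placeholders; the real values would come from your monitoring and Anycast/CDN providers.

```python
from collections import defaultdict

# Hypothetical mapping: which Anycast POP each monitoring probe terminates at.
# Real values must come from your monitoring and Anycast/CDN providers.
PROBE_TO_POP = {
    "probe-nyc": "pop-newark",
    "probe-bos": "pop-newark",
    "probe-lon": "pop-london",
    "probe-fra": "pop-london",
    "probe-sin": "pop-singapore",
}


def probes_by_pop(probe_to_pop):
    """Group probe names under the POP they are routed to."""
    groups = defaultdict(list)
    for probe, pop in sorted(probe_to_pop.items()):
        groups[pop].append(probe)
    return dict(groups)


def uncovered_pops(all_pops, probe_to_pop):
    """Return POPs that no probe reaches, i.e. monitoring blind spots."""
    return sorted(set(all_pops) - set(probe_to_pop.values()))


if __name__ == "__main__":
    # Hypothetical list of POPs reported by the Anycast/CDN provider.
    pops = ["pop-newark", "pop-london", "pop-singapore", "pop-sydney"]
    print(probes_by_pop(PROBE_TO_POP))
    print(uncovered_pops(pops, PROBE_TO_POP))
```

In this made-up example no probe terminates at pop-sydney, so an outage there would go unnoticed until a probe that routes to it is added.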
“Safe” Outage Confirmations
If your monitoring system attempts to verify outages from multiple locations (like Panopta does), a real outage could get incorrectly ruled out. Panopta uses an outage voting process to verify the authenticity of an outage: if one of our nodes detects an outage on a server or website, 3-5 nearby nodes instantly attempt to confirm the outage from other locations. This rules out any local network or server issues with the primary node. If a majority vote is reached, the site or server is considered down, and we begin the alerting process. If not, we check it again in 60 seconds per the normal schedule. With Anycast monitoring, there is the danger of a confirming node checking the wrong server because of the routing scheme. One way to mitigate this is to fine-tune which probes get used for confirmation. You can determine which probes are safe to use with your monitoring provider.
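To make the idea of “safe” confirmation concrete, here is a small Python sketch of a majority vote restricted to probes that terminate at the same POP as the probe that first saw the problem. It illustrates the concept only and is not Panopta’s implementation. The function names, the probe and POP names, the strict-majority rule, and the choice to treat “no safe probes available” as unconfirmed are all assumptions made for this example.

```python
def safe_confirmers(primary_probe, probe_to_pop):
    """Return the other probes that terminate at the same POP as the primary."""
    target_pop = probe_to_pop[primary_probe]
    return [
        probe
        for probe, pop in sorted(probe_to_pop.items())
        if pop == target_pop and probe != primary_probe
    ]


def confirm_outage(primary_probe, probe_to_pop, sees_outage):
    """Majority vote among safe probes only.

    sees_outage is a callable taking a probe name and returning True
    when that probe also observes the outage.
    """
    voters = safe_confirmers(primary_probe, probe_to_pop)
    if not voters:
        # Assumption: with no safe probe to ask, the outage stays unconfirmed.
        return False
    down_votes = sum(1 for probe in voters if sees_outage(probe))
    # Assumption: a strict majority of safe voters is required.
    return down_votes * 2 > len(voters)


if __name__ == "__main__":
    # Hypothetical mapping of probes to the POP they are routed to.
    mapping = {
        "probe-a1": "pop-a", "probe-a2": "pop-a", "probe-a3": "pop-a",
        "probe-b1": "pop-b", "probe-b2": "pop-b", "probe-b3": "pop-b",
        "probe-b4": "pop-b",
    }
    # Simulate an outage at pop-a: only probes routed there see it.
    pop_a_down = lambda probe: mapping[probe] == "pop-a"
    print(confirm_outage("probe-a1", mapping, pop_a_down))
```

In the simulated scenario, a vote taken across all six other probes would see only two of them report the outage and would dismiss it, while the vote restricted to the probes routed to pop-a confirms it.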
We’ve pre-determined the mappings of our monitoring probes for common CDN providers like CloudFlare and MaxCDN. If you are using either of these providers (or any other provider), feel free to get in touch with our support team in order to determine how to best set up your monitoring.
About Panopta: Panopta provides advanced network and server monitoring for online businesses and service providers. We go beyond providing basic monitoring to give operations teams the tools they need to detect issues before they occur and minimize the impact of outages or slow load time. Contact us with any questions you may have, or sign up for the free trial and see for yourself!