Unseen Problems Affect Availability

Last week there was an AWS outage, but you probably didn’t hear about it. To be sure, it was nothing like the S3 debacle of February 2017, but it still caused performance degradation for a subset of AWS customers.



There was an error in VPC-to-VPC peering and connectivity that affected EC2 instances’ ability to communicate with other Amazon services. The twist is that connectivity from the VPC out to the internet was totally fine. This is a perfect example of why in-depth, multi-dimensional, and multi-directional monitoring is an important tool for today’s teams: a check that only looks outward to the internet would have reported everything as healthy. While this relatively short performance degradation drew little news or attention, it still affected companies and their customers, and the same thing could happen to anyone.
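To make "multi-directional" concrete, here is a minimal sketch in Python of a check that probes each network direction separately, so a failure on one path (say, VPC to other services) is not hidden by a healthy path (VPC to the internet). Everything in it is an assumption for illustration: the function names, the direction labels, and the placeholder hostnames are hypothetical, and a plain TCP connect is only a rough stand-in for a real health check.

```python
import socket
from typing import Callable, Dict, List, Tuple

Target = Tuple[str, int]  # (hostname, port)


def tcp_probe(target: Target, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the target succeeds within the timeout."""
    host, port = target
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


def check_directions(
    targets_by_direction: Dict[str, List[Target]],
    probe: Callable[[Target], bool] = tcp_probe,
) -> Dict[str, bool]:
    """Probe every target, grouped by direction; a direction is healthy only if all its targets are."""
    return {
        direction: all(probe(t) for t in targets)
        for direction, targets in targets_by_direction.items()
    }


def summarize(results: Dict[str, bool]) -> str:
    """Distinguish a partial outage (some directions down) from a total one."""
    failed = sorted(d for d, ok in results.items() if not ok)
    if not failed:
        return "all directions healthy"
    if len(failed) == len(results):
        return "total outage"
    return "partial outage: " + ", ".join(failed)


# Hypothetical usage (hostnames are placeholders, not real endpoints):
#   results = check_directions({
#       "vpc_to_internet": [("example.com", 443)],
#       "vpc_to_services": [("service.internal.example", 443)],
#   })
#   print(summarize(results))
```

The design choice that matters is reporting per direction rather than as a single up/down flag. In an incident like the one described above, the internet-facing direction would pass while the service-facing direction fails, and the summary would call out the partial outage instead of averaging it away.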

