An Introduction to Monitoring Hybrid Clouds
Containerization is more than just a trendy topic nowadays. With the rise of hybrid cloud environments and the ever-expanding adoption of containers, it’s important to start thinking about how to maintain visibility in these environments. Monitoring hybrid clouds presents many of the same challenges any hybrid environment would, but with the many moving parts that come with containers, there are additional challenges to take on as you begin a transition.
If you’ve worked with containers and microservices before, this information isn’t exactly new. If you’re just starting to move into the containerized world, we suggest checking out this article from Medium, which breaks down virtual machines, Docker, and containers in detail for beginners. There’s also a great history of where containers came from and how they became commonplace in IT infrastructure.
The Challenges of Monitoring Hybrid Clouds
Monitoring any hybrid environment presents certain challenges; even a hybrid of several on-prem data centers can be tricky. But containers add a new layer of complexity to maintaining visibility.
By now, many monitoring companies are used to the challenges of an on-prem/cloud hybrid. Things like auto-scaling instances and cloud provider-specific platform services are good examples of the kinds of challenges that monitoring has learned to handle to maintain proper visibility within hybrid environments. Many of these services and features required monitoring practices to adapt. Hybrid cloud environments are no different.
When moving to containers, the timescale in which instances exist is much shorter and the dependencies between objects increase. This makes containers incredibly adaptable and allows for faster development and deployment, but it also means there’s a larger number of objects to monitor. This brings us to the significance of the “pets vs. cattle” analogy.
Monitoring Hybrid Clouds and Pets vs. Cattle
The greater volume of smaller objects with shorter lifespans involves more moving parts that need to be understood on an individual level. It also requires an understanding of the “pets vs. cattle” concept.
Previously, servers were given individual names and manually configured. This meant they were often treated like “pets” and considered to be irreplaceable and indispensable. Containers operate on a very different system, using an array of numbered servers, and are designed to fail or be replaced regularly. So, they’re considered “cattle”.
Because these “cattle” servers are designed to be replaced, monitoring systems need to keep up with those changes without losing visibility on either the old server or the new one replacing it. Container management systems like Kubernetes, Mesos, and Swarm have systems in place to automatically rebuild or replace any of the microservices running on a server that is destroyed, even if the server’s destruction was unplanned. The auto-healing nature of containers is part of what makes them easy to scale and ideal for quick development and deployment.
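One way to picture keeping visibility across a replacement is to key metric history by the role an instance fills rather than by its name. The sketch below is only an illustration under that assumption; the `Inventory` class, the `web` role, and the node names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One short-lived server or container (hypothetical model)."""
    instance_id: str
    samples: list = field(default_factory=list)  # recorded metric values
    retired: bool = False


class Inventory:
    """Tracks instances by the role they fill rather than by name."""

    def __init__(self):
        self.by_role = {}  # role -> list of Instance, oldest first

    def register(self, role, instance_id):
        """Record a new instance and retire whichever one filled the role before."""
        history = self.by_role.setdefault(role, [])
        if history:
            history[-1].retired = True  # keep its data, stop expecting new samples
        history.append(Instance(instance_id))

    def record(self, role, value):
        """Attach a metric sample to the instance currently filling the role."""
        self.by_role[role][-1].samples.append(value)

    def role_history(self, role):
        """All samples for the role, spanning every instance that served it."""
        return [v for inst in self.by_role[role] for v in inst.samples]


inventory = Inventory()
inventory.register("web", "node-a1")
inventory.record("web", 0.42)
inventory.register("web", "node-b7")  # node-a1 was replaced
inventory.record("web", 0.38)
print(inventory.role_history("web"))
```

The history for the role survives the swap, while the old instance is marked as retired instead of being reported as an outage.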
However, it’s also important to remember that despite this auto-healing nature, many container management systems can only handle issues like dying nodes or pods and won’t be able to inform you if an application begins returning errors or slowing down. While Kubernetes comes with some basic monitoring, app-level errors are often caused by things Kubernetes doesn’t automatically monitor.
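To make the app-level gap concrete, here is a minimal sketch of the kind of check that sits above node and pod health: it looks at recent requests and flags a high error rate or slow responses. The function name, the sample format, and the default limits are assumptions chosen for illustration, not part of any particular product.

```python
def app_health(samples, max_error_rate=0.05, max_p95_ms=500.0):
    """Summarize request samples given as (status_code, latency_ms) pairs."""
    if not samples:
        return {"healthy": False, "reason": "no data"}

    # Share of requests that ended in a server error (5xx).
    errors = sum(1 for status, _ in samples if status >= 500)
    error_rate = errors / len(samples)

    # Nearest-rank 95th percentile of latency, using integer arithmetic.
    latencies = sorted(latency for _, latency in samples)
    rank = (95 * len(latencies) + 99) // 100
    p95 = latencies[rank - 1]

    if error_rate > max_error_rate:
        reason = "error rate too high"
    elif p95 > max_p95_ms:
        reason = "responses too slow"
    else:
        reason = "ok"
    return {"healthy": reason == "ok", "error_rate": error_rate,
            "p95_ms": p95, "reason": reason}


# Every pod could be "running" while 2 of the last 20 requests failed.
recent = [(200, 120.0)] * 18 + [(500, 90.0)] * 2
print(app_health(recent)["reason"])
```

In this example the pods would all look alive to the container manager, yet the check still reports a problem.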
This makes visibility crucial when monitoring hybrid clouds, and it requires a monitoring service to have visibility behind the firewall, within the cluster itself, and even on individual nodes. More in-depth monitoring is necessary. Even if Kubernetes does some monitoring on its own, once you have more than one cloud, and potentially more than one provider, you’ll want a third-party monitoring service. Otherwise, you’ll need to log into several different platforms to configure alerts and other settings.
Maintaining Visibility When Monitoring Hybrid Clouds
Ensuring full visibility in these environments requires dynamic monitoring solutions. Monitoring that includes coverage within individual nodes and clusters better represents container and application performance, as well as the performance of the cluster as a whole. An environment with multiple clouds requires monitoring within each cloud your containers live on and additional monitoring outside the private clouds to ensure availability and end-user experience won’t lapse.
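As a rough sketch of why both views matter, the function below combines per-cloud internal health with a single outside-in availability check. The function name, the status strings, and the cloud names are hypothetical and only illustrate the idea.

```python
def overall_status(internal_by_cloud, externally_reachable):
    """Combine per-cloud internal health with an outside-in availability check."""
    unhealthy = sorted(cloud for cloud, ok in internal_by_cloud.items() if not ok)

    if not externally_reachable:
        # Users cannot reach the service, whatever the clusters report.
        if unhealthy:
            return "outage"
        return "unreachable despite healthy clusters"

    if unhealthy:
        return "degraded: " + ", ".join(unhealthy)
    return "ok"


# Internal checks pass everywhere, but the external probe fails.
print(overall_status({"cloud-a": True, "on-prem": True}, False))
```

The second case is the one internal monitoring alone would miss: every cluster reports healthy, but end users still cannot reach the service.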
Apps will generally produce useful metrics about health and performance the same way they have since the bare metal days. What containers actually change is the route that monitoring needs to take to collect that data.
The container management system you’re using plays a big role in determining how your monitoring will work. As stated previously, Kubernetes has self-healing functions. If a node dies, it can use a ReplicaSet to bring back the Pods and applications that were running on it and keep your apps running, as long as you have your services set up correctly.
This is why visibility on nodes is necessary. Luckily, many monitoring services have some form of custom pod which can be installed on each node in a cluster to route the data to a second pod within the cluster used to collect the data. Often, the second pod will only live on one node within a cluster and is used to send data to the monitoring service so it can be exposed via a dashboard to the user. This way, you will have up-to-date information about the health of the apps in addition to the health of the cluster.
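The two-pod layout described above can be sketched with plain objects: one agent per node reports to a single collector, which builds the combined view that would be sent on to the monitoring service. `NodeAgent`, `Collector`, and the metric names are hypothetical stand-ins, and no real networking is shown.

```python
class NodeAgent:
    """Stands in for the per-node monitoring pod (hypothetical)."""

    def __init__(self, node_name):
        self.node_name = node_name
        self.readings = {}

    def observe(self, metric, value):
        """Store the latest value seen for a metric on this node."""
        self.readings[metric] = value

    def report(self):
        """Package the latest readings for the collector."""
        return {"node": self.node_name, "metrics": dict(self.readings)}


class Collector:
    """Stands in for the single in-cluster pod that forwards data onward."""

    def __init__(self):
        self.latest = {}  # node name -> most recent metrics

    def receive(self, report):
        """Keep only the newest report from each node."""
        self.latest[report["node"]] = report["metrics"]

    def cluster_payload(self):
        """One combined document covering every node that has reported."""
        return {"node_count": len(self.latest), "nodes": dict(self.latest)}


collector = Collector()
for name, cpu in [("node-1", 0.35), ("node-2", 0.80)]:
    agent = NodeAgent(name)
    agent.observe("cpu", cpu)
    collector.receive(agent.report())

payload = collector.cluster_payload()
print(payload["node_count"], payload["nodes"]["node-2"])
```

Keeping the outbound connection in one collector means only a single pod needs a route out of the cluster.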
For example, each Pod has its own IP address, even when several Pods share a node, so you will need a way to automatically reconcile the new IP addresses if a node is replaced. The services you configure on your scheduler or master will be able to handle this sort of thing, and they will also be able to add any necessary monitoring apps onto the new nodes created to replace old ones. This also allows your monitoring system to scale as nodes are added or removed.
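A minimal sketch of that reconciliation step, assuming each monitored workload has a stable identifier (for example a label) that outlives any one Pod: compare the addresses currently being monitored with a fresh discovery result and report what changed. The function name, workload names, and IP addresses are made up for the example.

```python
def reconcile_targets(known, discovered):
    """Compare the monitored {workload_id: ip} map with a fresh discovery result."""
    added = {w: ip for w, ip in discovered.items() if w not in known}
    removed = {w: ip for w, ip in known.items() if w not in discovered}
    moved = {
        w: (known[w], ip)  # (old address, new address)
        for w, ip in discovered.items()
        if w in known and known[w] != ip
    }
    changes = {"added": added, "removed": removed, "moved": moved}
    # The discovery result becomes the new set of monitoring targets.
    return dict(discovered), changes


known = {"checkout": "10.0.1.4", "search": "10.0.1.5"}
discovered = {"checkout": "10.0.2.9", "search": "10.0.1.5", "cart": "10.0.2.10"}

targets, changes = reconcile_targets(known, discovered)
print(changes["moved"])
```

Because the comparison is keyed on the workload identifier, a Pod that comes back on a new node with a new address is treated as the same target that moved, not as one outage plus one unknown newcomer.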
Looking at an Example Monitoring Setup for a Hybrid Cloud
The above diagram visually represents where each component of the monitoring system sits within an example Kubernetes cluster. We call our container and application monitoring pod the Panopta Agent. The Panopta OnSight is the pod that gathers the data before sending it to Panopta, where it is used to build the customer’s dashboard.
This is just how we architect our Kubernetes and cloud-native monitoring. The Panopta Agent is an extension of our normal monitoring agent, which includes application-level metric collection as part of the standard package, regardless of infrastructure. For Kubernetes, and for other container environments such as Docker, we adapted the robust monitoring architecture used by hundreds of customers to containerization.
Monitoring When Using Kubernetes
If we continue to use Kubernetes as an example, there are some basic items that you will need to monitor:
- CPU Utilization
- Memory Usage
- Disk Usage and I/O Performance
- Network Bandwidth
- Pod Resources
- All Infrastructure components (Master, Nodes, Pods, etc.)
- Internal Services/Resources
This isn’t an exhaustive list, and it will need to be tweaked for your use case, but it’s a place to start when it comes to monitoring hybrid clouds. All these metrics will give you useful data about the health of both your cloud and apps. In addition, an effective alert system will let you know when any of your nodes experience an issue but won’t alert you about false outages or changes that are within the thresholds you set.
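One common way to avoid alerting on brief spikes is to require several consecutive readings over the threshold before firing. The sketch below assumes that approach; the `ThresholdAlert` class, the 0.90 CPU threshold, and the three-sample requirement are illustrative choices, not recommended values.

```python
class ThresholdAlert:
    """Fires only after `required` consecutive samples exceed `threshold`."""

    def __init__(self, threshold, required=3):
        self.threshold = threshold
        self.required = required
        self.breaches = 0  # length of the current streak of bad samples

    def observe(self, value):
        """Feed one sample; return True when an alert should be sent."""
        if value > self.threshold:
            self.breaches += 1
        else:
            self.breaches = 0  # a healthy sample resets the streak
        # Fire once, on the sample that completes the streak.
        return self.breaches == self.required


cpu_alert = ThresholdAlert(threshold=0.90, required=3)
readings = [0.95, 0.50, 0.92, 0.93, 0.97, 0.99]
fired = [cpu_alert.observe(r) for r in readings]
print(fired)
```

The single spike at the start never alerts, and the sustained run alerts once instead of on every sample.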
When you have multiple clusters across several different clouds, your monitoring will need to expand and change with them. Whether you’re using Kubernetes to manage your containers or you’re working with another system, an effective and robust monitoring system helps prevent service disruptions.
If you’re interested in learning more about monitoring hybrid clouds, or if you’re considering transitioning to a container-focused environment, subscribe in the sidebar to get updates about this conversation.