We love talking geek and if you’re reading this – you’re probably one of us. We wanted to make it clear to all the geeks (and non-geeks) exactly how Panopta works and how it makes your life easier. If what you want to know isn’t covered here, feel free to give us a call and we can talk geek to geek.
How Does Panopta Work?
Panopta is set up to provide complete monitoring coverage of your infrastructure, regardless of location. We do this by combining multiple monitoring approaches into one consistent, comprehensive system:
- External Monitoring of network services on publicly-accessible servers from our global array of monitoring nodes
- Internal Monitoring of servers that are not directly accessible from the public Internet, using the Panopta Monitoring Appliance. This allows network services and system resources to be monitored on servers that sit behind firewalls
- On-server monitoring of system resources such as CPU and memory usage with the Panopta Monitoring Agent, on both internal and external servers
Together, these three components give complete coverage of your entire network and enable us to develop a view into your infrastructure that is both deep and wide.
As results are generated by any of these components, they are securely fed to our centralized infrastructure, which handles alerting and reporting.
Looking more closely at the processing that takes place in the Central Infrastructure gives a better view into what makes Panopta unique.
The first step in this processing takes place at the Results Collector, which receives and organizes the monitoring data flowing in from the monitoring nodes, agents and appliances. It sends a copy of all data to the Monitoring History Data Warehouse, where we store a permanent record of all monitoring history for future use.
The Results Collector also performs an initial analysis to decide what data is “interesting”, and passes that data on to the Outage Correlation Engine. The Correlation Engine performs the deeper analysis needed to compare results coming in from different sources, applying proprietary outage confirmation logic that eliminates false alerts by using time- and geographic-based confirmations to ensure the system has an accurate view of our customers’ servers. More details on this process can be found in the “Lifecycle of an Outage” section below.
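As a rough illustration of this confirmation idea (a sketch, not Panopta’s actual implementation — the node names, thresholds and function names here are all assumptions), an outage might only be confirmed when several distinct nodes report failures clustered within a short time window:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class CheckResult:
    node: str            # monitoring node that ran the check
    timestamp: datetime  # when the check completed
    ok: bool             # did the check pass?

def outage_confirmed(results, window_seconds=60, min_failures=3):
    """Confirm an outage only when several independent monitoring
    nodes report failures within a short time window."""
    failures = [r for r in results if not r.ok]
    if len(failures) < min_failures:
        return False
    # Geographic-style confirmation: failures must come from distinct nodes
    if len({r.node for r in failures}) < min_failures:
        return False
    # Time-based confirmation: failures must cluster closely in time
    times = sorted(r.timestamp for r in failures)
    return times[-1] - times[0] <= timedelta(seconds=window_seconds)
```

A single node reporting a failure is never enough on its own; only agreement across nodes, close together in time, opens an outage.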
Once an outage has been confirmed, it is passed to our Notification Engine, where all alerts are scheduled and distributed according to the notification schedule configured for the affected servers. The Notification Engine also handles the more advanced aspects of intelligent alerting, such as consolidating multiple alerts into a single message, applying device dependencies to raise more critical alerts to higher priority, and accelerating or pausing escalations. The Notification Engine then dispatches alerts across a wide range of communication channels including email, SMS, voice phone calls, chat and Twitter.
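To make the consolidation idea concrete, here is a minimal sketch (field names and message format are illustrative assumptions, not Panopta’s real data model) that groups pending alerts by recipient so each contact gets one message instead of many:

```python
from collections import defaultdict

def consolidate_alerts(pending_alerts):
    """Group pending alerts by recipient so each contact receives a
    single message covering all affected servers, rather than one
    message per server."""
    by_contact = defaultdict(list)
    for alert in pending_alerts:
        by_contact[alert["contact"]].append(alert["server"])
    return {
        contact: f"Outage on {len(servers)} server(s): {', '.join(sorted(servers))}"
        for contact, servers in by_contact.items()
    }
```

In a real notification engine this grouping would also fold in dependency rules and escalation timing, but the core idea is the same: batch related alerts before dispatch.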
The final major component of the central infrastructure is our Reporting Engine. Reports are generated and sent via email on a daily, weekly and monthly basis, with support for customized content designed to provide customers with exactly the data they need to maintain their operations. Reports are also available on demand through our Public Reporting functionality, which allows availability data to be styled to seamlessly fit into customer websites and dashboards.
The Lifecycle of an Outage
One of the most important aspects of Panopta’s infrastructure is our proprietary system for confirming outages, which eliminates false alerts and guarantees accurate results for the customers that are depending on the system.
The diagram below shows how this process works:
1. During normal operations, monitoring checks are performed from the primary monitoring location for each server, typically chosen to be geographically near the customer’s server to minimize the impact of intervening network delays and problems.
2. Results are sent back from the customer’s server to the monitoring node, where they are processed.
3. Failed results are sent to the central infrastructure.
4. The Outage Correlation Engine schedules confirmation checks to be performed by a number of additional monitoring nodes located near the customer’s server.
5. Checks are performed by each confirmation monitoring node, typically within 15 seconds of the original failed check result.
6. Results are returned from the customer’s server to each confirmation node.
7. Results are passed back to the central infrastructure, where the Outage Correlation Engine combines them to determine whether an outage has truly occurred.
8. Alerts are sent by the Notification Engine according to the timeline configured for the server.
9. The entire process repeats in reverse once successful checks start to be returned, with all-clear alerts sent out after all monitoring nodes have confirmed that service has returned to normal.
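The lifecycle above can be sketched as a small state machine (a simplified illustration under assumed state names — the real system tracks far more detail): a single failed check makes an outage suspected, unanimous failures confirm it, and unanimous successes close it.

```python
class OutageLifecycle:
    """Minimal state machine mirroring the confirmation flow: one failed
    check triggers confirmation checks, and the outage is only opened
    (or closed) once every monitoring node agrees."""

    def __init__(self, confirmation_nodes):
        self.nodes = confirmation_nodes
        self.state = "up"

    def record(self, node_results):
        """node_results maps node name -> True (check passed) / False."""
        if self.state == "up" and not all(node_results.values()):
            self.state = "suspected"     # one failure: schedule confirmations
        if self.state == "suspected" and not any(node_results.values()):
            self.state = "down"          # every node confirms the failure
        elif self.state == "down" and all(node_results.values()):
            self.state = "up"            # every node confirms recovery
        return self.state
```

Requiring unanimity in both directions is what produces the behavior described above: no alert fires on a single node’s failed check, and no all-clear goes out until every node sees the service healthy again.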
Addressing Your Concerns
All of Panopta’s core infrastructure is replicated across multiple datacenters, with hot standbys ready and waiting to take over in the event of failure. Real-time data replication ensures we can fail-over when needed without missing a beat. In more than four years of operations, we’ve never had a disruption in our monitoring, not even after 20+ billion checks.
At Panopta we take security seriously in all aspects of our operations. All communication between the components of our system is encrypted with SSL, including the control panel, API operations and all agent- and appliance-related communication. This ensures that monitoring information cannot be intercepted or misused.
Furthermore, all agent- and appliance-related communication is initiated from the device itself. To use either of these, you do not need to open ports for inbound connectivity, meaning they cannot be used to initiate attacks on your systems.
Finally, the source code for the monitoring agent is open for review and inspection by any of our users so you can know exactly what you’re adding to your servers. We have nothing to hide and welcome the scrutiny.
Handling complex configurations
Our system is designed to handle the most complex of infrastructures – if you’re managing it, we can monitor it. We have:
- A control panel designed to manage hundreds of servers, with support for logical groups of servers that can be managed as a batch
- An API that supports automated configuration and status queries, allowing you to automatically set up monitoring as physical or virtual servers are deployed
- A notification system that gives plenty of controls to capture the complexities of real-world operations teams
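For example, a deployment script might register a new server with the API as soon as it comes online. The sketch below is hypothetical: the endpoint URL, authentication scheme and field names are all assumptions for illustration, not Panopta’s actual API schema.

```python
import json
from urllib import request

API_BASE = "https://api.panopta.example/v1"   # hypothetical endpoint
API_TOKEN = "your-api-token"                  # placeholder credential

def build_server_payload(fqdn, group, checks):
    """Build the JSON body for registering a newly deployed server.
    Field names here are illustrative, not a real schema."""
    return {
        "fqdn": fqdn,
        "server_group": group,
        "checks": [{"type": c, "frequency": 60} for c in checks],
    }

def register_server(payload):
    """POST the payload to the (hypothetical) server-creation endpoint."""
    req = request.Request(
        f"{API_BASE}/server",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"ApiKey {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return request.urlopen(req)  # returns the HTTP response
```

Hooking a call like this into your provisioning pipeline means every new server starts its life already monitored, with no manual control-panel work.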
No service lock-in
At Panopta, we know that you’ll find our service to be something you can’t live without. However, in case things don’t work out, we want to make your transition as painless as possible. To support this, we offer the following:
- No long-term contracts – use the service on a month-to-month basis and cancel at any time if you’d like
- You own your data. Just ask and we’ll export your monitoring history for you
- Free trial – put the entire system through its paces and compare it with your current monitoring system before ever entering your credit card information