Orb: Scalable collection and real-time analysis of critical system data

Timothy R. Palko
18-845, Internet Services, Spring 2014

The ongoing advance of wireless environmental-monitoring technologies is central to Critical Infrastructure Protection (CIP). Communications networks, transportation systems, and medical facilities are only a few of the settings in which this development is a concern. The focus of this protection may be a physical device or piece of equipment crucial to the operation of these services, or it may be greater awareness of environmental conditions and how those conditions fluctuate relative to expectations. Examples are excesses in temperature or humidity around sensitive equipment, vibrations in structures beyond their known thresholds, or the presence of people in a room or hallway during off-hours. Meaningful oversight of these factors and the ability to respond effectively to them require coordinated data collection, deterministic behavior on failure conditions, and real-time visibility of the data being collected.

We present an introspection and evaluation of such a system, Orb. Orb is a secure, distributed system of data collection nodes supported by real-time analysis and notification services. It aims to deliver highly scalable and secure data collection on which feature-rich, responsive applications for real-time visualization, reporting, and notification may be built. Physically, Orb is deployed as a network of wireless sensors and sensor gateways designed for insecure sites and capable of securely aggregating environmental data. Internally, Orb provides services for stable data collection, visual representation of the data, and monitoring for notifications and incident response.
The collection services receiving the data include a fast, in-memory key-value store for recent data, a large, on-disk repository for heavy analysis of older data, and a migration service responsible for managing the data across these storage resources. For incident response, additional services continuously inspect the data, evaluating user-defined rules for real-time notifications and visualizations. The complex network of services that makes up Orb requires thorough inspection and testing to ensure it can transmit, store, and serve data efficiently and securely. In this paper, we assess Orb’s quality of service and evaluate its effectiveness under realistic and stress loads. The goal for Orb is to scale to thousands of nodes without loss or delay in service. Orb’s operation benefits from the small format of the data in transmission, the lack of data persistence in the external network, and a preference for in-memory storage on initial intake. Nevertheless, thorough evaluation identified two bottlenecks: one in the archival of data and another in the evaluation of rules for notifications. We discuss those performance-limiting design features along with the overall design and operation of Orb, and we identify the realistic limits of its scale.
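
To make the arrangement of collection services described above more concrete, the following is a minimal sketch, not Orb's actual implementation. It assumes an in-memory dictionary standing in for the key-value store, a second dictionary standing in for the on-disk repository, an age-based migration policy, and rules expressed as predicates over the most recent reading. All names (`TieredStore`, `Rule`, `evaluate_rules`, `max_age_s`, the sensor identifier) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, measured value)


@dataclass
class TieredStore:
    """Hypothetical stand-in for the recent-data store plus the archive."""
    max_age_s: float  # assumed retention window for the in-memory tier
    hot: Dict[str, List[Reading]] = field(default_factory=dict)      # recent data
    archive: Dict[str, List[Reading]] = field(default_factory=dict)  # older data

    def ingest(self, sensor_id: str, ts: float, value: float) -> None:
        # New readings always land in the in-memory tier first.
        self.hot.setdefault(sensor_id, []).append((ts, value))

    def migrate(self, now: float) -> int:
        """Move readings older than max_age_s to the archive; return the count moved."""
        moved = 0
        for sensor_id, readings in list(self.hot.items()):
            old = [r for r in readings if now - r[0] > self.max_age_s]
            if old:
                self.archive.setdefault(sensor_id, []).extend(old)
                self.hot[sensor_id] = [r for r in readings if now - r[0] <= self.max_age_s]
                moved += len(old)
        return moved


@dataclass
class Rule:
    """Hypothetical user-defined rule: a named predicate on one sensor's value."""
    name: str
    sensor_id: str
    predicate: Callable[[float], bool]


def evaluate_rules(store: TieredStore, rules: List[Rule]) -> List[str]:
    """Return the names of rules that hold for each sensor's latest recent reading."""
    fired = []
    for rule in rules:
        readings = store.hot.get(rule.sensor_id)
        if readings and rule.predicate(readings[-1][1]):
            fired.append(rule.name)
    return fired


# Usage: two temperature readings, one old enough to be archived.
store = TieredStore(max_age_s=60.0)
store.ingest("rack-7/temp", ts=0.0, value=21.5)
store.ingest("rack-7/temp", ts=100.0, value=41.0)
moved = store.migrate(now=120.0)
rules = [Rule("overheat", "rack-7/temp", lambda v: v > 35.0)]
fired = evaluate_rules(store, rules)
```

In this sketch every rule is checked on each pass and migration scans every sensor's recent readings. That is a simplification made for brevity, not a claim about how Orb schedules either service or about the cause of the bottlenecks identified above.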