The Simons Observatory: Alarms and Detector Quality Monitoring

David V. Nguyen, Sanah Bhimani, Nicholas Galitzki, Brian J. Koopman, Jack Lashner, Laura Newburgh, Max Silva-Feaver, Kyohei Yamada
2024-06-20
Abstract: The Simons Observatory (SO) is a group of modern telescopes dedicated to observing the polarized cosmic microwave background (CMB), transients, and more. The Observatory consists of four telescopes and instruments, with over 60,000 superconducting detectors in total, located at ~5,200 m altitude in the Atacama Desert of Chile. During observations, it is important to ensure the detectors, telescope platforms, calibration and receiver hardware, and site hardware are within operational bounds. To facilitate rapid response when problems arise with any part of the system, it is essential that alerts are generated and distributed to appropriate personnel if components exceed these bounds. Similarly, alerts are generated if the quality of the data has become degraded. In this paper, we describe the SO alarm system we developed within the larger Observatory Control System (OCS) framework, including the data sources, alert architecture, and implementation. We also present results from deploying the alarm system during the commissioning of the SO telescopes and receivers.
Instrumentation and Methods for Astrophysics, Cosmology and Nongalactic Astrophysics
What problem does this paper attempt to address?
This paper addresses how to monitor the health and data quality of each subsystem in the Simons Observatory (SO) during observations, and how to respond to problems promptly enough to avoid hardware damage, data loss, or wasted observation time. Specifically, the work develops an alarm system that monitors key components such as the telescope platforms, detector readout system, refrigeration system, and power supply system, and that generates and distributes alarms when these components leave their normal operating range.

### Key Problems and Solutions

1. **Ensure the normal operation of observation equipment**:
   - SO has more than 60,000 superconducting detectors distributed across four telescopes. To ensure that these detectors and their related hardware (such as the refrigeration and power supply systems) remain in a normal working state during observations, the status of these devices must be monitored in real time.
   - The paper describes monitoring non-detector hardware through the Housekeeping (HK) data set: computers, SMuRF readout systems, refrigeration systems, half-wave plates, power supply systems, platform control systems, timing systems, and environmental conditions. These data are used to evaluate the health of the entire system.
2. **Ensure data quality**:
   - The quality of the detector data directly affects the reliability of the observation results. In addition to hardware monitoring, the data quality of the detectors must therefore also be monitored.
   - The paper mentions two main data quality indicators: the first uses SMuRF HK data to check whether the detectors are in their superconducting transition; the second evaluates detector signal quality through the data processing pipeline, which detects and corrects abnormal data points.
3. **Fast response and notification mechanism**:
   - When a component fails or the data quality deteriorates, the relevant personnel must be notified quickly so that they can act in time.
   - The paper proposes an integrated alarm system that automatically generates alarms according to configured thresholds and sends them to the relevant personnel through multiple channels (such as Slack, email, SMS, and phone). High-priority alarms, such as high DR temperature or abnormal PTC status, are delivered by phone to ensure an immediate response.
4. **System scalability and ease of use**:
   - As new hardware is added and new requirements emerge, the alarm system must scale so that new alarm rules can be added continuously.
   - The paper emphasizes scalability and ease of use in the system design, including using Grafana to visualize data and define alarm rules, and implementing a flexible notification subscription mechanism through campana.

### Summary

The core objective of this paper is to develop and deploy an alarm system that ensures the normal operation and data quality of each SO subsystem during observations. By monitoring hardware status and data quality in real time and notifying the relevant personnel promptly, the system helps prevent hardware damage, data loss, and wasted observation time, improving both observing efficiency and data reliability.
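
The threshold-and-notify flow described under "Fast response and notification mechanism" can be illustrated with a minimal Python sketch. This is not the SO implementation, which the paper describes as built on Grafana alert rules and campana. The field names, threshold values, and the `AlarmRule`, `Alert`, and `evaluate` names here are all hypothetical, chosen only to show how a rule's priority might decide which channels receive an alert.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Priority(Enum):
    LOW = 1
    HIGH = 2


# Hypothetical routing table: high-priority alarms additionally reach SMS and phone.
CHANNELS: Dict[Priority, Tuple[str, ...]] = {
    Priority.LOW: ("slack", "email"),
    Priority.HIGH: ("slack", "email", "sms", "phone"),
}


@dataclass(frozen=True)
class AlarmRule:
    field: str          # hypothetical housekeeping field name
    max_value: float    # hypothetical upper operational bound
    priority: Priority


@dataclass(frozen=True)
class Alert:
    field: str
    value: float
    channels: Tuple[str, ...]


def evaluate(rules: List[AlarmRule], sample: Dict[str, float]) -> List[Alert]:
    """Return one alert for each rule whose field exceeds its upper bound."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.field)
        if value is None:
            continue  # fields absent from the sample are simply skipped in this sketch
        if value > rule.max_value:
            alerts.append(Alert(rule.field, value, CHANNELS[rule.priority]))
    return alerts


# Illustrative rules and readings; none of these numbers come from the paper.
rules = [
    AlarmRule("dr_temperature_k", 0.2, Priority.HIGH),
    AlarmRule("wind_speed_m_s", 15.0, Priority.LOW),
]
sample = {"dr_temperature_k": 0.35, "wind_speed_m_s": 4.0}
alerts = evaluate(rules, sample)
```

Keeping the rules as plain data separate from the evaluation loop reflects the scalability goal in item 4: a new check is a new `AlarmRule` entry rather than new code.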