AOCMS: An Adaptive and Scalable Monitoring System for Large-Scale Clusters

Zhenghua Xue, Xiaoshe Dong, Weiguo Wu
DOI: https://doi.org/10.1109/apscc.2006.34
2006-01-01
Abstract: In this paper, we present the design and implementation of AOCMS, an adaptive, scalable, and efficient monitoring system for large-scale clusters. We describe the adaptive architecture of AOCMS in detail and discuss the techniques that enhance its adaptability, scalability, and efficiency. These techniques include: a solution for monitoring heterogeneous clusters; a universal applet-servlet communication controller that handles communication between the clients and the web server; adaptive pools that provide threads or database connections to monitoring tasks on demand; and an AOP-based alarm that decouples the alarming logic from the monitoring logic. We also measured the performance of AOCMS. The results show that AOCMS runs with low overhead and responds to clients quickly.
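
The abstract does not say how the AOP-based alarm is built. As a rough illustration of the idea only, the sketch below keeps the alarm check out of the monitoring code by wrapping the monitor in a JDK dynamic proxy, which stands in here for a real AOP framework. It assumes AOP means aspect-oriented programming. The names Monitor, FixedMonitor, withAlarm, the node names, and the 0.90 threshold are all hypothetical and do not come from the paper.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class AlarmSketch {
    /** Hypothetical monitoring interface; not taken from the paper. */
    public interface Monitor {
        double cpuLoad(String node);
    }

    /** Plain monitoring logic that contains no alarm code at all. */
    public static class FixedMonitor implements Monitor {
        @Override
        public double cpuLoad(String node) {
            // Canned readings so the sketch runs without a real cluster.
            return node.equals("node-2") ? 0.95 : 0.20;
        }
    }

    /**
     * Wraps a Monitor so that every numeric reading is checked against a
     * threshold after the call returns. The alarm logic lives only here.
     */
    public static Monitor withAlarm(Monitor target, double threshold, List<String> alarms) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);
            if (result instanceof Double && (Double) result > threshold) {
                alarms.add(method.getName() + " on " + args[0] + " = " + result);
            }
            return result;
        };
        return (Monitor) Proxy.newProxyInstance(
                Monitor.class.getClassLoader(), new Class<?>[] {Monitor.class}, handler);
    }

    public static void main(String[] args) {
        List<String> alarms = new ArrayList<>();
        Monitor monitored = withAlarm(new FixedMonitor(), 0.90, alarms);
        monitored.cpuLoad("node-1"); // below the threshold, no alarm recorded
        monitored.cpuLoad("node-2"); // above the threshold, one alarm recorded
        System.out.println(alarms);
    }
}
```

In this arrangement the threshold or the alarm action can change without touching FixedMonitor, which is the kind of decoupling the abstract attributes to the AOP-based alarm.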