Toward scalable monitoring on large-scale storage for software defined cyberinfrastructure
Arnab K. Paul, Steven Tuecke, Ryan Chard, Ali R. Butt, Kyle Chard, Ian Foster
DOI: https://doi.org/10.1145/3149393.3149402
2017-01-01
Abstract: As research processes become more collaborative and increasingly data-oriented, new techniques are needed to efficiently manage and automate the crucial, yet tedious, aspects of the data life-cycle. Researchers now spend considerable time replicating, cataloging, sharing, analyzing, and purging large amounts of data distributed over vast storage networks. Software Defined Cyberinfrastructure (SDCI) addresses this problem by enhancing existing storage systems to enable the automated execution of actions based on the specification of high-level data management policies. Our SDCI implementation, called Ripple, relies on agents deployed on storage resources to detect and act on data events. However, current monitoring technologies, such as inotify, are not generally available on large or parallel file systems, such as Lustre. We describe here an approach for scalable, lightweight event detection on large (multi-petabyte) Lustre file systems. Together, Ripple and the Lustre monitor enable new types of lifecycle automation across both personal devices and leadership computing platforms.
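
To make the idea of policy-driven actions on data events concrete, the following is a minimal sketch in Python. It is not Ripple's actual interface: the names `Event`, `Rule`, and `dispatch`, the event kinds, and the glob-style path patterns are all assumptions introduced here for illustration. It only shows how a detected file event might be matched against high-level rules to select actions.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    """A file system event as an agent might report it (hypothetical shape)."""
    kind: str  # assumed event kinds: "created", "modified", "deleted"
    path: str


@dataclass
class Rule:
    """A hypothetical policy rule: when an event matches, run an action."""
    kind: str
    pattern: str  # glob-style pattern matched against the event path
    action: Callable[[Event], str]

    def matches(self, event: Event) -> bool:
        return event.kind == self.kind and fnmatch(event.path, self.pattern)


def dispatch(event: Event, rules: List[Rule]) -> List[str]:
    """Run the action of every rule that matches the event, in rule order."""
    return [rule.action(event) for rule in rules if rule.matches(event)]


# Illustrative policies only; a real action would start a transfer, update
# a catalog, and so on, rather than return a description string.
rules = [
    Rule("created", "*.h5", lambda e: f"replicate {e.path}"),
    Rule("deleted", "/scratch/*", lambda e: f"purge-catalog {e.path}"),
]

if __name__ == "__main__":
    for result in dispatch(Event("created", "/data/run1.h5"), rules):
        print(result)
```

In this sketch the event source is left abstract. On a personal device it could be fed by inotify, while on Lustre it would need a different detection mechanism, which is the gap the abstract describes.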