DataFed: Towards Reproducible Research via Federated Data Management

Dale Stansberry, Suhas Somnath, Jessica Breet, Gregory Shutt, Mallikarjun Shankar
DOI: https://doi.org/10.48550/arXiv.2004.03710
2020-04-08
Abstract: The increasingly collaborative, globalized nature of scientific research, combined with the need to share data and the explosion in data volumes, creates an urgent need for a scientific data management system (SDMS). An SDMS presents a logical and holistic view of data that greatly simplifies and empowers data organization, curation, searching, sharing, dissemination, etc. We present DataFed -- a lightweight, distributed SDMS that spans a federation of storage systems within a loosely-coupled network of scientific facilities. Unlike existing SDMS offerings, DataFed uses high-performance and scalable user management and data transfer technologies that simplify its deployment, maintenance, and expansion. DataFed provides web-based and command-line interfaces to manage data and integrate with complex scientific workflows. DataFed represents a step towards reproducible scientific research by enabling reliable staging of the correct data in the desired environment.
Databases, Computers and Society