Enabling Storage Business Continuity and Disaster Recovery with Ceph distributed storage

Enrico Bocchi, Abhishek Lekshmanan, Roberto Valverde, Zachary Goggin, R. De Vita, X. Espinal, P. Laycock, O. Shadura
DOI: https://doi.org/10.1051/epjconf/202429501021
2024-05-06
EPJ Web of Conferences
Abstract: The Storage Group in the CERN IT Department operates several Ceph storage clusters with an overall capacity exceeding 100 PB. Ceph is a crucial component of the infrastructure delivering IT services to all the users of the Organization, as it provides: i) block storage for OpenStack, ii) CephFS, used as persistent storage by containers (OpenShift and Kubernetes) and as a shared filesystem by HPC clusters, and iii) S3 object storage for cloud-native applications, monitoring, and software distribution across the WLCG. The Ceph infrastructure at CERN is being rationalized and restructured to allow for the implementation of a Business Continuity/Disaster Recovery plan. In this paper, we give an overview of how we transitioned from a single cluster providing block storage to multiple ones, enabling Storage Availability Zones, and how block storage backups can be achieved. We also illustrate future plans for file system backups through cback, a restic-based scalable orchestrator, and how S3 implements data immutability and provides a highly available, Multi-Data Centre object storage service.