Long Live The Image: Container-Native Data Persistence in Production

Zheng Li
DOI: https://doi.org/10.48550/arXiv.2103.02397
2021-03-03
Abstract: Containerization plays a crucial role in the de facto technology stack for implementing microservices architectures, in which each microservice in most cases has its own database. Nevertheless, containerizing production databases is still fiercely debated, mainly because of data persistence issues and concerns. Driven by a project to refactor an Automated Machine Learning system, this research proposes container-native data persistence as a conditional solution for running database containers in production. In essence, the proposed solution distinguishes stateless data access (i.e., reading) from stateful data processing (i.e., creating, updating, and deleting) in databases. A master database handles the stateful data processing and dumps database copies for building container images, while the database containers stay stateless at runtime, serving the dump preloaded in the image. Although state/image updates propagate with some delay, this solution is particularly suitable for read-only, eventual-consistency, and asynchronous-processing scenarios. Moreover, with optimal tuning (e.g., disabling locking), the portability and performance gains of a read-only database container would outweigh the performance loss in accessing data across the underlying image layers.
Databases
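
The split the abstract describes, between a writable master that dumps copies and stateless read-only copies that serve them, can be illustrated with a small sketch. This is not the paper's implementation: SQLite stands in for the production database, a snapshot file stands in for the dump baked into a container image, and the `models` table and the helper names `dump_master` and `open_read_only` are hypothetical. SQLite's `immutable=1` URI parameter is used here as one possible analogue of the "disabling locking" tuning mentioned above.

```python
import os
import pathlib
import sqlite3
import tempfile


def dump_master(master: sqlite3.Connection, snapshot_path: str) -> None:
    """Copy the master's committed state into a standalone snapshot file.

    In the container setting, this file would be the dump added to an image.
    """
    target = sqlite3.connect(snapshot_path)
    try:
        master.backup(target)  # consistent copy of the whole database
    finally:
        target.close()


def open_read_only(snapshot_path: str) -> sqlite3.Connection:
    """Open a snapshot as a stateless replica: no writes, no locking."""
    # mode=ro rejects writes; immutable=1 tells SQLite the file never
    # changes, so it skips file locking and change detection.
    uri = pathlib.Path(snapshot_path).resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)


def demo() -> dict:
    workdir = tempfile.mkdtemp()

    # The master handles all stateful processing (create, update, delete).
    master = sqlite3.connect(os.path.join(workdir, "master.db"))
    master.execute("CREATE TABLE models (name TEXT PRIMARY KEY, score REAL)")
    master.execute("INSERT INTO models VALUES ('baseline', 0.71)")
    master.commit()

    # First "image build": dump the current state.
    v1 = os.path.join(workdir, "snapshot_v1.db")
    dump_master(master, v1)

    # The master keeps changing after the dump was taken.
    master.execute("INSERT INTO models VALUES ('tuned', 0.83)")
    master.commit()

    # A replica on the old snapshot does not see the new row (update delay).
    replica_v1 = open_read_only(v1)
    rows_v1 = replica_v1.execute("SELECT COUNT(*) FROM models").fetchone()[0]

    # Writes against the replica are rejected, so it stays stateless.
    try:
        replica_v1.execute("INSERT INTO models VALUES ('rogue', 0.0)")
        write_rejected = False
    except sqlite3.OperationalError:
        write_rejected = True

    # Second "image build": a fresh dump propagates the update.
    v2 = os.path.join(workdir, "snapshot_v2.db")
    dump_master(master, v2)
    replica_v2 = open_read_only(v2)
    rows_v2 = replica_v2.execute("SELECT COUNT(*) FROM models").fetchone()[0]

    for conn in (replica_v1, replica_v2, master):
        conn.close()
    return {"rows_v1": rows_v1, "rows_v2": rows_v2, "write_rejected": write_rejected}


if __name__ == "__main__":
    print(demo())
```

The replica opened on the first snapshot keeps serving the old state until a new snapshot is built and swapped in. This mirrors the propagation delay the abstract accepts in exchange for stateless, portable read-only containers.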