Taurus Database: How to be Fast, Available, and Frugal in the Cloud

Alex Depoutovitch, Chong Chen, Jin Chen, Paul Larson, Shu Lin, Jack Ng, Wenlin Cui, Qiang Liu, Wei Huang, Yong Xiao, Yongjun He
DOI: https://doi.org/10.1145/3318464.3386129
2024-12-04
Abstract: Using cloud Database as a Service (DBaaS) offerings instead of on-premise deployments is increasingly common. Key advantages include improved availability and scalability at a lower cost than on-premise alternatives. In this paper, we describe the design of Taurus, a new multi-tenant cloud database system. Taurus separates the compute and storage layers in a similar manner to Amazon Aurora and Microsoft Socrates and provides similar benefits, such as read replica support, low network utilization, hardware sharing and scalability. However, the Taurus architecture has several unique advantages. Taurus offers novel replication and recovery algorithms providing better availability than existing approaches using the same or fewer replicas. Also, Taurus is highly optimized for performance, using no more than one network hop on critical paths and exclusively using append-only storage, delivering faster writes, reduced device wear, and constant-time snapshots. This paper describes Taurus and provides a detailed description and analysis of the storage node architecture, which has not been previously available from the published literature.
Databases; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The main problem this paper addresses is how to design an efficient, highly available, and low-cost multi-tenant database system (Taurus) for the cloud, meeting the needs of modern enterprises for relational Database as a Service (DBaaS).

### Specific problems include:

1. **High Availability**:
   - Traditional database deployments in the cloud cannot provide sufficiently high availability. For example, when some of the hosts fail, the entire database may become unavailable.
   - Taurus improves write availability while preserving data durability by introducing new replication and recovery algorithms. Specifically, with only three data replicas, Taurus can achieve availability comparable to or higher than that of Aurora with six replicas.
2. **Performance Optimization**:
   - In distributed systems, network interaction is a key source of performance bottlenecks. Taurus improves performance through architectural changes that reduce the number of network interactions on the critical path.
   - Experiments show that the throughput of Taurus is 200% higher than that of MySQL 8.0 using local storage, and the latency of read replicas remains within 20 milliseconds under high load.
3. **Cost-effectiveness**:
   - Traditional databases waste a large amount of resources in the cloud, such as bandwidth, CPU cycles, and memory. Taurus reduces device wear and storage costs by separating the compute layer from the storage layer and using append-only storage.
   - Taurus also reduces network load and latency by separating log stores from page stores, further saving costs.
4. **Scalability**:
   - Traditional databases need to create complete database replicas when scaling out, which is both time-consuming and expensive.
   - Taurus achieves automated horizontal scaling through a distributed architecture and on-demand resource allocation, and supports databases of up to 128 TB.

### Summary

Taurus aims to provide a high-performance, highly available, and cost-effective cloud-native database solution by redesigning the database architecture to take full advantage of the cloud environment.
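
The claim under High Availability, that three replicas can match or beat a six-replica design, can be made concrete with a small probability calculation. The sketch below is illustrative only and is not the paper's analysis. It assumes independent node failures with a fixed probability, models the six-replica design as a 4-of-6 write quorum (the quorum rule described for Aurora), and models Taurus log writes as succeeding whenever any three nodes in a larger pool are healthy. The pool size, the failure probability, and all function names are hypothetical.

```python
from math import comb


def prob_fewer_than_up(k: int, n: int, p_fail: float) -> float:
    """Probability that fewer than k of n independent nodes are healthy.

    Summing the failure-side terms directly avoids the floating-point
    cancellation that computing 1 - P(at least k up) would introduce.
    """
    p_up = 1.0 - p_fail
    return sum(
        comb(n, i) * p_up**i * p_fail ** (n - i)
        for i in range(k)
    )


def quorum_write_unavailability(p_fail: float, quorum: int = 4, replicas: int = 6) -> float:
    """Writes block when fewer than `quorum` of a fixed replica set are healthy."""
    return prob_fewer_than_up(quorum, replicas, p_fail)


def pool_write_unavailability(p_fail: float, copies: int = 3, pool: int = 20) -> float:
    """Writes block only when fewer than `copies` nodes in the whole pool are healthy.

    Assumption: a write may be placed on any healthy nodes in the pool,
    so a failed node does not block writes while enough others remain.
    """
    return prob_fewer_than_up(copies, pool, p_fail)


if __name__ == "__main__":
    p = 0.01  # hypothetical per-node failure probability
    print(f"4-of-6 quorum unavailable: {quorum_write_unavailability(p):.3e}")
    print(f"any 3 of 20 unavailable:   {pool_write_unavailability(p):.3e}")
```

Under these assumptions, writes that may land on any healthy nodes are blocked far less often than writes that need a quorum of a fixed replica set, even though each write is stored only three times. Correlated failures, repair time, and read availability are not modelled here.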