The Case for DBMS Live Patching [Extended Version]

Michael Fruth, Stefanie Scherzinger
2024-10-14
Abstract: Traditionally, when the code of a database management system (DBMS) needs to be updated, the system is restarted and database clients suffer downtime, or the provider instantiates hot-standby instances and rolls over the workload. We investigate a third option, live patching of the DBMS binary. For certain code changes, live patching allows the application code to be modified in memory, without a restart. The memory state and all client connections can be maintained. Although live patching has been explored in the operating systems research community, it remains a blind spot in DBMS research. In this Experiment, Analysis & Benchmark article, we systematically explore this field from the DBMS perspective. We discuss what distinguishes database management systems from generic multi-threaded applications when it comes to live patching. We then propose domain-specific strategies for injecting quiescence points into the DBMS source code, so that threads can safely migrate to the patched process version. We experimentally investigate the interplay between the query workload and different quiescence methods, monitoring both transaction throughput and tail latencies. We show that live patching can be a viable option for updating database management systems, since database providers can make informed decisions w.r.t. the latency overhead on the client side.
Databases
What problem does this paper attempt to address?
The paper addresses how to apply live patching to database management systems (DBMS) in order to reduce the service interruptions caused by system updates. Traditionally, updating DBMS code requires either restarting the system or rolling the workload over to hot-standby instances, and a restart interrupts service for clients. The paper investigates a third option: modifying the application code directly in memory through live patching, which avoids a restart and preserves both the memory state and all client connections.

Specifically, the paper explores the following points:

1. **Feasibility of Live Patching**: Investigating how to update the code of a running DBMS without interrupting the service.
2. **Specificity of DBMS**: Analyzing how a DBMS differs from a generic multi-threaded application with respect to live patching, and proposing DBMS-specific strategies.
3. **Injection of Thread Quiescence Points**: Proposing a method for injecting quiescence points into the DBMS source code, so that threads can safely migrate to the patched process version.
4. **Performance Evaluation**: Measuring the impact of live patching on transaction throughput and tail latency, so that database providers can make informed decisions about the latency overhead on the client side.

The main contributions of the paper include:

- **Static Code Preparation**: Investigating how to statically prepare the code in database connection management, particularly for two common connection management strategies: one thread per connection and thread pool.
- **Safe Thread Quiescence Method**: Proposing a priority-based thread quiescence method so that threads within the thread pool can migrate safely and without deadlocks.
- **Experimental Validation**: Validating the effectiveness and performance impact of the method through live patching experiments on two open-source DBMS (MariaDB and Redis).
Overall, the paper aims to explore and validate the application of live patching technology in DBMS, providing a more efficient and less disruptive update method for database management systems.
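
To make the idea of a quiescence point more concrete, the following is a minimal Python sketch under stated assumptions; it is not the mechanism used in the paper. The paper migrates threads to a patched version of the process, whereas this sketch substitutes a much simpler technique: swapping a request-handler function once every worker thread has parked at a safe point between requests. The names `QuiescenceGate`, `quiescence_point`, `apply_patch`, `handle_v1`, `handle_v2`, and `run_demo` are all hypothetical and chosen for illustration only.

```python
import queue
import threading


class QuiescenceGate:
    """Hypothetical helper: workers park here while a patch is applied."""

    def __init__(self, n_workers):
        self._cond = threading.Condition()
        self._n_workers = n_workers
        self._parked = 0
        self._patch_pending = False

    def quiescence_point(self):
        """Called by each worker between requests (a safe point)."""
        with self._cond:
            if not self._patch_pending:
                return
            self._parked += 1
            self._cond.notify_all()      # tell the patcher we are parked
            while self._patch_pending:
                self._cond.wait()        # stay parked until the patch is done
            self._parked -= 1

    def apply_patch(self, patch_fn):
        """Wait until all workers are parked, then run patch_fn."""
        with self._cond:
            self._patch_pending = True
            while self._parked < self._n_workers:
                self._cond.wait()
            patch_fn()                   # no worker is inside a request here
            self._patch_pending = False
            self._cond.notify_all()      # release the workers


def handle_v1(x):
    return x + 1   # "old" handler (illustrative)


def handle_v2(x):
    return x + 2   # "patched" handler (illustrative)


def run_demo(n_workers=3):
    gate = QuiescenceGate(n_workers)
    handler = {"fn": handle_v1}
    tasks, results = queue.Queue(), queue.Queue()
    stop = threading.Event()

    def worker():
        while not stop.is_set():
            gate.quiescence_point()
            try:
                # A timeout keeps idle workers from blocking forever,
                # so they still reach the quiescence point.
                x = tasks.get(timeout=0.01)
            except queue.Empty:
                continue
            results.put(handler["fn"](x))
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    for x in range(5):
        tasks.put(x)
    tasks.join()                                         # old code served these
    gate.apply_patch(lambda: handler.update(fn=handle_v2))
    for x in range(5):
        tasks.put(x)
    tasks.join()                                         # patched code served these

    stop.set()
    for t in threads:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out[:5]), sorted(out[5:])
```

The timeout on `tasks.get` hints at why quiescence is harder in a DBMS than the sketch suggests: a thread that blocks indefinitely (for example, on an idle client connection) never reaches its quiescence point, and waiting for it delays every other parked thread. This is the kind of interplay between workload and quiescence method that the paper evaluates through throughput and tail latency.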