Practical Persistent Multi-Word Compare-and-Swap Algorithms for Many-Core CPUs

Kento Sugiura, Manabu Nishimura, Yoshiharu Ishikawa
2024-04-02
Abstract: In the last decade, academic and industrial researchers have focused on persistent memory because of the development of the first practical product, Intel Optane. One of the main challenges of persistent memory programming is to guarantee consistent durability over separate memory addresses, and Wang et al. proposed a persistent multi-word compare-and-swap (PMwCAS) algorithm to solve this problem. However, their algorithm contains redundant compare-and-swap (CAS) and cache flush instructions and does not achieve sufficient performance on many-core CPUs. This paper proposes a new algorithm to improve performance on many-core CPUs by removing useless CAS/flush instructions from PMwCAS operations. We also exclude dirty flags, which help ensure consistent durability in the original algorithm, from our algorithm by using PMwCAS descriptors as write-ahead logs. Experimental results show that the proposed method is up to ten times faster than the original algorithm and suggest several productive uses of PMwCAS operations.
Databases
### What problems does this paper attempt to solve?

This paper addresses the performance and consistency challenges of programming with persistent memory on many-core CPUs. Specifically, the authors improve the persistent multi-word compare-and-swap (PMwCAS) algorithm proposed by Wang et al. so that it performs better on many-core CPUs.

#### Background and Problem Description

Persistent memory is non-volatile memory. It is byte-addressable and provides guarantees on data persistence and storage space. A major challenge in persistent memory programming is keeping updates to multiple memory addresses consistently durable while still performing well. Wang et al. proposed the PMwCAS algorithm to solve this problem, but it contains redundant compare-and-swap (CAS) and cache flush instructions, so its performance on many-core CPUs is insufficient.

#### Main Contributions of the Paper

1. **Removing Redundant Instructions**: The authors improve performance on many-core CPUs by removing useless CAS and cache flush instructions from PMwCAS operations.
2. **Excluding Dirty Flags**: The original algorithm uses dirty flags to help ensure consistent durability. The new algorithm instead treats PMwCAS descriptors as write-ahead logs, which makes dirty flags unnecessary.
3. **Implementation and Evaluation**: The authors implement the new algorithm as a C++ library and verify its effectiveness experimentally. The results show that the new algorithm is up to ten times faster than the original and suggest several productive uses of PMwCAS operations.

#### Experimental Results

The experiments show that the new algorithm significantly improves throughput and latency across different contention levels. Under high contention in particular, its throughput is up to ten times that of the original algorithm, and its latency is about one-tenth. The experiments also evaluate how performance depends on several parameters: the number of target words, the Zipf skew parameter, and the memory block size per word.

### Summary

In general, this paper tackles the low performance of persistent memory programming, especially on many-core CPUs, by optimizing the PMwCAS algorithm. By removing redundant instructions and excluding dirty flags, the new algorithm improves performance and also simplifies the programming model, making it easier for developers to write efficient persistent memory programs.
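To make the "descriptors as write-ahead logs" idea more concrete, here is a minimal single-threaded sketch in C++. It is not the authors' algorithm or their library's API. Every name in it (`Descriptor`, `Target`, `PMwCAS`, `Resolve`, `CrashAt`, `Persist`) is hypothetical, `Persist` is a no-op standing in for a cache-line write-back plus fence, and ordinary DRAM stands in for persistent memory. It also assumes user-space addresses leave bit 63 clear so that bit can tag a word as reserved. The sketch only illustrates how a durable status field in the descriptor can decide, after a crash, whether reserved words are rolled forward or rolled back, without a per-word dirty flag. It does not attempt to show which CAS or flush instructions the paper removes.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// All names are hypothetical; this is a simulation, not the paper's library.
using Word = std::atomic<std::uint64_t>;
constexpr std::uint64_t kDescFlag = 1ULL << 63;  // tags a word as reserved

enum class Status { kCompleted, kUndecided, kSucceeded, kFailed };

struct Target {
  Word* addr;              // word to update (persistent memory in a real system)
  std::uint64_t expected;  // value the word must currently hold
  std::uint64_t desired;   // value to install if every comparison succeeds
};

struct Descriptor {
  Status status = Status::kCompleted;
  std::size_t count = 0;
  std::array<Target, 4> targets{};
};

// Stand-in for a cache-line write-back plus fence; a no-op in this simulation.
inline void Persist(const void*) {}

// Value stored in a reserved word: the descriptor's address plus the tag bit.
inline std::uint64_t Mark(const Descriptor& d) {
  return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(&d)) | kDescFlag;
}

// Completes a decided or interrupted operation. The same routine serves as
// crash recovery: the descriptor's status alone says which value to install.
inline void Resolve(Descriptor& d) {
  if (d.status == Status::kCompleted) return;
  const bool succeeded = (d.status == Status::kSucceeded);
  for (std::size_t i = 0; i < d.count; ++i) {
    Target& t = d.targets[i];
    std::uint64_t reserved = Mark(d);
    // Only words still reserved by this descriptor are rewritten.
    if (t.addr->compare_exchange_strong(reserved, succeeded ? t.desired : t.expected)) {
      Persist(t.addr);
    }
  }
  d.status = Status::kCompleted;
  Persist(&d.status);
}

// Points at which the simulated crash may interrupt the operation.
enum class CrashAt { kNever, kAfterReserve, kAfterDecision };

// Returns true if the operation was decided as succeeded. With a crash point
// set, it returns early and leaves the intermediate state for Resolve().
inline bool PMwCAS(Descriptor& d, CrashAt crash = CrashAt::kNever) {
  d.status = Status::kUndecided;
  Persist(&d);  // the log record is durable before any target word changes
  bool ok = true;
  for (std::size_t i = 0; i < d.count && ok; ++i) {
    std::uint64_t expected = d.targets[i].expected;
    ok = d.targets[i].addr->compare_exchange_strong(expected, Mark(d));
    if (ok) Persist(d.targets[i].addr);
  }
  if (ok && crash == CrashAt::kAfterReserve) return false;  // undecided
  d.status = ok ? Status::kSucceeded : Status::kFailed;
  Persist(&d.status);  // the durable decision point
  if (crash == CrashAt::kAfterDecision) return ok;
  Resolve(d);
  return ok;
}
```

In this sketch, a crash before the status is persisted leaves the descriptor `kUndecided`, so `Resolve` restores the expected values. A crash after `kSucceeded` is persisted makes `Resolve` install the desired values. A real concurrent implementation would additionally need helping between threads, descriptor reclamation, and actual flush instructions, none of which are modeled here.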