Asynchronous I/O -- With Great Power Comes Great Responsibility

Constantin Pestka, Marcus Paradies, Matthias Pohl
2024-11-25
Abstract: The performance of storage hardware has improved vastly in recent years, leaving the traditional I/O stack incapable of exploiting these gains due to increasingly large relative overheads. Newer asynchronous I/O APIs, such as io_uring, have significantly improved performance by reducing such overheads, but exhibit limited adoption in practice. In this paper, we discuss the complexities that the usage of these contemporary I/O APIs introduces to applications, which we believe are mostly responsible for their low adoption rate. Finally, we share implications and trade-offs made by architectures that may be used to integrate asynchronous I/O into DB applications.
Databases, Operating Systems
What problem does this paper attempt to address?
The paper addresses the fact that the traditional I/O stack cannot fully exploit the large performance gains of modern storage hardware, because its overheads have become too large relative to the device latency. Specifically:

1. **Limitations of the traditional I/O stack**:
   - The traditional blocking I/O API design has not kept up with the performance of modern storage hardware (such as SSDs). Frequent context switches and other overheads make it difficult for the CPU to drive this hardware efficiently.
   - Newer asynchronous I/O APIs (such as io_uring) improve performance significantly, but their adoption rate is still low.
2. **Complexity and challenges of the asynchronous I/O API**:
   - The asynchronous I/O API introduces complex task scheduling and parallel processing requirements, which place an additional burden on application development.
   - Applications need to manage the submission and completion of I/O requests in user space, which increases programming complexity.
   - Achieving efficient parallel processing and user-space task scheduling is a key challenge in applying these APIs.
3. **Insufficiency of existing libraries and support**:
   - Many existing I/O libraries (such as libuv, seastar and tokio) have limited support for the new asynchronous I/O API or support it only experimentally.
   - Widely used libraries (such as libc and libc++) do not support these new APIs at all.
   - Most commercial database management systems (DBMS) still rely on the blocking I/O API.
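The submission and completion bookkeeping described in point 2 can be illustrated with a toy model. The sketch below is not io_uring and performs no real I/O: `ToyRing`, `Request`, `Completion` and `read_blocks` are hypothetical names invented for this example, and the "device" is just a byte string in memory. Completions are deliberately delivered in reverse order to show why the application has to tag each request and match results itself.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    user_data: int  # tag chosen by the application to identify this request
    offset: int
    length: int


@dataclass
class Completion:
    user_data: int  # same tag, echoed back with the result
    data: bytes


class ToyRing:
    """Hypothetical in-memory stand-in for a submission/completion queue pair."""

    def __init__(self, backing: bytes):
        self.backing = backing
        self.sq = deque()  # submission queue, filled by the application
        self.cq = deque()  # completion queue, drained by the application

    def prepare(self, request: Request) -> None:
        # Queue a request without executing it yet.
        self.sq.append(request)

    def submit(self) -> int:
        # Hand all queued requests to the "device" in one batch.
        # Requests are served newest-first to mimic out-of-order completion.
        submitted = len(self.sq)
        while self.sq:
            req = self.sq.pop()
            data = self.backing[req.offset:req.offset + req.length]
            self.cq.append(Completion(req.user_data, data))
        return submitted

    def reap(self):
        # Yield every completion that is currently available.
        while self.cq:
            yield self.cq.popleft()


def read_blocks(ring: ToyRing, block_size: int, num_blocks: int) -> list:
    # Submission side: queue every read, then submit them as one batch.
    for i in range(num_blocks):
        ring.prepare(Request(user_data=i, offset=i * block_size, length=block_size))
    ring.submit()

    # Completion side: results arrive in arbitrary order, so the tag is
    # the only way to know which block a completion belongs to.
    blocks = [b""] * num_blocks
    for completion in ring.reap():
        blocks[completion.user_data] = completion.data
    return blocks
```

With a blocking API, none of this state exists in the application: each read returns its own result. Here the caller owns the queues, the tags, and the reassembly, which is the extra complexity the summary refers to.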
### Specific problem summary

- **How to improve I/O performance to fully utilize the advantages of modern storage hardware?**
- **Why is the adoption rate of the new asynchronous I/O API low?**
- **How to deal with the complexity brought by the asynchronous I/O API, especially in terms of parallel processing and task scheduling?**
- **How to improve existing I/O libraries to better support the new asynchronous I/O API?**

### Solution discussion

By analyzing the advantages and challenges of the asynchronous I/O API, the paper proposes some architectural patterns and solutions, aiming to help developers better understand and apply these APIs. Specifically:

- **Task partitioning strategies**: Different task partitioning methods such as full partitioning, callback partitioning and coroutines are discussed.
- **Execution architectures**: Different execution architectures such as direct access, shared-nothing communication and static I/O thread pools are explored to deal with concurrent processing and resource allocation problems.

Through these discussions, the paper hopes to provide guidance for researchers and developers on how to effectively integrate and optimize the asynchronous I/O API, thereby improving the performance of I/O-intensive applications.
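To make the difference between callback partitioning and coroutines more concrete, the sketch below writes the same two-step task in both styles: read a one-byte length header, then read a body of that length. It is an illustration under assumptions, not the paper's implementation. All names (`read_record_cb`, `read_record_co`, `run_callbacks`, `run_coroutine`, `STORE`) are hypothetical, I/O is simulated by slicing a byte string, and Python generators stand in for coroutines.

```python
from collections import deque

# Simulated storage: a 1-byte length header, the body, then unrelated bytes.
STORE = bytes([5]) + b"hello" + b"junk"


def read_record_cb(submit, on_done):
    # Callback style: the task is split by hand at every I/O boundary,
    # and each piece is passed along as an explicit continuation.
    def on_header(header):
        body_length = header[0]
        submit(1, body_length, on_done)

    submit(0, 1, on_header)


def read_record_co():
    # Coroutine style: the task reads top to bottom and suspends at each
    # yield; the driver resumes it with the data that was requested.
    header = yield (0, 1)
    body = yield (1, header[0])
    return body


def run_callbacks(store):
    # Minimal driver: queue requests, then complete them one by one.
    pending = deque()
    result = []

    def submit(offset, length, callback):
        pending.append((offset, length, callback))

    read_record_cb(submit, result.append)
    while pending:
        offset, length, callback = pending.popleft()
        callback(store[offset:offset + length])
    return result[0]


def run_coroutine(store):
    # Minimal driver: serve each yielded request and resume the task.
    task = read_record_co()
    request = next(task)
    while True:
        offset, length = request
        try:
            request = task.send(store[offset:offset + length])
        except StopIteration as done:
            return done.value
```

Both drivers produce the same record. The callback version scatters the control flow across nested functions, while the coroutine version keeps it linear and moves the suspension and resumption logic into the driver.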