Diagnosing applications' I/O behavior through system call observability

Tânia Esteves, Ricardo Macedo, Rui Oliveira, João Paulo
DOI: https://doi.org/10.48550/arXiv.2304.08569
2023-04-18
Abstract: We present DIO, a generic tool for observing inefficient and erroneous I/O interactions between applications and in-kernel storage systems that lead to performance, dependability, and correctness issues. DIO facilitates the analysis and enables near real-time visualization of complex I/O patterns for data-intensive applications generating millions of storage requests. This is achieved by non-intrusively intercepting system calls, enriching collected data with relevant context, and providing timely analysis and visualization for traced events. We demonstrate its usefulness by analyzing two production-level applications. Results show that DIO enables diagnosing resource contention in multi-threaded I/O that leads to high tail latency, as well as erroneous file accesses that cause data loss.
Distributed, Parallel, and Cluster Computing; Operating Systems; Performance