esDMT: Efficient and Scalable Deterministic Multithreading Through Memory Isolation
Jie Sun, Xiaofei Liao, Long Zheng, Hai Jin, Yu Zhang
DOI: https://doi.org/10.1109/padsw.2014.7097794
2014-01-01
Abstract: Deterministic multithreading (DMT) systems are well known for eliminating the harmful program behaviors caused by nondeterminism: for a given input, they always drive program execution through the same thread schedule. To achieve this goal, existing DMT systems enforce one of two kinds of schedules: 1) a mem-based schedule, which ensures determinism by enforcing a total order on shared memory accesses, and 2) a sync-based schedule, which enforces a total order only on synchronization operations. A mem-schedule achieves full determinism but suffers from prohibitive overhead, while a sync-schedule mitigates this overhead but cannot ensure determinism for race schedules, i.e., it provides only partial determinism. Much recent research is devoted to hybrid schedules that combine the determinism of mem-schedules with the efficiency of sync-schedules. However, these approaches suffer from practicality and scalability problems rooted in their techniques, such as collecting traces in advance and memoizing huge numbers of schedules. To address these problems, this paper proposes esDMT, an efficient and scalable DMT system based on a new memory isolation technique. esDMT improves efficiency by executing each thread in parallel within its own private virtual memory, and it defers the determinism guarantee to the point where private memory is committed to shared memory, which happens in a deterministic order according to a deterministic lock algorithm; this further reduces the overhead of inter-thread waiting. In contrast to previous hybrid work, which avoids the nondeterminism of race schedules offline based on enormous historical records, our key insight is to eliminate the nondeterminism of race schedules online at runtime.
Our experimental results on the PARSEC benchmarks show that esDMT successfully eliminates nondeterminism, achieves nearly the same performance as a sync-schedule (with a slowdown of less than 18% compared with the pthread library), and scales well on an 8-core machine.
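
The memory isolation idea described in the abstract can be illustrated with a toy model. The sketch below is not esDMT's implementation. It assumes that private virtual memory can be modeled as a per-thread copy of a dictionary, and that the deterministic commit order is simply ascending thread id, which stands in for the deterministic lock algorithm that the abstract mentions but does not specify. The names `run_isolated`, `store`, `increment`, and `scale` are hypothetical.

```python
import threading


def run_isolated(shared, tasks):
    """Run each task on a private snapshot, then commit writes in thread-id order.

    Assumption: a dict copy stands in for private virtual memory, and
    ascending thread id stands in for the deterministic commit order.
    """
    write_logs = [None] * len(tasks)

    def worker(tid, task):
        private = dict(shared)  # private copy: other threads' writes are invisible
        writes = {}

        def store(key, value):
            private[key] = value  # visible to this thread immediately
            writes[key] = value   # remembered for the later commit

        task(private, store)
        write_logs[tid] = writes

    threads = [
        threading.Thread(target=worker, args=(tid, task))
        for tid, task in enumerate(tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Deterministic commit: apply each thread's writes in a fixed order,
    # independent of which thread actually finished first.
    for writes in write_logs:
        shared.update(writes)
    return shared


def increment(private, store):
    store("x", private["x"] + 1)


def scale(private, store):
    store("x", private["x"] * 10)


# Both tasks write "x" without synchronization, which would be a data race
# on ordinary shared memory. Here every run yields the same final value.
results = {run_isolated({"x": 1}, [increment, scale])["x"] for _ in range(20)}
print(results)  # the set holds a single value, 10
```

Because each thread reads only its own snapshot and the commits are replayed in a fixed order, the final state depends on the input and the task list alone, not on the operating system's scheduling. A real system would also have to commit at synchronization points during execution and merge changes at a finer granularity, which this sketch leaves out.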