Formalizing Causality as a Desideratum for Memory Models and Transformations of Parallel Programs

Chen Chen, Wenguang Chen, Vugranam Sreedhar, Rajkishore Barik, Vivek Sarkar, Guang Gao
CAPSL Technical Memo
2010-01-01
Abstract: It has been observed in previous work that causal violations should be avoided in any execution or transformation of a parallel program. In this paper, we formalize the notion of causality in memory consistency models and code transformations. For memory models, we introduce a causality-graph framework that can be used to analyze whether a particular memory model violates causality. We show that a popular memory model, the Java Memory Model (JMM) [16], can lead to program executions that exhibit causality violations with respect to our definition of causality. The same analysis appears to apply also to a recent proposal for the C++ specification [7], whose underlying memory model may lead to similar problems. For code transformations, we identify transformations that are causality-preserving and those that are potentially causality-violating. We found that 10 of the 13 code transformation examples identified as causality-preserving with respect to the JMM fail our causality graph test and thus represent causality violations in our framework. Likewise, we present examples illustrating how the recently proposed C++ Memory Model can lead to potential causality violations. Using our formalization, we establish causality as a desideratum for memory models and code transformations of parallel programs, and we define a Causal Memory Model (CMM) as the weakest memory model that preserves causality. We identify specific code transformations that are guaranteed to be causality-preserving. Finally, we present preliminary experimental results for a load elimination optimization that motivate the performance benefit of using the CMM model instead of the Sequential Consistency (SC) model. For the benchmark program studied, using the CMM model instead of the SC model reduced the number of getfield operations performed by 37.9% and the execution time on a 16-core processor by 46.2%.
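The abstract does not spell out how a causality graph is constructed. As a rough illustration only, assume a simplified graph in which each memory event points to the events that justify it: an intra-thread data dependence, or a read pointing to the write it reads from. Under this assumption, a classic "out-of-thin-air" outcome shows up as a cycle, because the value ends up justifying itself. The helper `find_cycle`, the edge encoding, and the event labels below are hypothetical and are not the paper's definitions.

```python
from collections import defaultdict

# Hypothetical simplification (not the paper's definition): an edge (a, b)
# means "event a is justified by event b" (data dependence or reads-from).
def find_cycle(edges):
    """Return a list of events forming a justification cycle, or None."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {n: WHITE for n in nodes}
    path = []

    def dfs(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Back edge: the events from nxt to node form a cycle.
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for n in sorted(nodes):  # sorted for deterministic traversal
        if color[n] == WHITE:
            found = dfs(n)
            if found:
                return found
    return None


# Thread 1: r1 = x; y = r1    Thread 2: r2 = y; x = r2    outcome r1 == r2 == 42
oota_edges = [
    ("W1: y=r1", "R1: r1=x"),  # W1 writes the value R1 read
    ("R2: r2=y", "W1: y=r1"),  # R2 reads from W1
    ("W2: x=r2", "R2: r2=y"),  # W2 writes the value R2 read
    ("R1: r1=x", "W2: x=r2"),  # R1 reads from W2, closing the loop
]

# Same program, but thread 2 writes a constant: x = 42
acyclic_edges = [
    ("W1: y=r1", "R1: r1=x"),
    ("R2: r2=y", "W1: y=r1"),
    ("R1: r1=x", "W2: x=42"),  # W2 no longer depends on R2
]

print(find_cycle(oota_edges))     # a cycle through all four events
print(find_cycle(acyclic_edges))  # None: every value has an external source
```

In the second program the value 42 originates from a constant, so the justification chain terminates; in the first, 42 is justified only by itself, which is the intuition behind treating such executions as causality violations.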