A Study of the Effect of Prefetching in Shared-Memory Resource Contention

Tanima Dey, Wei Wang, Jack Davidson, Mary Lou Soffa
2011-01-01
Abstract: Managing contention for shared resources in chip multiprocessors has become very challenging as the number of cores and execution contexts scales up. Contention for memory hierarchy resources, especially the shared caches, can severely degrade an application’s performance and system throughput [2]. One important caching-related resource is the hardware prefetcher, whose effect on shared-memory resource contention has not been fully explored. A hardware prefetcher’s impact on an application’s performance depends on the application’s data access pattern. If the application has a sequential or strided data access pattern, the prefetchers can bring data into the caches in advance, reducing cache misses and improving performance. On the other hand, if the application’s data accesses do not follow any regular pattern, the prefetched data can pollute the cache, resulting in more cache misses and worse performance. In this research, we study and measure the usefulness of prefetching for multi-threaded applications’ performance while the applications contend for different shared-memory resources. Such a study provides insight into applications’ behavior and helps to further improve an application’s performance by dynamically adjusting and utilizing prefetching along with contention-mitigation techniques. We use the multi-threaded PARSEC benchmarks to measure the prefetching effect on several resources in the memory hierarchy, including the L1 cache, the L2 cache, and the Front Side Bus (FSB). To determine the effect of prefetching on shared-resource contention in the memory hierarchy, we need a methodology for comparing an application’s performance with the prefetcher enabled and with it disabled while there is contention for the targeted resource. In general, to measure contention for a particular shared resource, applications are run in two resource configurations.
In the baseline configuration, application threads are mapped onto cores such that the threads do not share the targeted resource and instead run using dedicated resources. In the contention configuration, the application threads are mapped onto cores such that they share the targeted resource while executing. Because a multi-threaded application has multiple threads, contention is possible whenever these threads share a resource.
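The two configurations, crossed with the prefetcher being enabled or disabled, can be illustrated with a minimal Python sketch. It is not the authors' tooling: the topology `L2_DOMAINS` (two L2 caches, each shared by two cores) and the function names `baseline_mapping`, `contention_mapping`, and `experiment_grid` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical topology (an assumption, not from the paper):
# each L2 cache ID maps to the core IDs that share it.
L2_DOMAINS = {0: [0, 1], 1: [2, 3]}


def baseline_mapping(n_threads, domains):
    """Place each thread in a different domain so no two threads share the resource."""
    if n_threads > len(domains):
        raise ValueError("not enough separate domains for a baseline mapping")
    first_cores = [cores[0] for _, cores in sorted(domains.items())]
    return first_cores[:n_threads]


def contention_mapping(n_threads, domains):
    """Place all threads on cores of one domain so they share the resource."""
    for _, cores in sorted(domains.items()):
        if len(cores) >= n_threads:
            return cores[:n_threads]
    raise ValueError("no single domain has enough cores")


def experiment_grid(n_threads, domains):
    """Enumerate the four runs: prefetcher enabled/disabled x baseline/contention."""
    mappings = {
        "baseline": baseline_mapping(n_threads, domains),
        "contention": contention_mapping(n_threads, domains),
    }
    return [
        {"prefetcher": pf, "config": name, "cores": mappings[name]}
        for pf, name in product(("enabled", "disabled"), mappings)
    ]
```

On Linux, a mapping like this could be applied by pinning threads with `taskset` or `os.sched_setaffinity`. How the hardware prefetcher is switched on and off is platform-specific and is deliberately left out of the sketch.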