ABSTRACT
Software transactional memory (STM) is a promising programming paradigm for shared-memory multithreaded programs, offering an alternative to traditional lock-based synchronization. However, adoption of STM in mainstream software has remained low due to its considerable overheads and its poor cache/memory performance. In this paper, we perform a detailed study of the cache behavior of STM applications and quantify the impact of different STM factors on the cache misses the applications experience. Based on our analysis, we propose a compiler-driven Lock-Data Colocation (LDC) scheme targeted at reducing the cache overheads of STM. We show that LDC is effective in improving the cache behavior of STM applications, reducing data-cache miss latency and improving execution-time performance.
Index Terms
- Analyzing cache performance bottlenecks of STM applications and addressing them with compiler's help