ABSTRACT
We describe a counter-intuitive performance phenomenon relevant to concurrency research. On a modern multicore system with a shared last-level cache, a set of concurrently running identical threads that loop -- each accessing the same quantity of distinct thread-private data -- can suffer significant relative progress imbalance. If one thread, or a small subset of the threads, manages to transiently enjoy higher cache residency than the other threads, that thread will tend to iterate faster and keep more of its data resident, thus increasing the odds that it will continue to run faster. This emergent behavior tends to be stable over surprisingly long periods.
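The feedback loop described in the abstract can be explored with a toy model. The sketch below is not the authors' benchmark: it assumes a shared LRU cache, uniformly random accesses within each thread's private working set, and fixed hit and miss costs. The function name `simulate` and every parameter value are hypothetical choices made for illustration. Whether imbalance appears in this model, and how large it is, depends on those choices.

```python
import heapq
import random
from collections import OrderedDict


def simulate(num_threads=4, working_set=600, cache_lines=1000,
             hit_cost=1, miss_cost=20, total_time=200_000, seed=1):
    """Toy model: identical threads loop over private data via a shared LRU cache.

    Returns the number of accesses each thread issued before `total_time`.
    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    cache = OrderedDict()          # shared LRU: key = (thread, line), oldest first
    progress = [0] * num_threads   # accesses completed per thread
    # Event queue of (time at which the thread issues its next access, thread id).
    ready = [(0, t) for t in range(num_threads)]
    heapq.heapify(ready)

    while ready:
        now, t = heapq.heappop(ready)
        if now >= total_time:
            continue               # thread is past the time limit; drop it
        key = (t, rng.randrange(working_set))  # random line in t's private set
        if key in cache:
            cache.move_to_end(key)             # hit: refresh recency
            cost = hit_cost
        else:
            cache[key] = True                  # miss: install the line
            if len(cache) > cache_lines:
                cache.popitem(last=False)      # evict the least recently used line
            cost = miss_cost
        progress[t] += 1
        heapq.heappush(ready, (now + cost, t))
    return progress


if __name__ == "__main__":
    counts = simulate()
    print("accesses per thread:", counts)
    print("max/min progress ratio:", max(counts) / min(counts))
```

A thread that hits more often issues accesses sooner, which refreshes its lines in the LRU order more frequently; that is the residency advantage the abstract refers to. Varying `working_set`, `cache_lines`, and `miss_cost` is one way to look for the regime in which the ratio departs from 1.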
Index Terms
- Persistent unfairness arising from cache residency imbalance