ABSTRACT
In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure algorithms from the recent literature on 3 multicore platforms from Intel, Sun Microsystems and AMD. To this end, we developed in C/C++ and Java a new micro-benchmark suite, called Synchrobench, hence helping the community evaluate new data structures and synchronization techniques. The main conclusion of this evaluation is threefold: (i) although compare-and-swap helps achieving the best performance on multicores, doing so correctly is hard; (ii) optimistic locking offers varying performance results while transactional memory offers more consistent results; and (iii) copy-on-write and read-copy-update suffer more from contention than any other technique but could be combined with others to derive efficient algorithms.
- JSE-7. http://docs.oracle.com/javase/7/docs/api/.Google Scholar
- D. Alistarh, J. Kopisky, J. Li, and N. Shavit. The SprayList: A scalable relaxed priority queue. Technical Report TR-2014-16, MSR, 2014.Google Scholar
- J. Antony, P. P. Janes, and A. P. Rendell. Exploring thread and memory placement on NUMA architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/Hypertransport. In HiPC, pages 338–352, 2006. Google ScholarDigital Library
- M. Arbel and H. Attiya. Concurrent updates with RCU: Search tree as an example. In PODC, pages 196–205, 2014. Google ScholarDigital Library
- D. F. Bacon, R. E. Strom, and A. Tarafdar. Guava: a dialect of Java without data races. In OOPSLA, pages 382–400, 2000. Google ScholarDigital Library
- S. Boyd-Wickizer, M. F. Kaashoek, R. Morris, and N. Zeldovich. Nonscalable locks are dangerous. In Proceedings of the Linux Symposium, Ottawa, Canada, 2012.Google Scholar
- N. G. Bronson, J. Casper, H. Chafi, and K. Olukotun. A practical concurrent binary search tree. In PPoPP, pages 257–268, 2010. Google ScholarDigital Library
- C. Cao Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In IISWC, pages 35–46, 2008.Google Scholar
- B. D. Carlstrom, A. McDonald, M. Carbin, C. Kozyrakis, and K. Olukotun. Transactional collection classes. In PPoPP, pages 56–67, 2007. Google ScholarDigital Library
- D. Cederman, B. Chatterjee, N. Nguyen, Y. Nikolakopoulos, M. Papatriantafilou, and P. Tsigas. A study of the behavior of synchronization methods in commonly used languages and systems. In IPDPS, pages 1309–1320, 2013. Google ScholarDigital Library
- A. T. Clements, M. F. Kaashoek, and N. Zeldovich. Scalable address spaces using RCU balanced trees. In ASPLOS, pages 199–210, 2012. Google ScholarDigital Library
- C. Click. A lock-free hash table, 2007. http://www. azulsystems.com/events/javaone_2007/2007_ LockFreeHash.pdf.Google Scholar
- T. Crain, V. Gramoli, and M. Raynal. A contention-friendly methodology for search structures. Technical Report RR-1989, INRIA, 2012.Google Scholar
- T. Crain, V. Gramoli, and M. Raynal. A speculation-friendly binary search tree. In PPoPP, pages 161–170, 2012. Google ScholarDigital Library
- T. Crain, V. Gramoli, and M. Raynal. A contention-friendly binary search tree. In Euro-Par, pages 229–240, 2013. Google ScholarDigital Library
- T. Crain, V. Gramoli, and M. Raynal. No hot spot non-blocking skip list. In ICDCS, pages 196–205, 2013. Google ScholarDigital Library
- L. Dalessandro, M. F. Spear, and M. L. Scott. NOrec: streamlining STM by abolishing ownership records. In PPoPP, pages 67–78, 2010. Google ScholarDigital Library
- T. David, R. Guerraoui, and V. Trigonakis. Everything you always wanted to know about synchronization but were afraid to ask. In SOSP, pages 33–48, 2013. Google ScholarDigital Library
- T. David, R. Guerraoui, and V. Trigonakis. Asynchronized concurrency: the secret of scaling concurrent search structures. In ASPLOS, 2015.Google ScholarDigital Library
- To appear.Google Scholar
- D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In DISC, pages 194–208, 2006. Google ScholarDigital Library
- I. Dick, A. Fekete, and V. Gramoli. Logarithmic data structures for multicores. Technical Report 697, University of Sydney, 2014.Google Scholar
- D. Drachsler, M. Vechev, and E. Yahav. Practical concurrent binary search trees via logical ordering. In PPoPP, pages 343–356, 2014. Google ScholarDigital Library
- A. Dragojevi´c, P. Felber, V. Gramoli, and R. Guerraoui. Why STM can be more than a research toy. Commun. ACM, 54(4):70–77, 2011. Google ScholarDigital Library
- F. Ellen, P. Fatourou, E. Ruppert, and F. van Breugel. Non-blocking binary search trees. In PODC, pages 131–140, 2010. Google ScholarDigital Library
- P. Felber, C. Fetzer, and T. Riegel. Dynamic performance tuning of word-based software transactional memory. In PPoPP, pages 237–246, 2008. Google ScholarDigital Library
- P. Felber, V. Gramoli, and R. Guerraoui. Elastic transactions. Technical Report LPD-REPORT-2009-002, EPFL, 2009.Google Scholar
- P. Felber, V. Gramoli, and R. Guerraoui. Elastic transactions. In DISC, pages 93–108, 2009. Google ScholarDigital Library
- M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi. Quantifying the mismatch between emerging scale-out applications and modern processors. TOCS, 30(4):15:1–15:24, 2012. Google ScholarDigital Library
- M. Fomitchev and E. Ruppert. Lock-free linked lists and skip lists. In PODC, pages 50–59, 2004. Google ScholarDigital Library
- K. Fraser. Practical lock freedom. PhD thesis, Cambridge University, September 2003.Google Scholar
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cacheoblivious algorithms. In FOCS, page 285, 1999. Google ScholarDigital Library
- B. Goetz, T. Peierls, J. Bloch, J. Bowbeer, D. Lea, and D. Holmes. Java Concurrency in Practice. Addison-Wesley, 2005. Google ScholarDigital Library
- V. Gramoli and R. Guerraoui. Democratizing transactional programming. Commun. ACM, 57(1):86–93, 2014. Google ScholarDigital Library
- V. Gramoli and R. Guerraoui. Reusable concurrent data types. In ECOOP, pages 182–206, 2014.Google ScholarDigital Library
- V. Gramoli, R. Guerraoui, and V. Trigonakis. TM2C: A software transactional memory for many-cores. In EuroSys, pages 351–364, 2012. Google ScholarDigital Library
- R. Guerraoui, M. Kapalka, and J. Vitek. STMBench7: a benchmark for software transactional memory. In EuroSys, pages 315–324, 2007. Google ScholarDigital Library
- D. Harmanci, P. Felber, V. Gramoli, and C. Fetzer. TMunit: Testing software transactional memories. In 4th ACM SIGPLAN Workshop on Transactional Computing, 2009.Google Scholar
- D. Harmanci, V. Gramoli, P. Felber, and C. Fetzer. Extensible transactional memory testbed. J. of Parallel and Distributed Computing, 70(10):1053–1067, March 2010. Google ScholarDigital Library
- T. Harris. A pragmatic implementation of non-blocking linked-lists. In DISC, pages 300–314, 2001. Google ScholarDigital Library
- T. Harris, S. Marlow, S. Peyton-Jones, and M. Herlihy. Composable memory transactions. In PPoPP, pages 48–60, 2005. Google ScholarDigital Library
- S. Heller, M. Herlihy, V. Luchangco, M. Moir, W. N. Scherer III, and N. Shavit. A lazy concurrent list-based set algorithm. Parallel Processing Letters, 17(4):411–424, 2007.Google ScholarCross Ref
- M. Herlihy and E. Koskinen. Transactional boosting: A methodology for highly-concurrent transactional objects. In PPoPP, pages 207–216, 2008. Google ScholarDigital Library
- M. Herlihy, Y. Lev, V. Luchangco, and N. Shavit. A simple optimistic skiplist algorithm. In SIROCCO, pages 124–138, 2007. Google ScholarDigital Library
- M. Herlihy, V. Luchangco, and M. Moir. Obstruction-free synchronization: Double-ended queues as an example. In ICDCS, 2003. Google ScholarDigital Library
- M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer, III. Software transactional memory for dynamic-sized data structures. In PODC, pages 92–101, 2003. Google ScholarDigital Library
- M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock-free data structures. In ISCA, pages 289–300, 1993. Google ScholarDigital Library
- M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kauffman, 2008. Google ScholarDigital Library
- G. Korland, N. Shavit, and P. Felber. Deuce: Noninvasive software transactional memory. Transactions on HiPEAC, 5(2), 2010.Google Scholar
- D. Lea. JSR-166 specification request group. http://g.oswego. edu/dl/concurrency-interest.Google Scholar
- J. J. Levandoski and S. Sengupta. The BW-Tree: A latch-free B-tree for log-structured flash storage. IEEE Data Eng. Bull., 36(2):56–62, 2013.Google Scholar
- Y. Liu, K. Zhang, and M. Spear. Dynamic-sized nonblocking hash tables. In PODC, pages 242–251, 2014. Google ScholarDigital Library
- A. Marowka. TBBench: A micro-benchmark suite for Intel threading building blocks. JIPS, 8(2):331–346, 2012.Google ScholarCross Ref
- P. E. McKenney, J. Appavoo, A. Kleen, O. Krieger, R. Russell, D. Sarma, and M. Soni. Read-copy update. In AUUG, 2001.Google Scholar
- M. M. Michael. High performance dynamic lock-free hash tables and list-based sets. In SPAA, pages 73–82, 2002. Google ScholarDigital Library
- M. M. Michael. The balancing act of choosing nonblocking features. Commun. ACM, 56(9):46–53, 2013. Google ScholarDigital Library
- M. M. Michael and M. L. Scott. Simple, fast, and practical nonblocking and blocking concurrent queue algorithms. In PODC, pages 267–275, 1996. Google ScholarDigital Library
- M. Mohamedin, B. Ravindran, and R. Palmieri. ByteSTM: Virtual machine-level Java software transactional memory. In COORDINATION, pages 166–180, 2013.Google Scholar
- A. Natarajan and N. Mittal. Fast concurrent lock-free binary search trees. In PPoPP, pages 317–328, 2014. Google ScholarDigital Library
- R. Odaira, J. G. Castaños, and T. Nakaike. Do C and Java programs scale differently on hardware transactional memory? In IISWC, pages 34–43, 2013.Google ScholarCross Ref
- W. Pugh. Skip lists: a probabilistic alternative to balanced trees. Commun. ACM, 33, June 1990. Google ScholarDigital Library
- T. Riegel, P. Felber, and C. Fetzer. A lazy snapshot algorithm with eager validation. In DISC, pages 284–298, 2006. Google ScholarDigital Library
- H. Sundell and P. Tsigas. NOBLE: A non-blocking inter-process communication library. Technical report, Chalmers University of Technology, March 2002.Google Scholar
- H. Sundell and P. Tsigas. Scalable and lock-free concurrent dictionaries. In SAC, pages 1438–1445. ACM, 2004. Google ScholarDigital Library
- H. Sundell and P. Tsigas. Lock-free deques and doubly linked lists. J. Parallel Distrib. Comput., 68(7):1008–1020, 2008. Google ScholarDigital Library
- I. Umar, O. J. Anshus, and P. H. Ha. DeltaTree: A practical localityaware concurrent search tree. Technical Report 2013-74, University of Tromso, Norway, Oct. 2013.Google Scholar
- B. Wicht. Binary trees implementations comparison for multicore programming. Technical report, Switzerland HES-SO University of applied science, 2012.Google Scholar
- R. M. Yoo, Y. Ni, A. Welc, B. Saha, A.-R. Adl-Tabatabai, and H.-H. S. Lee. Kicking the tires of software transactional memory: why the going gets tough. In SPAA, pages 265–274, 2008. Google ScholarDigital Library
- F. Zyulkyarov, A. Cristal, S. Cvijic, E. Ayguade, M. Valero, O. Unsal, and T. Harris. Wormbench: A configurable workload for evaluating transactional memory systems. In MEDEA, pages 61–68, 2008. Google ScholarDigital Library
- F. Zyulkyarov, S. Stipic, T. Harris, O. S. Unsal, A. Cristal, I. Hur, and M. Valero. Profiling and optimizing transactional memory applications. International Journal of Parallel Programming, 40(1):25–56, 2012.Google ScholarCross Ref
Index Terms
- More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms
Recommendations
More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms
PPoPP '15In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure algorithms from the recent literature on 3 multicore platforms from Intel, ...
Looking for efficient implementations of concurrent objects
PaCT'11: Proceedings of the 11th international conference on Parallel computing technologiesAs introduced by Taubenfeld, a contention-sensitive implementation of a concurrent object is an implementation such that the overhead introduced by locking is eliminated in the common cases, i.e., when there is no contention or when the operations ...
Transactional Acceleration of Concurrent Data Structures
SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and ArchitecturesConcurrent data structures are a fundamental building block for scalable multi-threaded programs. While Transactional Memory (TM) was originally conceived as a mechanism for simplifying the creation of concurrent data structures, modern hardware TM ...
Comments