Abstract
DrTM is a fast in-memory transaction processing system that exploits advanced hardware features such as remote direct memory access (RDMA) and hardware transactional memory (HTM). To achieve high efficiency, it offloads most concurrency control, such as tracking read/write accesses and detecting conflicts, to HTM on the local machine, and it leverages the strong consistency between RDMA and HTM to ensure serializability among concurrent transactions across machines. To mitigate the high probability of HTM aborts for large transactions, we design and implement an optimized transaction chopping algorithm that decomposes a set of large transactions into smaller pieces, so that HTM only needs to protect each piece. We further build an efficient hash table for DrTM that leverages HTM and RDMA to simplify the design and notably improve performance. We describe how DrTM supports common database features such as read-only transactions and logging for durability. Evaluation using typical OLTP workloads, including TPC-C and SmallBank, shows that DrTM has better single-node efficiency and scales well on a six-node cluster: it achieves more than 1.51 and 34 million transactions per second for TPC-C and SmallBank, respectively, on a single node, and more than 5.24 and 138 million transactions per second on the cluster. For TPC-C, these numbers exceed those of a state-of-the-art single-node system (Silo) and a distributed transaction system (Calvin) by at least 1.9X and 29.6X, respectively.
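The motivation for transaction chopping can be illustrated with a toy model. The sketch below is not DrTM's implementation: it simulates an HTM region as a function that aborts when it touches more than a fixed number of records, and it splits a transaction into consecutive pieces that each fit. All names (`HTM_CAPACITY`, `run_in_htm`, `chop`, `run_chopped`) are hypothetical. A real chopping algorithm must also analyze conflicts between pieces to preserve serializability, which this naive split does not attempt.

```python
# Toy model, not DrTM's implementation: a simulated HTM region aborts when
# its working set exceeds a fixed capacity, which is why chopping helps.

HTM_CAPACITY = 4  # hypothetical limit on records touched per HTM region


class HTMAbort(Exception):
    """Raised when a simulated HTM region exceeds its capacity."""


def run_in_htm(store, ops):
    """Apply (key, delta) ops atomically, or abort without side effects."""
    touched = {key for key, _ in ops}
    if len(touched) > HTM_CAPACITY:
        raise HTMAbort(f"working set of {len(touched)} exceeds capacity")
    staged = dict(store)  # buffer writes until commit
    for key, delta in ops:
        staged[key] = staged.get(key, 0) + delta
    store.clear()
    store.update(staged)  # commit: publish all buffered writes together


def chop(ops, piece_size=HTM_CAPACITY):
    """Split a transaction's ops into consecutive pieces of bounded size."""
    return [ops[i:i + piece_size] for i in range(0, len(ops), piece_size)]


def run_chopped(store, ops):
    """Run each piece in its own simulated HTM region, in order."""
    for piece in chop(ops):
        run_in_htm(store, piece)
```

In this model, a ten-record transaction aborts when run as a single region but completes when run as pieces of at most four records each. The model captures only the capacity argument and says nothing about how DrTM keeps the pieces serializable.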
Index Terms
- Fast In-Memory Transaction Processing Using RDMA and HTM