ABSTRACT
With the ever increasing DRAM capacity in commodity computers, applications tend to store large amount of data in main memory for fast access. Accordingly, efficient traversal of index structures to locate requested data becomes crucial to their performance. The index data structures grow so large that only a fraction of them can be cached in the CPU cache. The CPU cache can leverage access locality to keep the most frequently used part of an index in it for fast access. However, the traversal on the index to a target data during a search for a data item can result in significant false temporal and spatial localities, which make CPU cache space substantially underutilized. In this paper we show that even for highly skewed accesses the index traversal incurs excessive cache misses leading to suboptimal data access performance. To address the issue, we introduce Search Lookaside Buffer (SLB) to selectively cache only the search results, instead of the index itself. SLB can be easily integrated with any index data structure to increase utilization of the limited CPU cache resource and improve throughput of search requests on a large data set. We integrate SLB with various index data structures and applications. Experiments show that SLB can improve throughput of the index data structures by up to an order of magnitude. Experiments with real-world key-value traces also show up to 73% throughput improvement on a hash table.
- Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. 2015. Operating Systems: Three Easy Pieces (0.91 ed.). Arpaci-Dusseau Books.Google ScholarDigital Library
- Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2012. Workload Analysis of a Large-scale Key-value Store. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '12). ACM, New York, NY, USA, 53--64. Google ScholarDigital Library
- B+-tree 2017. B+-tree. https://en.wikipedia.org/wiki/B%2B_tree. (2017).Google Scholar
- Masanori Bando, Yi-Li Lin, and H. Jonathan Chao. 2012. FlashTrie: Beyond 100-Gb/s IP Route Lookup Using Hash-based Prefix-compressed Trie. IEEE/ACM Trans. Netw. 20, 4 (Aug. 2012), 1262--1275. Google ScholarDigital Library
- Arkaprava Basu, Jayneel Gandhi, Jichuan Chang, Mark D. Hill, and Michael M. Swift. 2013. Efficient Virtual Memory for Big Memory Servers. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 237--248. Google ScholarDigital Library
- R. Bayer and E. McCreight. 1970. Organization and Maintenance of Large Ordered Indices. In Proceedings of the 1970 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control (SIGFIDET '70). ACM, New York, NY, USA, 107--141. Google ScholarDigital Library
- Jeff Bonwick, Matt Ahrens, Val Henson, Mark Maybee, and Mark Shellenbaum. 2003. The zettabyte file system. In Proc. of the 2nd Usenix Conference on File and Storage Technologies.Google Scholar
- Chee-Yong Chan and Yannis E. Ioannidis. 1998. Bitmap Index Design and Evaluation. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD '98). ACM, New York, NY, USA, 355--366. Google ScholarDigital Library
- Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, and Todd C. Mowry. 2007. Improving Hash Join Performance Through Prefetching. ACM Trans. Database Syst. 32, 3, Article 17 (Aug. 2007). Google ScholarDigital Library
- Eric S. Chung, John D. Davis, and Jaewon Lee. 2013. LINQits: Big Data on Little Clients. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 261--272. Google ScholarDigital Library
- Fernando J Corbato. 1968. A paging experiment with the multics system. Technical Report. DTIC Document.Google Scholar
- Justin DeBrabant, Andrew Pavlo, Stephen Tu, Michael Stonebraker, and Stan Zdonik. 2013. Anti-caching: A New Approach to Database Management System Architecture. Proc. VLDB Endow. 6, 14 (Sept. 2013), 1942--1953. Google ScholarDigital Library
- Mihai Dobrescu, Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan Knies, Maziar Manesh, and Sylvia Ratnasamy. 2009. RouteBricks: Exploiting Parallelism to Scale Software Routers. In Proceedings of the ACM SIGOPS 22Nd Symposium on Operating Systems Principles (SOSP '09). ACM, New York, NY, USA, 15--28. Google ScholarDigital Library
- Ahmed Eldawy, Justin Levandoski, and Per-Åke Larson. 2014. Trekking Through Siberia: Managing Cold Data in a Memory-optimized Database. Proc. VLDB Endow. 7, 11 (July 2014), 931--942. Google ScholarDigital Library
- Florian Funke, Alfons Kemper, and Thomas Neumann. 2012. Compacting Transactional Data in Hybrid OLTP&OLAP Databases. Proc. VLDB Endow. 5, 11 (July 2012), 1424--1435. Google ScholarDigital Library
- Brian Gold, Anastassia Ailamaki, Larry Huston, and Babak Falsafi. 2005. Accelerating Database Operators Using a Network Processor. In Proceedings of the 1st International Workshop on Data Management on New Hardware (DaMoN '05). ACM, New York, NY, USA, Article 1. Google ScholarDigital Library
- Google Sparse Hash 2017. Google Sparse Hash. http://github.com/sparsehash/sparsehash/. (2017).Google Scholar
- Peter Hawkins, Alex Aiken, Kathleen Fisher, Martin Rinard, and Mooly Sagiv. 2012. Concurrent Data Representation Synthesis. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 417--428. Google ScholarDigital Library
- Timothy Hayes, Oscar Palomar, Osman Unsal, Adrian Cristal, and Mateo Valero. 2012. Vector Extensions for Decision Support DBMS Acceleration. In Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-45). IEEE Computer Society, Washington, DC, USA, 166--176. Google ScholarDigital Library
- Maurice Herlihy and Nir Shavit. 2011. The art of multiprocessor programming. Morgan Kaufmann.Google Scholar
- Maurice Herlihy, Nir Shavit, and Moran Tzafrir. 2008. Hopscotch hashing. In International Symposium on Distributed Computing. Springer, 350--364. Google ScholarDigital Library
- hugepages 2017. HugeTlbPage. https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt. (2017).Google Scholar
- Infiniband HDR 2017. 200Gb/s HDR InfiniBand Solutions. https://goo.gl/z6Z1xc. (2017).Google Scholar
- InnoDB B-tree 2017. Physical Structure of an InnoDB Index. https://goo.gl/mHnpFb. (2017).Google Scholar
- Intel Xeon E5 2017. Intel(R) Xeon(R) Processor E5-2683 v4. https://goo.gl/4Ls9xr. (2017).Google Scholar
- Song Jiang, Xiaoning Ding, Feng Chen, Enhua Tan, and Xiaodong Zhang. 2005. DULO: An Effective Buffer Cache Management Scheme to Exploit Both Temporal and Spatial Locality. In Proceedings of the 4th Conference on USENIX Conference on File and Storage Technologies - Volume 4 (FAST'05). USENIX Association, Berkeley, CA, USA, 8--8. http://dl.acm.org/citation.cfm?id=1251028.1251036Google Scholar
- Song Jiang and Xiaodong Zhang. 2004. ULC: A File Block Placement and Replacement Protocol to Effectively Exploit Hierarchical Locality in Multi-Level Buffer Caches. In Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04) (ICDCS '04). IEEE Computer Society, Washington, DC, USA, 168--177. http://dl.acm.org/citation.cfm?id=977400.977997 Google ScholarCross Ref
- Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J. Abadi. 2008. H-store: A High-performance, Distributed Main Memory Transaction Processing System. Proc. VLDB Endow. 1, 2 (Aug. 2008), 1496--1499. Google ScholarDigital Library
- Changkyu Kim, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-core CPUs. Proc. VLDB Endow. 2, 2 (Aug. 2009), 1378--1389. Google ScholarDigital Library
- Onur Kocberber, Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, and Parthasarathy Ranganathan. 2013. Meet the Walkers: Accelerating Index Traversals for In-memory Databases. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46). ACM, New York, NY, USA, 468--479. Google ScholarDigital Library
- Per-Åke Larson, Cipri Clinciu, Eric N. Hanson, Artem Oks, Susan L. Price, Srikumar Rangarajan, Aleksandras Surna, and Qingqing Zhou. 2011. SQL Server Column Store Indexes. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD '11). ACM, New York, NY, USA, 1177--1184. Google ScholarDigital Library
- Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. 2014. MICA: A Holistic Approach to Fast In-memory Key-value Storage. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI'14). USENIX Association, Berkeley, CA, USA, 429--444. http://dl.acm.org/citation.cfm?id=2616448.2616488Google ScholarDigital Library
- lmdb 2017. Symas Lightning Memory-mapped Database. http://www.lmdb.tech/doc/. (2017).Google Scholar
- Daniel Lustig, Abhishek Bhattacharjee, and Margaret Martonosi. 2013. TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs. ACM Trans. Archit. Code Optim. 10, 1, Article 2 (April 2013), 38 pages. Google ScholarDigital Library
- Stefan Manegold, Peter Boncz, and Martin Kersten. 2002. Optimizing Main-Memory Join on Modern Hardware. IEEE Trans. on Knowl. and Data Eng. 14, 4 (July 2002), 709--730. Google ScholarDigital Library
- Yandong Mao, Eddie Kohler, and Robert Tappan Morris. 2012. Cache Craftiness for Fast Multicore Key-value Storage. In Proceedings of the 7th ACM European Conference on Computer Systems (EuroSys '12). ACM, New York, NY, USA, 183--196. Google ScholarDigital Library
- Rich Martin. 1996. A Vectorized Hash-Join. In iRAM technical report. University of California at Berkeley.Google Scholar
- Memcached 2017. Memcached - a distributed memory object caching system. https://memcached.org/. (2017).Google Scholar
- MemSQL 2017. MemSQL. http://www.memsql.com/. (2017).Google Scholar
- MICA source code 2017. MICA. https://github.com/efficient/mica/. (2017).Google Scholar
- MongoDB 2017. MongoDB for GIANT Ideas. https://mongodb.com/. (2017).Google Scholar
- Rasmus Pagh and Flemming Friche Rodler. 2001. Cuckoo hashing. In European Symposium on Algorithms. Springer, 121--133. Google ScholarCross Ref
- Redis 2017. Redis. http://redis.io/. (2017).Google Scholar
- Ohad Rodeh, Josef Bacik, and Chris Mason. 2013. BTRFS: The Linux B-Tree Filesystem. Trans. Storage 9, 3, Article 9 (Aug. 2013), 32 pages. Google ScholarDigital Library
- SQLite 2017. In-Memory Databases - SQLite. https://sqlite.org/inmemorydb.html. (2017).Google Scholar
- SQLite B-tree 2017. Architecture of SQLite. https://goo.gl/5RaSol. (2017).Google Scholar
- Radu Stoica and Anastasia Ailamaki. 2013. Enabling Efficient OS Paging for Main-memory OLTP Databases. In Proceedings of the Ninth International Workshop on Data Management on New Hardware (DaMoN '13). ACM, New York, NY, USA, Article 7, 7 pages. Google ScholarDigital Library
- Radu Stoica, Justin J. Levandoski, and Per-Åke Larson. 2013. Identifying Hot and Cold Data in Main-memory Databases. In Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013) (ICDE '13). IEEE Computer Society, Washington, DC, USA, 26--37. Google ScholarDigital Library
- Y. Sun, Y. Hua, D. Feng, L. Yang, P. Zuo, and S. Cao. 2015. MinCounter: An efficient cuckoo hashing scheme for cloud storage systems. In 2015 31st Symposium on Mass Storage Systems and Technologies (MSST). 1--7. Google ScholarCross Ref
- Translation Lookaside buffer 2017. Translation lookaside buffer. https://goo.gl/yDd2i8. (2017).Google Scholar
- Yandong Wang, Li Zhang, Jian Tan, Min Li, Yuqing Gao, Xavier Guerin, Xiaoqiao Meng, and Shicong Meng. 2015. HydraDB: A Resilient RDMA-driven Key-value Middleware for In-memory Cluster Computing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15). ACM, New York, NY, USA, Article 22, 11 pages. Google ScholarDigital Library
- Xingbo Wu, Li Zhang, Yandong Wang, Yufei Ren, Michel Hack, and Song Jiang. 2016. zExpander: A Key-value Cache with Both High Performance and Fewer Misses. In Proceedings of the Eleventh European Conference on Computer Systems (EuroSys '16). ACM, New York, NY, USA, Article 14, 15 pages. Google ScholarDigital Library
- xxHash 2017. xxHash. http://github.com/Cyan4973/xxHash/. (2017).Google Scholar
- Yuan Yuan, Rubao Lee, and Xiaodong Zhang. 2013. The Yin and Yang of Processing Data Warehousing Queries on GPU Devices. Proc. VLDB Endow. 6, 10 (Aug. 2013), 817--828. Google ScholarDigital Library
- Huanchen Zhang, David G. Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, and Rui Shen. 2016. Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY, USA, 1567--1581. Google ScholarDigital Library
- Kai Zhang, Kaibo Wang, Yuan Yuan, Lei Guo, Rubao Lee, and Xiaodong Zhang. 2015. Mega-KV: A Case for GPUs to Maximize the Throughput of In-memory Key-value Stores. Proc. VLDB Endow. 8, 11 (July 2015), 1226--1237. Google ScholarDigital Library
- Dong Zhou, Bin Fan, Hyeontaek Lim, Michael Kaminsky, and David G. Andersen. 2013. Scalable, High Performance Ethernet Forwarding with CuckooSwitch. In Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT '13). ACM, New York, NY, USA, 97--108. Google ScholarDigital Library
Index Terms
- Search lookaside buffer: efficient caching for index data structures
Recommendations
The Synonym Lookaside Buffer: A Solution to the Synonym Problem in Virtual Caches
To support dynamic address translation in today's microprocessors, the first-level cache is accessed in parallel with a Translation Lookaside Buffer (TLB). However, this current approach faces mounting problems. This paper introduces new ideas to enable ...
Translation-Lookaside Buffer Consistency
Nine solutions to the cache consistency problem for shared-memory multiprocessors with multiple translation-lookaside buffers (TLBs) are described. A TLB's function is defined, and it is shown how TLB inconsistency arises in uniprocessor and ...
Comments