Abstract
Set intersection is one of the most important operations for many applications such as Web search engines or database management systems. This paper describes our new algorithm to efficiently find set intersections with sorted arrays on modern processors with SIMD instructions and high branch misprediction penalties. Our algorithm efficiently exploits SIMD instructions and can drastically reduce branch mispredictions. Our algorithm extends a merge-based algorithm by reading multiple elements, instead of just one element, from each of two input arrays and compares all of the pairs of elements from the two arrays to find the elements with the same values. The key insight for our improvement is that we can reduce the number of costly hard-to-predict conditional branches by advancing a pointer by more than one element at a time. Although this algorithm increases the total number of comparisons, we can execute these comparisons more efficiently using the SIMD instructions and gain the benefits of the reduced branch misprediction overhead. Our algorithm is suitable to replace existing standard library functions, such as std::set_intersection in C++, thus accelerating many applications, because the algorithm is simple and requires no preprocessing to generate additional data structures. We implemented our algorithm on Xeon and POWER7+. The experimental results show our algorithm outperforms the std::set_intersection implementation delivered with gcc by up to 5.2x using SIMD instructions and by up to 2.1x even without using SIMD instructions for 32-bit and 64-bit integer datasets. Our SIMD algorithm also outperformed an existing algorithm that can leverage SIMD instructions.
- L. A. Barroso, J. Dean, and U. Hölzle. Web Search for a Planet: The Google Cluster Architecture. IEEE Micro 23(2), pp. 22--28. 2003. Google ScholarDigital Library
- B. Ding and A. C. König. Fast set intersection in memory. In Proceedings of VLDB Endow., 4, pp. 255--266, 2011. Google ScholarDigital Library
- R. Baeza-Yates and A. Salinger. Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences. In Proceedings of the International Conference on String Processing and Information Retrieval, pp. 13--24, 2005. Google ScholarDigital Library
- R. Baeza-Yates. A Fast Set Intersection Algorithm for Sorted Sequences. In Proceedings of the Annual Symposium on Combinatorial Pattern Matching, pp. 400--408, 2004.Google Scholar
- E. D. Demaine, A. López-Ortiz, and J. I. Munro. Adaptive set intersections, unions, and differences. In Proceedings of the Annual Symposium on Discrete Algorithms, pp. 743--752, 2000. Google ScholarDigital Library
- P. Bille, A Pagh, and R. Pagh. Fast evaluation of union-intersection expressions. In Proceedings of the International Conference on Algorithms and Computation, pp. 739--750, 2007. Google ScholarDigital Library
- P. Sanders, F. Transier. Intersection in Integer Inverted Indices. In Proceedings of the Workshop on Algorithm Engineering and Experiments, pp. 71--83, 2007.Google ScholarCross Ref
- B. Schlegel, T. Willhalm, and W. Lehner. Fast Sorted-Set Intersection using SIMD Instructions. In Proceedings of the Second International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, 2011.Google Scholar
- D. Lemire, L. Boytsov and N. Kurz. SIMD Compression and the Intersection of Sorted Integers. arXiv: 1401.6399, 2014Google Scholar
- J. L. Bentley and A. C. Yao. An almost optimal algorithm for unbounded searching. Information processing letters, 5(3), pp. 82--87, 1976.Google Scholar
- J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In Proceedings of the Symposium on Principles of Programming Languages, 1983. Google ScholarDigital Library
- J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 145--156, 2002. Google ScholarDigital Library
- P. Sanders and S. Winkel. Super Scalar Sample Sort. In Proceedings of the European Symposium on Algorithms, Volume 3221 of LNCS, pp. 784--796, 2004.Google ScholarCross Ref
- H. Inoue, T. Moriyama, H. Komatsu, and T. Nakatani. AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors, In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 189--198, 2007. Google ScholarDigital Library
- J. Chhugani, A. D. Nguyen, V. W. Lee, W. Macy, M. Hagog, Y. Chen, A. Baransi, S. Kumar, and P. Dubey. Efficient implementation of sorting on multi-core SIMD CPU architecture. In Proceedings. VLDB Endow., 1(2), pp. 1313--1324, 2008. Google ScholarDigital Library
- D. Tsirogiannis, S. Guha, and N. Koudas. Improving the performance of list intersection. In Proceedings of VLDB Endow., 2(1), pp. 838--849, 2009. Google ScholarDigital Library
- S. Tatikonda, F. Junqueira, B. B. Cambazoglu, and V. Plachouras. On efficient posting list intersection with multicore processors. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009. Google ScholarDigital Library
- N. Ao, F. Zhang, D. Wu, D. S. Stones, G. Wang, X. Liu, J. Liu, and S. Lin. Efficient parallel lists intersection and index compression algorithms using graphics processing units. In Proceedings of VLDB Endow., 4(8), pp. 470--481, 2011. Google ScholarDigital Library
- D. R. Musser. Introspective Sorting and Selection Algorithms. Software Practice and Experience, 27(8), pp. 983--993, 1997. Google ScholarDigital Library
Recommendations
Implementing database operations using SIMD instructions
SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of dataModern CPUs have instructions that allow basic operations to be performed on several data elements in parallel. These instructions are called SIMD instructions, since they apply a single instruction to multiple data elements. SIMD technology was ...
Retargetable code optimization with SIMD instructions
CODES+ISSS '06: Proceedings of the 4th international conference on Hardware/software codesign and system synthesisRetargetable C compilers are nowadays widely used to quickly obtain compiler support for new embedded processors and to perform early processor architecture exploration. One frequent concern about retargetable compilers, though, is their lack of machine-...
Comments