research-article

Posting list intersection on multicore architectures

Authors:
Shirish Tatikonda

IBM Almaden Research, San Jose, CA, USA

IBM Almaden Research, San Jose, CA, USA
View Profile

,
B. Barla Cambazoglu

Yahoo! Research, Barcelona, Spain

Yahoo! Research, Barcelona, Spain
View Profile

,
Flavio P. Junqueira

Yahoo! Research, Barcelona, Spain

Yahoo! Research, Barcelona, Spain
View Profile

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information RetrievalJuly 2011Pages 963–972https://doi.org/10.1145/2009916.2010045

Published:24 July 2011Publication History

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Pages 963–972

ABSTRACT

In current commercial Web search engines, queries are processed in the conjunctive mode, which requires the search engine to compute the intersection of a number of posting lists to determine the documents matching all query terms. In practice, the intersection operation takes a significant fraction of the query processing time, for some queries dominating the total query latency. Hence, efficient posting list intersection is critical for achieving short query latencies. In this work, we focus on improving the performance of posting list intersection by leveraging the compute capabilities of recent multicore systems. To this end, we consider various coarse-grained and fine-grained parallelization models for list intersection. Specifically, we present an algorithm that partitions the work associated with a given query into a number of small and independent tasks that are subsequently processed in parallel. Through a detailed empirical analysis of these alternative models, we demonstrate that exploiting parallelism at the finest-level of granularity is critical to achieve the best performance on multicore systems. On an eight-core system, the fine-grained parallelization method is able to achieve more than five times reduction in average query processing time while still exploiting the parallelism for high query throughput.

References

V. N. Anh, O. Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. 24th Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pages 35--42, 2001. Google ScholarDigital Library
V. N. Anh and A. Moffat. Compressed inverted files with reduced decoding overheads. In Proc. 21th Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pages 290--297, 1998. Google ScholarDigital Library
C. Badue, R. Baeza-Yates, B. Ribeiro-Neto, and N. Ziviani. Distributed query processing using partitioned inverted files. In Proc. 8th Symp. String Processing and Information Retrieval, pages 10--20, 2001.Google ScholarCross Ref
R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri. The impact of caching on search engines. In Proc. 30th Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pages 183--190, 2007. Google ScholarDigital Library
R. A. Baeza-Yates. A fast set intersection algorithm for sorted sequences. In Proc. 15th Annual Symp. Combinatorial Pattern Matching, pages 400--408, 2004.Google ScholarCross Ref
J. Barbay. Optimality of randomized algorithms for the intersection problem. In Proc. 2nd Int'l Symp. Stochastic Algorithms: Foundations and Applications, pages 26--38, 2003.Google ScholarCross Ref
J. Barbay, A. López-Ortiz, and T. Lu. Faster adaptive set intersections for text searching. In Proc. 5th Int'l Workshop on Experimental Algorithms, pages 146--157, 2006. Google ScholarDigital Library
L.A. Barroso, J. Dean, and U. Holzle. Web search for a planet: the Google cluster architecture. IEEE Micro, 23(2):22--28, 2003. Google ScholarDigital Library
N.J. Belkin, D. Kelly, G. Kim, J.Y. Kim, H.J. Lee, G. Muresan, M.C. Tang, X.J. Yuan, and C. Cool. Query length in interactive information retrieval. In Proc. 26th Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pages 205--212. ACM New York, NY, USA, 2003. Google ScholarDigital Library
R. Blanco. Index compression for information retrieval systems. PhD thesis, University of A Coruna, 2008.Google Scholar
C. Bonacic, C. Garcia, M. Marin, M. Prieto, F. Tirado, and C. Vicente. Improving search engines performance on multithreading processors. In Proc. 8th Int'l Conf. High Performance Computing for Computational Science, pages 201--213, 2008. Google ScholarDigital Library
Eric A. Brewer. Lessons from giant-scale services. IEEE Internet Computing, 5(4):46--55, 2001. Google ScholarDigital Library
C. Buckley and A. Lewit. Optimizations of inverted vector searches. In Proc. 8th Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pages 97--110, 1985. Google ScholarDigital Library
B.B. Cambazoglu, F.P. Junqueira, V. Plachouras, S. Banachowski, B. Cui, S. Lim, and B. Bridge. A refreshing perspective of search engine caching. In Proc. 19th Int'l Conf. World Wide Web, pages 181--190, 2010. Google ScholarDigital Library
B.B. Cambazoglu, H. Zaragoza, O. Chapelle, J. Chen, C. Liao, Z. Zheng, and J. Degenhardt. Early exit optimizations for additive machine learned ranking systems. In Proc. 3rd ACM Int'l Conf. Web Search and Data Mining, pages 411--420, 2010. Google ScholarDigital Library
E. Demaine, A. López-Ortiz, and J. I. Munro. Adaptive set intersections, unions, and differences. In Proc. 11th ACM-SIAM Symp. Discrete Algorithms, pages 743--752, 2000. Google ScholarDigital Library
E. Demaine, A. López-Ortiz, and J. I. Munro. Experiments on adaptive set intersections for text retrieval systems. Lect. Notes Comput. Sc., 2153:91--104, 2001. Google ScholarDigital Library
S. Ding, J. He, H. Yan, and T. Suel. Using graphics processors for high performance IR query processing. In Proc. 18th Int'l Conf. World Wide Web, pages 421--430, 2009. Google ScholarDigital Library
E. Frachtenberg. Reducing query latencies in web search using fine-grained parallelism. World Wide Web, 12(4):441--460, 2009. Google ScholarDigital Library
Q. Gan and T. Suel. Improved techniques for result caching in web search engines. In Proc. 18th Int'l Conf. World Wide Web, pages 431--440, 2009. Google ScholarDigital Library
A. Moffat, W. Webber, J. Zobel, and R. Baeza-Yates. A pipelined architecture for distributed text query evaluation. Inf. Retr., 10(3):205--231, 2007. Google ScholarDigital Library
A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Trans. Inf. Syst., 14(4):349--379, 1996. Google ScholarDigital Library
M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency-sorted indexes. J. Am. Soc. Inf. Sci., 47(10):749--764, 1996. Google ScholarDigital Library
D. Puppin, F. Silvestri, and D. Laforenza. Query-driven document partitioning and collection selection. In Proc. 1st Int'l Conf. Scalable Information Systems, 2006. Google ScholarDigital Library
B. Ribeiro-Neto and R. A. Barbosa. Query performance for tightly coupled distributed digital libraries. In Proc. 3rd ACM Conf. Digital Libraries, pages 182--190, 1998. Google ScholarDigital Library
E. Schurman and J. Brutlag. Performance related changes and their user impact. Velocity -- Web Performance and Operations Conf., 2009.Google Scholar
Gleb Skobeltsyn, Flavio Junqueira, Vassilis Plachouras, and Ricardo Baeza-Yates. ResIn: a combination of results caching and index pruning for high-performance web search engines. In Proc. 31st Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pages 131--138, 2008. Google ScholarDigital Library
T. Strohman and W. Croft. Efficient document retrieval in main memory. In Proc. 30th Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pages 175--182, 2007. Google ScholarDigital Library
T. Strohman, H. Turtle, and W. B. Croft. Optimization strategies for complex queries. In Proc. 28th Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pages 219--225, 2005. Google ScholarDigital Library
S. Tatikonda and S. Parthasarathy. Mining tree-structured data on multicore systems. Proc. VLDB Endow., 2(1):694--705, 2009. Google ScholarDigital Library
D. Tsirogiannis, S. Guha, and N. Koudas. Improving the performance of list intersection. In Proc. 35th Int'l Conf. Very Large Data Bases, pages 838--849, 2009.Google ScholarDigital Library
J. Zhang, X. Long, and T. Suel. Performance of compressed inverted list caching in search engines. In Proc. 17th Int'l Conf. World Wide Web, pages 387--396, 2008. Google ScholarDigital Library
M. Zukowski, S. Heman, N. Nes, and P. Boncz. Super-scalar RAM-CPU cache compression. In Proc. 22nd Int'l Conf. Data Engineering, 2006. Google ScholarDigital Library

Index Terms

Posting list intersection on multicore architectures
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking

Recommendations

Exploiting fine-grain thread parallelism on multicore architectures
Software Development for Multi-core Computing Systems

In this work we present a runtime threading system which provides an efficient substrate for fine-grain parallelism, suitable for deployment in multicore platforms. Its architecture encompasses a number of optimizations that make it particularly ...
Read More
Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

Recently, Intel Xeon Phi is emerging as a many-core processor with up to 61 x86 cores. In this demonstration, we present PhiDB, an OLAP query processor with simultaneous multi-threading (SMT) capabilities on Xeon Phi as a case study for parallel ...
Read More
Multicore/Multi-GPU Accelerated Simulations of Multiphase Compressible Flows Using Wavelet Adapted Grids

We present a computational method of coupling average interpolating wavelets with high-order finite volume schemes and its implementation on heterogeneous computer architectures for the simulation of multiphase compressible flows. The method is ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
July 2011
1374 pages
ISBN:9781450307574
DOI:10.1145/2009916
General Chairs:
Wei-Ying Ma
Microsoft Research Asia, China
,
Jian-Yun Nie
University of Montreal, Canada
,
Program Chairs:
Ricardo Baeza-Yates
Yahoo! Research, Spain
,
Tat-Seng Chua
National University of Singapore
,
W. Bruce Croft
University of Massachusetts, Amherst, USA
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 July 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
intra-query parallelism
multicore architectures
posting list intersection
query processing
web search engines
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate792of3,983submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 43
  Total Citations
  View Citations
- 584
  Total Downloads
- Downloads (Last 12 months)1
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Posting list intersection on multicore architectures

SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Exploiting fine-grain thread parallelism on multicore architectures

Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor

Multicore/Multi-GPU Accelerated Simulations of Multiphase Compressible Flows Using Wavelet Adapted Grids