ABSTRACT
RAM and dynamic pruning schemes to reduce query evaluation times. While only a small portion of lists are processed with dynamic pruning, current systems still store the entire inverted list in cache. In this paper we investigate caching only the pieces of the inverted lists that are actually used to answer a query during dynamic pruning. We examine an LRU cache model, and two recently proposed models. We also introduce a new dynamic pruning scheme for impact-ordered inverted lists.
Using two large web collections and corresponding query logs we show that, using an LRU cache, our new pruning scheme reduces the number of disk accesses during query processing time by 7%-15% over the state-of-the-art impact-ordered baseline, without reducing answer quality. Surprisingly, however, we discover that using our new pruning scheme makes little difference to disk traffic when the more sophisticated caching schemes are employed.
- V. N. Anh. Impact-based Document Retrieval. PhD thesis, Department of Computer Science and Software Engineering, The University of Melbourne, 2004.Google Scholar
- V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impacts. In SIGIR, 2006. Google ScholarDigital Library
- S. Büttcher and C. L. A. Clarke. A document-centric approach to static index pruning in text retrieval systems. In CIKM, 2006.Google ScholarDigital Library
- P. Cao and S. Irani. Cost-aware www proxy caching algorithms. In USITS, 1997. Google ScholarDigital Library
- D. Carmel, D. Cohen, R. Fagin, E. Farchi, M. Herscovici, Y. S. Maarek, and A. Soffer. Static index pruning for information retrieval systems. In SIGIR, 2001. Google ScholarDigital Library
- E. S. de Moura, C. F. dos Santos, D. R. Fernandes, A. S. Silva, P. Calado, and M. A. Nascimento. Improving web search efficiency via a locality based static pruning method. In WWW, 2005. Google ScholarDigital Library
- S. Garcia. Search Engine Optimisation Using Past Queries. PhD thesis, School of Computer Science and IT, RMIT, 2007.Google Scholar
- D. Hawking. Efficiency/effectiveness trade-offs in query processing (from theory into practice workshop, 1998 sigir conf.). SIGIR Forum, 1998. Google ScholarDigital Library
- D. Hawking, N. Craswell, and P. Thistlewaite. Overview of TREC-7 Very Large Collection Track. In Proc. of TREC-7, 1998.Google Scholar
- B. J. Jansen and A. Spink. An analysis of web documents retrieved and viewed. In H. R. Arabnia and Y. Mun, editors, International Conference on Internet Computing, 2003.Google Scholar
- X. Long and T. Suel. Three-level caching for efficient query processing in large web search engines. In WWW, 2006. Google ScholarDigital Library
- A. Moffat, W. Webber, J. Zobel, and R. Baeza-Yates. A pipelined architecture for distributed text query evaluation. Information Retrieval, 2006. Google ScholarDigital Library
- A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Trans. Inf. Syst., 1996. Google ScholarDigital Library
- M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency-sorted indexes. In JASIS, 1996. Google ScholarDigital Library
- P. C. Saraiva, E. S. de Moura, R. C. Fonseca, W. M. Jr., B. A. Ribeiro-Neto, and N. Ziviani. Rank-preserving two-level caching for scalable search engines. In SIGIR, 2001. Google ScholarDigital Library
- A. Spink, D. Wolfram, M. B. J. Jansen, and T. Saracevic. Searching the web: the public and their queries. J. Am. Soc. Inf. Sci. Technol., 2001. Google ScholarDigital Library
- J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv., 2006. Google ScholarDigital Library
Index Terms
- Dynamic index pruning for effective caching
Recommendations
On caching search engine query results
In this paper we explore the problem of Caching of Search Engine Query Results in order to reduce the computing and I/O requirements needed to support the functionality of a search engine of the World Wide Web. We study query traces from the EXCITE ...
Caching Dynamic Skyline Queries
SSDBM '08: Proceedings of the 20th international conference on Scientific and Statistical Database ManagementGiven a query tuple <em>q</em>, the dynamic skyline query retrieves the tuples that are not dynamically dominated by any other in the data set with respect to <em>q</em>. A tuple dynamically dominates another, w.r.t. <em>q</em>, if it has closer to <em>...
Caching query-biased snippets for efficient retrieval
EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database TechnologyWeb Search Engines' result pages contain references to the top-k documents relevant for the query submitted by a user. Each document is represented by a title, a snippet and a URL. Snippets, i.e. short sentences showing the portions of the document ...
Comments