skip to main content
10.1145/1321440.1321592acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
poster

Dynamic index pruning for effective caching

Published:06 November 2007Publication History

ABSTRACT

RAM and dynamic pruning schemes to reduce query evaluation times. While only a small portion of lists are processed with dynamic pruning, current systems still store the entire inverted list in cache. In this paper we investigate caching only the pieces of the inverted lists that are actually used to answer a query during dynamic pruning. We examine an LRU cache model, and two recently proposed models. We also introduce a new dynamic pruning scheme for impact-ordered inverted lists.

Using two large web collections and corresponding query logs we show that, using an LRU cache, our new pruning scheme reduces the number of disk accesses during query processing time by 7%-15% over the state-of-the-art impact-ordered baseline, without reducing answer quality. Surprisingly, however, we discover that using our new pruning scheme makes little difference to disk traffic when the more sophisticated caching schemes are employed.

References

  1. V. N. Anh. Impact-based Document Retrieval. PhD thesis, Department of Computer Science and Software Engineering, The University of Melbourne, 2004.Google ScholarGoogle Scholar
  2. V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impacts. In SIGIR, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Büttcher and C. L. A. Clarke. A document-centric approach to static index pruning in text retrieval systems. In CIKM, 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. Cao and S. Irani. Cost-aware www proxy caching algorithms. In USITS, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. Carmel, D. Cohen, R. Fagin, E. Farchi, M. Herscovici, Y. S. Maarek, and A. Soffer. Static index pruning for information retrieval systems. In SIGIR, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. E. S. de Moura, C. F. dos Santos, D. R. Fernandes, A. S. Silva, P. Calado, and M. A. Nascimento. Improving web search efficiency via a locality based static pruning method. In WWW, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Garcia. Search Engine Optimisation Using Past Queries. PhD thesis, School of Computer Science and IT, RMIT, 2007.Google ScholarGoogle Scholar
  8. D. Hawking. Efficiency/effectiveness trade-offs in query processing (from theory into practice workshop, 1998 sigir conf.). SIGIR Forum, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Hawking, N. Craswell, and P. Thistlewaite. Overview of TREC-7 Very Large Collection Track. In Proc. of TREC-7, 1998.Google ScholarGoogle Scholar
  10. B. J. Jansen and A. Spink. An analysis of web documents retrieved and viewed. In H. R. Arabnia and Y. Mun, editors, International Conference on Internet Computing, 2003.Google ScholarGoogle Scholar
  11. X. Long and T. Suel. Three-level caching for efficient query processing in large web search engines. In WWW, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Moffat, W. Webber, J. Zobel, and R. Baeza-Yates. A pipelined architecture for distributed text query evaluation. Information Retrieval, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Trans. Inf. Syst., 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency-sorted indexes. In JASIS, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. P. C. Saraiva, E. S. de Moura, R. C. Fonseca, W. M. Jr., B. A. Ribeiro-Neto, and N. Ziviani. Rank-preserving two-level caching for scalable search engines. In SIGIR, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Spink, D. Wolfram, M. B. J. Jansen, and T. Saracevic. Searching the web: the public and their queries. J. Am. Soc. Inf. Sci. Technol., 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comput. Surv., 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Dynamic index pruning for effective caching

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
      November 2007
      1048 pages
      ISBN:9781595938039
      DOI:10.1145/1321440

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 6 November 2007

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • poster

      Acceptance Rates

      Overall Acceptance Rate1,861of8,427submissions,22%

      Upcoming Conference

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader