skip to main content
10.1145/1277741.1277775acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
Article

The impact of caching on search engines

Published:23 July 2007Publication History

ABSTRACT

In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs.caching posting lists. Using a query log spanning a whole year we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log affect the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.

References

  1. V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impacts. In ACM CIKM, 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. A. Baeza-Yates and F. Saint-Jean. A three level search engine index based in query log distribution. In SPIRE, 2003.Google ScholarGoogle Scholar
  3. C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In ACM SIGIR, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Büttcher and C. L. A. Clarke. A document-centric approach to static index pruning in text retrieval systems. In ACM CIKM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Cao and S. Irani. Cost-aware WWW proxy caching algorithms. In USITS, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Denning. Working sets past and present. IEEE Trans. on Software Engineering, SE-6(1):64--84, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. T. Fagni, R. Perego, F. Silvestri, and S. Orlando. Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Trans. Inf. Syst., 24(1):51--78, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. Lempel and S. Moran. Predictive caching and prefetching of query results in search engines. In WWW, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. X. Long and T. Suel. Three-level caching for efficient query processing in large web search engines. In WWW, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. E. P. Markatos. On caching search engine query results. Computer Communications, 24(2):137--143, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and C. Lioma. Terrier: A High Performance and Scalable Information Retrieval Platform. In SIGIR Workshop on Open Source Information Retrieval, 2006.Google ScholarGoogle Scholar
  12. V. V. Raghavan and H. Sever. On the reuse of past optimal queries. In ACM SIGIR, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. C. Saraiva, E. S. de Moura, N. Ziviani, W. Meira, R. Fonseca, and B. Riberio-Neto. Rank-preserving two-level caching for scalable search engines. In ACM SIGIR, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. R. Slutz and I. L. Traiger. A note on the calculation of average working set size. Communications of the ACM, 17(10):563--565, 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. T. Strohman, H. Turtle, and W. B. Croft. Optimization strategies for complex queries. In ACM SIGIR, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. I. H. Witten, T. C. Bell, and A. Moffat. Managing Gigabytes: Compressing and Indexing Documents and Images. John Wiley & Sons, Inc., NY, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. N. E. Young. On-line file caching. Algorithmica, 33(3):371--383, 2002.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. The impact of caching on search engines

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
              July 2007
              946 pages
              ISBN:9781595935977
              DOI:10.1145/1277741

              Copyright © 2007 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 23 July 2007

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              Overall Acceptance Rate792of3,983submissions,20%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader