Article

The impact of caching on search engines

Authors:
Ricardo Baeza-Yates

Yahoo! Research

Yahoo! Research
View Profile

,
Aristides Gionis

Yahoo! Research

Yahoo! Research
View Profile

,
Flavio Junqueira

Yahoo! Research

Yahoo! Research
View Profile

,
Vanessa Murdock

Yahoo! Research

Yahoo! Research
View Profile

,
Vassilis Plachouras

Yahoo! Research

Yahoo! Research
View Profile

,
Fabrizio Silvestri

ISTI - CNR

ISTI - CNR
View Profile

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrievalJuly 2007Pages 183–190https://doi.org/10.1145/1277741.1277775

Published:23 July 2007Publication History

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 183–190

ABSTRACT

In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs.caching posting lists. Using a query log spanning a whole year we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log affect the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.

References

V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impacts. In ACM CIKM, 2006.Google ScholarDigital Library
R. A. Baeza-Yates and F. Saint-Jean. A three level search engine index based in query log distribution. In SPIRE, 2003.Google Scholar
C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In ACM SIGIR, 1985. Google ScholarDigital Library
S. Büttcher and C. L. A. Clarke. A document-centric approach to static index pruning in text retrieval systems. In ACM CIKM, 2006. Google ScholarDigital Library
P. Cao and S. Irani. Cost-aware WWW proxy caching algorithms. In USITS, 1997. Google ScholarDigital Library
P. Denning. Working sets past and present. IEEE Trans. on Software Engineering, SE-6(1):64--84, 1980. Google ScholarDigital Library
T. Fagni, R. Perego, F. Silvestri, and S. Orlando. Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Trans. Inf. Syst., 24(1):51--78, 2006. Google ScholarDigital Library
R. Lempel and S. Moran. Predictive caching and prefetching of query results in search engines. In WWW, 2003. Google ScholarDigital Library
X. Long and T. Suel. Three-level caching for efficient query processing in large web search engines. In WWW, 2005. Google ScholarDigital Library
E. P. Markatos. On caching search engine query results. Computer Communications, 24(2):137--143, 2001. Google ScholarDigital Library
I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and C. Lioma. Terrier: A High Performance and Scalable Information Retrieval Platform. In SIGIR Workshop on Open Source Information Retrieval, 2006.Google Scholar
V. V. Raghavan and H. Sever. On the reuse of past optimal queries. In ACM SIGIR, 1995. Google ScholarDigital Library
P. C. Saraiva, E. S. de Moura, N. Ziviani, W. Meira, R. Fonseca, and B. Riberio-Neto. Rank-preserving two-level caching for scalable search engines. In ACM SIGIR, 2001. Google ScholarDigital Library
D. R. Slutz and I. L. Traiger. A note on the calculation of average working set size. Communications of the ACM, 17(10):563--565, 1974. Google ScholarDigital Library
T. Strohman, H. Turtle, and W. B. Croft. Optimization strategies for complex queries. In ACM SIGIR, 2005. Google ScholarDigital Library
I. H. Witten, T. C. Bell, and A. Moffat. Managing Gigabytes: Compressing and Indexing Documents and Images. John Wiley & Sons, Inc., NY, 1994. Google ScholarDigital Library
N. E. Young. On-line file caching. Algorithmica, 33(3):371--383, 2002.Google ScholarCross Ref

Index Terms

The impact of caching on search engines
1. Information systems
  1. Information retrieval
  2. Information storage systems
    1. Storage architectures
      1. Distributed storage

Recommendations

Design trade-offs for search engine caching

In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query ...
Read More
Three-level caching for efficient query processing in large Web search engines
WWW '05: Proceedings of the 14th international conference on World Wide Web

Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of ...
Read More
Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data

This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy aimed to efficiently exploit the temporal and spatial ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN:9781595935977
DOI:10.1145/1277741
General Chairs:
Wessel Kraaij
TNO, The Netherlands
,
Arjen P. de Vries
CWI, The Netherlands
,
Program Chairs:
Charles L. A. Clarke
University of Waterloo, Canada
,
Norbert Fuhr
University of Duisburg-Essen, Germany
,
Noriko Kando
National Institute of Informatics, Japan
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 July 2007
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
caching
information retrieval systems
query logs
web search
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate792of3,983submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 157
  Total Citations
  View Citations
- 1,414
  Total Downloads
- Downloads (Last 12 months)22
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

The impact of caching on search engines

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Design trade-offs for search engine caching

Three-level caching for efficient query processing in large Web search engines

Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data