ABSTRACT
In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs.caching posting lists. Using a query log spanning a whole year we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log affect the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
- V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impacts. In ACM CIKM, 2006.Google ScholarDigital Library
- R. A. Baeza-Yates and F. Saint-Jean. A three level search engine index based in query log distribution. In SPIRE, 2003.Google Scholar
- C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In ACM SIGIR, 1985. Google ScholarDigital Library
- S. Büttcher and C. L. A. Clarke. A document-centric approach to static index pruning in text retrieval systems. In ACM CIKM, 2006. Google ScholarDigital Library
- P. Cao and S. Irani. Cost-aware WWW proxy caching algorithms. In USITS, 1997. Google ScholarDigital Library
- P. Denning. Working sets past and present. IEEE Trans. on Software Engineering, SE-6(1):64--84, 1980. Google ScholarDigital Library
- T. Fagni, R. Perego, F. Silvestri, and S. Orlando. Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Trans. Inf. Syst., 24(1):51--78, 2006. Google ScholarDigital Library
- R. Lempel and S. Moran. Predictive caching and prefetching of query results in search engines. In WWW, 2003. Google ScholarDigital Library
- X. Long and T. Suel. Three-level caching for efficient query processing in large web search engines. In WWW, 2005. Google ScholarDigital Library
- E. P. Markatos. On caching search engine query results. Computer Communications, 24(2):137--143, 2001. Google ScholarDigital Library
- I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and C. Lioma. Terrier: A High Performance and Scalable Information Retrieval Platform. In SIGIR Workshop on Open Source Information Retrieval, 2006.Google Scholar
- V. V. Raghavan and H. Sever. On the reuse of past optimal queries. In ACM SIGIR, 1995. Google ScholarDigital Library
- P. C. Saraiva, E. S. de Moura, N. Ziviani, W. Meira, R. Fonseca, and B. Riberio-Neto. Rank-preserving two-level caching for scalable search engines. In ACM SIGIR, 2001. Google ScholarDigital Library
- D. R. Slutz and I. L. Traiger. A note on the calculation of average working set size. Communications of the ACM, 17(10):563--565, 1974. Google ScholarDigital Library
- T. Strohman, H. Turtle, and W. B. Croft. Optimization strategies for complex queries. In ACM SIGIR, 2005. Google ScholarDigital Library
- I. H. Witten, T. C. Bell, and A. Moffat. Managing Gigabytes: Compressing and Indexing Documents and Images. John Wiley & Sons, Inc., NY, 1994. Google ScholarDigital Library
- N. E. Young. On-line file caching. Algorithmica, 33(3):371--383, 2002.Google ScholarCross Ref
Index Terms
- The impact of caching on search engines
Recommendations
Design trade-offs for search engine caching
In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query ...
Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data
This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy aimed to efficiently exploit the temporal and spatial ...
Three-Level Caching for Efficient Query Processing in Large Web Search Engines
Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of ...
Comments