ABSTRACT
Search engine users now expect results pages to present query-biased document snippets. In this paper we explore the algorithms and data structures a search engine needs in order to generate query-biased snippets efficiently. We begin by proposing and analysing a document compression method that reduces snippet generation time by 58% relative to a baseline that uses the zlib compression library. These experiments reveal that locating documents on secondary storage dominates the total cost of generating snippets, so caching documents in RAM is essential for fast snippet generation. Using simulation, we examine snippet generation performance for RAM caches of different sizes. Finally, we propose and analyse document reordering and compaction, revealing a scheme that increases the number of document cache hits with only a marginal effect on snippet quality. This scheme effectively doubles the number of documents that fit in a fixed-size cache.
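The claim that compaction roughly doubles the number of documents a fixed-size cache can hold can be illustrated with a toy cache simulation. This is not the authors' simulator: it assumes a least-recently-used (LRU) eviction policy with a byte budget, made-up document sizes, and a hypothetical `compact` rule that keeps a leading fraction of sentences already reordered so the most snippet-worthy ones come first. The names `ByteLRUCache`, `compact`, and `simulate` are illustrative only.

```python
from collections import OrderedDict


class ByteLRUCache:
    """LRU document cache whose capacity is a byte budget, not an item count.

    Assumption: documents are cached whole, and least recently used
    documents are evicted until a newly fetched document fits.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.items: "OrderedDict[int, int]" = OrderedDict()  # doc_id -> size in bytes
        self.hits = 0
        self.misses = 0

    def request(self, doc_id: int, size: int) -> bool:
        """Record an access to doc_id; return True on a cache hit."""
        if doc_id in self.items:
            self.items.move_to_end(doc_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if size > self.capacity:
            return False  # too large to cache; it must come from disk every time
        # Evict least recently used documents until the new one fits.
        while self.used + size > self.capacity:
            _, evicted_size = self.items.popitem(last=False)
            self.used -= evicted_size
        self.items[doc_id] = size
        self.used += size
        return False

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def compact(sentences: list[str], keep_fraction: float) -> list[str]:
    """Hypothetical compaction: keep only the leading fraction of sentences.

    Assumes the sentences were already reordered by estimated usefulness.
    """
    keep = max(1, round(len(sentences) * keep_fraction))
    return sentences[:keep]


def simulate(requests: list[int], sizes: dict[int, int], capacity: int) -> float:
    """Replay a stream of document requests and return the cache hit rate."""
    cache = ByteLRUCache(capacity)
    for doc_id in requests:
        cache.request(doc_id, sizes[doc_id])
    return cache.hit_rate()


if __name__ == "__main__":
    full = {d: 1000 for d in range(10)}  # ten whole documents, 1,000 bytes each
    half = {d: 500 for d in range(10)}   # the same documents compacted to ~50%
    stream = list(range(10)) * 3         # cycle through all ten documents three times
    print(simulate(stream, full, 5000))  # 0.0
    print(simulate(stream, half, 5000))  # 0.6666666666666666
```

With whole documents, a cyclic scan over ten documents thrashes a cache that holds only five, so every request misses; halving each document lets all ten fit, and every request after the first pass hits. Real query streams are skewed rather than cyclic, so the actual gain depends on the workload.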
Index Terms
- Fast generation of result snippets in web search