Fast generation of result snippets in web search

Authors:
Andrew Turpin (RMIT University), Yohannes Tsegay (RMIT University), David Hawking (CSIRO ICT Centre), Hugh E. Williams (Microsoft Corporation)

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 2007, pages 127–134
https://doi.org/10.1145/1277741.1277766

Published: 23 July 2007

ABSTRACT

The presentation of query-biased document snippets as part of results pages presented by search engines has become an expectation of search engine users. In this paper we explore the algorithms and data structures required as part of a search engine to allow efficient generation of query-biased snippets. We begin by proposing and analysing a document compression method that reduces snippet generation time by 58% over a baseline using the zlib compression library. These experiments reveal that finding documents on secondary storage dominates the total cost of generating snippets, and so caching documents in RAM is essential for a fast snippet generation process. Using simulation, we examine snippet generation performance for RAM caches of different sizes. Finally, we propose and analyse document reordering and compaction, revealing a scheme that increases the number of document cache hits with only a marginal effect on snippet quality. This scheme effectively doubles the number of documents that can fit in a fixed-size cache.
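
The link between compaction and cache hits can be illustrated with a small simulation. The sketch below is not the simulation used in the paper: it assumes a least-recently-used (LRU) replacement policy, a byte-budgeted cache, and a hypothetical request stream, and the names `LRUDocumentCache` and `hit_rate` are invented for this illustration. It shows only the general effect that halving document size lets twice as many documents stay resident in a cache of the same capacity.

```python
from collections import OrderedDict


class LRUDocumentCache:
    """Byte-budgeted LRU cache that counts hits and misses (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.docs = OrderedDict()  # doc_id -> size in bytes, oldest first
        self.hits = 0
        self.misses = 0

    def fetch(self, doc_id, size_bytes):
        """Record one document request; return True on a cache hit."""
        if doc_id in self.docs:
            self.docs.move_to_end(doc_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1  # stands in for a read from secondary storage
        if size_bytes > self.capacity_bytes:
            return False  # document is too large to cache at all
        # Evict least recently used documents until the new one fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self.docs.popitem(last=False)
            self.used_bytes -= evicted_size
        self.docs[doc_id] = size_bytes
        self.used_bytes += size_bytes
        return False


def hit_rate(requests, doc_size, capacity_bytes):
    """Replay a request stream of equally sized documents; return the hit fraction."""
    cache = LRUDocumentCache(capacity_bytes)
    for doc_id in requests:
        cache.fetch(doc_id, doc_size)
    return cache.hits / len(requests)


# Hypothetical workload: four documents requested in a repeating cycle.
requests = [1, 2, 3, 4] * 5

full_size_rate = hit_rate(requests, doc_size=100, capacity_bytes=200)
compacted_rate = hit_rate(requests, doc_size=50, capacity_bytes=200)
print(full_size_rate, compacted_rate)
```

With full-size documents only two of the four fit, and the cyclic access pattern evicts each document just before it is needed again, so every request misses. With documents compacted to half size all four fit, and only the first access to each one misses. Real query logs are far less adversarial than this cycle, so the gain in practice would be smaller than this toy case suggests.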

Index Terms

Fast generation of result snippets in web search
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
    2. Information retrieval query processing

Published in
SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741
General Chairs: Wessel Kraaij (TNO, The Netherlands), Arjen P. de Vries (CWI, The Netherlands)
Program Chairs: Charles L. A. Clarke (University of Waterloo, Canada), Norbert Fuhr (University of Duisburg-Essen, Germany), Noriko Kando (National Institute of Informatics, Japan)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
document caching
snippet generation
web summaries
Qualifiers
- Article
Acceptance Rates
Overall acceptance rate: 792 of 3,983 submissions, 20%