DOI: 10.1145/1277741.1277766
Article

Fast generation of result snippets in web search

Published: 23 July 2007

ABSTRACT

Search engine users have come to expect query-biased document snippets on results pages. In this paper we explore the algorithms and data structures a search engine requires to generate query-biased snippets efficiently. We begin by proposing and analysing a document compression method that reduces snippet generation time by 58% over a baseline using the zlib compression library. These experiments reveal that finding documents on secondary storage dominates the total cost of generating snippets, and so caching documents in RAM is essential for a fast snippet generation process. Using simulation, we examine snippet generation performance for RAM caches of different sizes. Finally we propose and analyse document reordering and compaction, revealing a scheme that increases the number of document cache hits with only a marginal effect on snippet quality. This scheme effectively doubles the number of documents that can fit in a fixed-size cache.
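
As a rough illustration of the pipeline the abstract describes (fetch a compressed document, preferably from a RAM cache, then pick sentences that match the query), the following minimal Python sketch may help. It is not the paper's method: the `DocumentCache` class, the `snippet` function, the LRU eviction policy, and the term-overlap sentence scoring are all assumptions made for this example, and zlib stands in only for the baseline compressor the abstract mentions.

```python
import re
import zlib
from collections import OrderedDict


class DocumentCache:
    """Hypothetical LRU cache of compressed documents (stand-in for a RAM cache)."""

    def __init__(self, store, capacity):
        self.store = store            # doc_id -> zlib-compressed bytes ("disk")
        self.capacity = capacity      # maximum number of cached documents (>= 1)
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, doc_id):
        if doc_id in self.cache:
            self.cache.move_to_end(doc_id)      # mark as most recently used
            self.hits += 1
            return self.cache[doc_id]
        self.misses += 1
        blob = self.store[doc_id]               # simulated secondary-storage read
        self.cache[doc_id] = blob
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict the least recently used entry
        return blob


def snippet(blob, query, max_sentences=2):
    """Return up to max_sentences sentences sharing the most terms with the query."""
    text = zlib.decompress(blob).decode("utf-8")
    terms = {t.lower() for t in query.split()}
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def score(sentence):
        # Assumed scoring: number of distinct query terms present in the sentence.
        return len(terms & set(re.findall(r"\w+", sentence.lower())))

    # Rank by descending score, breaking ties by document order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    # Keep only matching sentences and present them in document order.
    chosen = sorted(i for i in ranked[:max_sentences] if score(sentences[i]) > 0)
    return " ... ".join(sentences[i] for i in chosen)
```

A caller would build `store` from compressed documents, wrap it in `DocumentCache(store, capacity=n)`, and call `snippet(cache.get(doc_id), query)` per result; the `hits` and `misses` counters give a crude view of how cache size affects the number of simulated disk reads.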

      Published in

        SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
        July 2007
        946 pages
        ISBN: 9781595935977
        DOI: 10.1145/1277741

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall acceptance rate: 792 of 3,983 submissions, 20%
