skip to main content
10.1145/1148170.1148234acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
Article

Type less, find more: fast autocompletion search with a succinct index

Authors Info & Claims
Published:06 August 2006Publication History

ABSTRACT

We consider the following full-text search autocompletion feature. Imagine a user of a search engine typing a query. Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of these completions should be displayed. Known indexing data structures that apply to this problem either incur large processing times for a substantial class of queries, or they use a lot of space. We present a new indexing data structure that uses no more space than a state-of-the-art compressed inverted index, but with 10 times faster query processing times. Even on the large TREC Terabyte collection, which comprises over 25 million documents, we achieve, on a single machine and with the index on disk, average response times of one tenth of a second. We have built a full-fledged, interactive search engine that realizes the proposed autocompletion feature combined with support for proximity search, semi-structured (XML) text, subword and phrase completion, and semantic tags.

References

  1. A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Alstrup, G. S. Brodal, and T. Rauhe. New data structures for orthogonal range searching. In 41st Symposium on Foundations of Computer Science (FOCS'00), pages 198--207, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. V. N. Anh and A. Moffat. Inverted index compression using word-aligned binary codes. Information Retrieval, 8:151--166, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. L. Arge, V. Samoladas, and J. S. Vitter. On two-dimensional indexability and optimal range search indexing. In 18th Symposium on Principles of Database Systems (PODS'99), pages 346--357, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Baeza-Yates. A fast set intersection algorithm for sorted sequences. Lecture Notes in Computer Science, 3109:400--408, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Bickel, P. Haider, and T. Scheffer. Learning to complete sentences. In 16th European Conference on Machine Learning (ECML'05), pages 497--504, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. L. A. Clarke, N. Craswell, and I. Soboroff. The TREC terabyte retrieval track. SIGIR Forum, 39(1):25, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. J. Darragh, I. H. Witten, and M. L. James. The reactive keyboard: A predictive typing aid. IEEE Computer, pages 41--49, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci., 66(4):614--656, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Ferragina, N. Koudas, S. Muthukrishnan, and D. Srivastava. Two-dimensional substring indexing. Journal of Computer and System Science, 66(4):763--774, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. Ferragina and G. Manzini. Indexing compressed text. Journal of the ACM, 52(4):552--581, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. In 10th International World Wide Web Conference (WWW10), pages 406--414, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170--231, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. K. Grabski and T. Scheffer. Sentence completion. In 27th Conference on Research and Development in Information Retrieval (SIGIR'04), pages 433--439, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Jakobsson. Autocompletion in full text transaction entry: a method for humanized input. In Conference on Human Factors in Computing Systems (CHI'86), pages 327--323, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Metzler, T. Strohman, H. Turtle, and W. B. Croft. Indri at TREC 2004: Terabyte track. In 13th Text Retrieval Conference (TREC'04), 2004.Google ScholarGoogle Scholar
  17. A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, 14(4):349--379, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. G. W. Paynter, I. H. Witten, S. J. Cunningham, and G. Buchanan. Scalable browsing for large collections: A case study. In 5th Conference on Digital Libraries (DL'00), pages 215--223, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. E. Robertson, S. Walker, M. M. Beaulieu, M. Gatford, and A. Payne. Okapi at TREC-4. In 4th Text Retrieval Conference (TREC'95), pages 73--96, 1995.Google ScholarGoogle Scholar
  20. T. Stocky, A. Faaborg, and H. Lieberman. A commonsense approach to predictive text entry. In Conference on Human Factors in Computing Systems (CHI'04), pages 1163--1166, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. E. M. Voorhees. Query expansion using lexical-semantic relations. In 17th Conference on Research and Development in Information Retrieval (SIGIR'94), pages 171--180, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. H. Williams and J. Zobel. Compressing integers for fast file access. Computer Journal, 42(3):193--201, 1999.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. I. H. Witten, T. C. Bell, and A. Moffat. Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edition. Morgan Kaufmann, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Zobel, A. Moffat, and K. Ramamohanarao. Inverted files versus signature files for text indexing. ACM Transactions on Database Systems, 23(4):453--490, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Type less, find more: fast autocompletion search with a succinct index

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
              August 2006
              768 pages
              ISBN:1595933697
              DOI:10.1145/1148170

              Copyright © 2006 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 6 August 2006

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              Overall Acceptance Rate792of3,983submissions,20%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader