Article

Type less, find more: fast autocompletion search with a succinct index

Authors:
Holger Bast

Max-Planck-Institut für Informatik, Saarbrücken, Germany

Max-Planck-Institut für Informatik, Saarbrücken, Germany
View Profile

,
Ingmar Weber

Max-Planck-Institut für Informatik, Saarbrücken, Germany

Max-Planck-Institut für Informatik, Saarbrücken, Germany
View Profile

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrievalAugust 2006Pages 364–371https://doi.org/10.1145/1148170.1148234

Published:06 August 2006Publication History

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 364–371

ABSTRACT

We consider the following full-text search autocompletion feature. Imagine a user of a search engine typing a query. Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of these completions should be displayed. Known indexing data structures that apply to this problem either incur large processing times for a substantial class of queries, or they use a lot of space. We present a new indexing data structure that uses no more space than a state-of-the-art compressed inverted index, but with 10 times faster query processing times. Even on the large TREC Terabyte collection, which comprises over 25 million documents, we achieve, on a single machine and with the index on disk, average response times of one tenth of a second. We have built a full-fledged, interactive search engine that realizes the proposed autocompletion feature combined with support for proximity search, semi-structured (XML) text, subword and phrase completion, and semantic tags.

References

A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, 1988. Google ScholarDigital Library
S. Alstrup, G. S. Brodal, and T. Rauhe. New data structures for orthogonal range searching. In 41st Symposium on Foundations of Computer Science (FOCS'00), pages 198--207, 2000. Google ScholarDigital Library
V. N. Anh and A. Moffat. Inverted index compression using word-aligned binary codes. Information Retrieval, 8:151--166, 2005. Google ScholarDigital Library
L. Arge, V. Samoladas, and J. S. Vitter. On two-dimensional indexability and optimal range search indexing. In 18th Symposium on Principles of Database Systems (PODS'99), pages 346--357, 1999. Google ScholarDigital Library
R. Baeza-Yates. A fast set intersection algorithm for sorted sequences. Lecture Notes in Computer Science, 3109:400--408, 2004.Google ScholarDigital Library
S. Bickel, P. Haider, and T. Scheffer. Learning to complete sentences. In 16th European Conference on Machine Learning (ECML'05), pages 497--504, 2005. Google ScholarDigital Library
C. L. A. Clarke, N. Craswell, and I. Soboroff. The TREC terabyte retrieval track. SIGIR Forum, 39(1):25, 2005. Google ScholarDigital Library
J. J. Darragh, I. H. Witten, and M. L. James. The reactive keyboard: A predictive typing aid. IEEE Computer, pages 41--49, 1990. Google ScholarDigital Library
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci., 66(4):614--656, 2003. Google ScholarDigital Library
P. Ferragina, N. Koudas, S. Muthukrishnan, and D. Srivastava. Two-dimensional substring indexing. Journal of Computer and System Science, 66(4):763--774, 2003. Google ScholarDigital Library
P. Ferragina and G. Manzini. Indexing compressed text. Journal of the ACM, 52(4):552--581, 2005. Google ScholarDigital Library
L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. In 10th International World Wide Web Conference (WWW10), pages 406--414, 2001. Google ScholarDigital Library
V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170--231, 1998. Google ScholarDigital Library
K. Grabski and T. Scheffer. Sentence completion. In 27th Conference on Research and Development in Information Retrieval (SIGIR'04), pages 433--439, 2004. Google ScholarDigital Library
M. Jakobsson. Autocompletion in full text transaction entry: a method for humanized input. In Conference on Human Factors in Computing Systems (CHI'86), pages 327--323, 1986. Google ScholarDigital Library
D. Metzler, T. Strohman, H. Turtle, and W. B. Croft. Indri at TREC 2004: Terabyte track. In 13th Text Retrieval Conference (TREC'04), 2004.Google Scholar
A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, 14(4):349--379, 1996. Google ScholarDigital Library
G. W. Paynter, I. H. Witten, S. J. Cunningham, and G. Buchanan. Scalable browsing for large collections: A case study. In 5th Conference on Digital Libraries (DL'00), pages 215--223, 2000. Google ScholarDigital Library
S. E. Robertson, S. Walker, M. M. Beaulieu, M. Gatford, and A. Payne. Okapi at TREC-4. In 4th Text Retrieval Conference (TREC'95), pages 73--96, 1995.Google Scholar
T. Stocky, A. Faaborg, and H. Lieberman. A commonsense approach to predictive text entry. In Conference on Human Factors in Computing Systems (CHI'04), pages 1163--1166, 2004. Google ScholarDigital Library
E. M. Voorhees. Query expansion using lexical-semantic relations. In 17th Conference on Research and Development in Information Retrieval (SIGIR'94), pages 171--180, 1994. Google ScholarDigital Library
H. Williams and J. Zobel. Compressing integers for fast file access. Computer Journal, 42(3):193--201, 1999.Google ScholarDigital Library
I. H. Witten, T. C. Bell, and A. Moffat. Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edition. Morgan Kaufmann, 1999. Google ScholarDigital Library
J. Zobel, A. Moffat, and K. Ramamohanarao. Inverted files versus signature files for text indexing. ACM Transactions on Database Systems, 23(4):453--490, 1998. Google ScholarDigital Library

Index Terms

Type less, find more: fast autocompletion search with a succinct index
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI theory, concepts and models
  2. Interaction design
    1. Interaction design theory, concepts and paradigms
2. Information systems
  1. Information retrieval

Recommendations

Output-sensitive autocompletion search

We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the best such hits. ...
Read More
Autocompletion for Prefix-Abbreviated Input
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

Query autocompletion (QAC) is an important interactive feature that assists users in formulating queries and saving keystrokes. Due to the convenience it brings to users, QAC has been adopted in many applications, including Web search engines, ...
Read More
Q2P: Discovering Query Templates via Autocompletion

We present Q2P, a system that discovers query templates from search engines via their query autocompletion services. Q2P is distinct from the existing works in that it does not rely on query logs of search engines that are typically not readily ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006
768 pages
ISBN:1595933697
DOI:10.1145/1148170
General Chair:
Efthimis N. Efthimiadis
University of Washington
,
Program Chairs:
Susan Dumais
Microsoft Research, Redmond
,
David Hawking
CSIRO ICT Centre, Canberra, Australia
,
Kalervo Järvelin,
University of Tampere, Finland
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 August 2006
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
autocompletion
empirical entropy
index data structure
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate792of3,983submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 108
  Total Citations
  View Citations
- 1,593
  Total Downloads
- Downloads (Last 12 months)43
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Type less, find more: fast autocompletion search with a succinct index

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Output-sensitive autocompletion search

Autocompletion for Prefix-Abbreviated Input

Q2P: Discovering Query Templates via Autocompletion