ABSTRACT
We consider the following full-text search autocompletion feature. Imagine a user of a search engine typing a query. Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of these completions should be displayed. Known indexing data structures that apply to this problem either incur large processing times for a substantial class of queries, or they use a lot of space. We present a new indexing data structure that uses no more space than a state-of-the-art compressed inverted index, but with 10 times faster query processing times. Even on the large TREC Terabyte collection, which comprises over 25 million documents, we achieve, on a single machine and with the index on disk, average response times of one tenth of a second. We have built a full-fledged, interactive search engine that realizes the proposed autocompletion feature combined with support for proximity search, semi-structured (XML) text, subword and phrase completion, and semantic tags.
- A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, 1988. Google ScholarDigital Library
- S. Alstrup, G. S. Brodal, and T. Rauhe. New data structures for orthogonal range searching. In 41st Symposium on Foundations of Computer Science (FOCS'00), pages 198--207, 2000. Google ScholarDigital Library
- V. N. Anh and A. Moffat. Inverted index compression using word-aligned binary codes. Information Retrieval, 8:151--166, 2005. Google ScholarDigital Library
- L. Arge, V. Samoladas, and J. S. Vitter. On two-dimensional indexability and optimal range search indexing. In 18th Symposium on Principles of Database Systems (PODS'99), pages 346--357, 1999. Google ScholarDigital Library
- R. Baeza-Yates. A fast set intersection algorithm for sorted sequences. Lecture Notes in Computer Science, 3109:400--408, 2004.Google ScholarDigital Library
- S. Bickel, P. Haider, and T. Scheffer. Learning to complete sentences. In 16th European Conference on Machine Learning (ECML'05), pages 497--504, 2005. Google ScholarDigital Library
- C. L. A. Clarke, N. Craswell, and I. Soboroff. The TREC terabyte retrieval track. SIGIR Forum, 39(1):25, 2005. Google ScholarDigital Library
- J. J. Darragh, I. H. Witten, and M. L. James. The reactive keyboard: A predictive typing aid. IEEE Computer, pages 41--49, 1990. Google ScholarDigital Library
- R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci., 66(4):614--656, 2003. Google ScholarDigital Library
- P. Ferragina, N. Koudas, S. Muthukrishnan, and D. Srivastava. Two-dimensional substring indexing. Journal of Computer and System Science, 66(4):763--774, 2003. Google ScholarDigital Library
- P. Ferragina and G. Manzini. Indexing compressed text. Journal of the ACM, 52(4):552--581, 2005. Google ScholarDigital Library
- L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. In 10th International World Wide Web Conference (WWW10), pages 406--414, 2001. Google ScholarDigital Library
- V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170--231, 1998. Google ScholarDigital Library
- K. Grabski and T. Scheffer. Sentence completion. In 27th Conference on Research and Development in Information Retrieval (SIGIR'04), pages 433--439, 2004. Google ScholarDigital Library
- M. Jakobsson. Autocompletion in full text transaction entry: a method for humanized input. In Conference on Human Factors in Computing Systems (CHI'86), pages 327--323, 1986. Google ScholarDigital Library
- D. Metzler, T. Strohman, H. Turtle, and W. B. Croft. Indri at TREC 2004: Terabyte track. In 13th Text Retrieval Conference (TREC'04), 2004.Google Scholar
- A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, 14(4):349--379, 1996. Google ScholarDigital Library
- G. W. Paynter, I. H. Witten, S. J. Cunningham, and G. Buchanan. Scalable browsing for large collections: A case study. In 5th Conference on Digital Libraries (DL'00), pages 215--223, 2000. Google ScholarDigital Library
- S. E. Robertson, S. Walker, M. M. Beaulieu, M. Gatford, and A. Payne. Okapi at TREC-4. In 4th Text Retrieval Conference (TREC'95), pages 73--96, 1995.Google Scholar
- T. Stocky, A. Faaborg, and H. Lieberman. A commonsense approach to predictive text entry. In Conference on Human Factors in Computing Systems (CHI'04), pages 1163--1166, 2004. Google ScholarDigital Library
- E. M. Voorhees. Query expansion using lexical-semantic relations. In 17th Conference on Research and Development in Information Retrieval (SIGIR'94), pages 171--180, 1994. Google ScholarDigital Library
- H. Williams and J. Zobel. Compressing integers for fast file access. Computer Journal, 42(3):193--201, 1999.Google ScholarDigital Library
- I. H. Witten, T. C. Bell, and A. Moffat. Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edition. Morgan Kaufmann, 1999. Google ScholarDigital Library
- J. Zobel, A. Moffat, and K. Ramamohanarao. Inverted files versus signature files for text indexing. ACM Transactions on Database Systems, 23(4):453--490, 1998. Google ScholarDigital Library
Index Terms
- Type less, find more: fast autocompletion search with a succinct index
Recommendations
Output-sensitive autocompletion search
We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the best such hits. ...
Autocompletion for Prefix-Abbreviated Input
SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataQuery autocompletion (QAC) is an important interactive feature that assists users in formulating queries and saving keystrokes. Due to the convenience it brings to users, QAC has been adopted in many applications, including Web search engines, ...
Q2P: Discovering Query Templates via Autocompletion
We present Q2P, a system that discovers query templates from search engines via their query autocompletion services. Q2P is distinct from the existing works in that it does not rely on query logs of search engines that are typically not readily ...
Comments