DOI: 10.1145/1281192.1281216
Article

Exploiting underrepresented query aspects for automatic query expansion

Published: 12 August 2007

ABSTRACT

Users attempt to express their search goals through web search queries. When a search goal has multiple components or aspects, documents that represent all the aspects are likely to be more relevant than those that represent only some of them. Current web search engines often produce result sets whose top-ranking documents represent only a subset of the query aspects. By expanding the query with the right keywords, the search engine can find documents that represent more query aspects, and performance improves. This paper describes AbraQ, an approach for automatically finding the right keywords to expand the query. AbraQ identifies the aspects in the query, determines which aspects are underrepresented in the result set of the original query, and, for any particularly underrepresented aspect, identifies keywords that would enhance that aspect's representation and automatically expands the query using the best one. The paper presents experiments showing that AbraQ significantly increases the precision of hard queries, whereas traditional automatic query expansion techniques have not improved precision. AbraQ also compared favourably against a range of interactive query expansion techniques that require user involvement, including clustering, web-log analysis, relevance feedback, and pseudo-relevance feedback.
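The abstract outlines AbraQ as a pipeline: segment the query into aspects, measure how well each aspect is represented in the top results, and, when an aspect is badly underrepresented, append the keyword that best strengthens it. The sketch below is a minimal, hypothetical rendering of that loop, not the authors' algorithm; the single-term aspect segmentation, the substring-based coverage measure, the co-occurrence keyword scoring, and the toy result set are all simplifying assumptions for illustration.

```python
# Hypothetical sketch of an AbraQ-style expansion step (not the published algorithm):
# find the query aspect least represented in the current result set and append
# one co-occurring keyword that should strengthen it.

from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "it", "its"}


def aspect_coverage(aspect: str, docs: list[str]) -> float:
    """Fraction of result documents whose text mentions the aspect term."""
    if not docs:
        return 0.0
    return sum(aspect in doc.lower() for doc in docs) / len(docs)


def candidate_keywords(aspect: str, docs: list[str], exclude: set[str]) -> Counter:
    """Terms co-occurring with the aspect in the documents that do cover it."""
    counts: Counter = Counter()
    for doc in docs:
        text = doc.lower()
        if aspect in text:
            counts.update(t for t in text.split()
                          if t not in exclude and t not in STOPWORDS)
    return counts


def expand_query(query: str, docs: list[str], threshold: float = 0.5) -> str:
    """Append one keyword targeting the most underrepresented aspect, if any."""
    aspects = query.lower().split()        # crude stand-in for aspect identification
    coverage = {a: aspect_coverage(a, docs) for a in aspects}
    weakest = min(coverage, key=coverage.get)
    if coverage[weakest] >= threshold:     # every aspect already well represented
        return query
    candidates = candidate_keywords(weakest, docs, set(aspects))
    if not candidates:
        return query
    best_keyword, _ = candidates.most_common(1)[0]
    return f"{query} {best_keyword}"


if __name__ == "__main__":
    top_results = [
        "jaguar wildlife profile the jaguar is a wild cat its top speed makes it a fast wildlife predator",
        "jaguar car dealership prices and jaguar xf review",
        "jaguar car parts and service manual",
    ]
    print(expand_query("jaguar speed", top_results))   # -> "jaguar speed wildlife"
```

In the toy run, the "speed" aspect is covered by only one of the three results, so the query is expanded with the keyword that co-occurs most with it in that document, nudging the result set toward the underrepresented sense. The real system, as described in the abstract, works over richer candidate keywords and checks that the expansion actually improves the aspect's representation.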


Published in

KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2007
1080 pages
ISBN: 9781595936097
DOI: 10.1145/1281192

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 August 2007


        Qualifiers

        • Article

        Acceptance Rates

KDD '07 Paper Acceptance Rate: 111 of 573 submissions, 19%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

