ABSTRACT
Users attempt to express their search goals through web search queries. When a search goal has multiple components or aspects, documents that represent all the aspects are likely to be more relevant than those that represent only some of them. Current web search engines often produce result sets whose top-ranking documents represent only a subset of the query aspects. Expanding the query with the right keywords lets the search engine find documents that represent more query aspects, which improves performance. This paper describes AbraQ, an approach for automatically finding the right keywords with which to expand the query. AbraQ identifies the aspects in the query, determines which aspects are underrepresented in the result set of the original query, and finally, for any particularly underrepresented aspect, identifies keywords that would strengthen that aspect's representation and automatically expands the query using the best one. The paper presents experiments showing that AbraQ significantly increases the precision of hard queries, whereas traditional automatic query expansion techniques did not improve precision. AbraQ also compared favourably against a range of interactive query expansion techniques that require user involvement, including clustering, web-log analysis, relevance feedback, and pseudo-relevance feedback.
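The three stages described above (find the query aspects, measure how well each is represented in the results, then expand the query to strengthen the weakest one) can be illustrated with a small Python sketch. This is not AbraQ's published algorithm. The representation of aspects and documents as sets of terms, the coverage measure (fraction of top-ranked documents mentioning any term of an aspect), the 0.5 threshold, and the co-occurrence count used to rank candidate keywords are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def aspect_coverage(aspects, docs):
    """Hypothetical measure: for each aspect (a set of terms), the fraction of
    result documents (sets of tokens) that contain at least one of its terms."""
    if not docs:
        return [0.0] * len(aspects)
    return [sum(1 for d in docs if aspect & d) / len(docs) for aspect in aspects]


def expand_query(query_terms, aspects, docs, threshold=0.5):
    """Illustrative heuristic, not AbraQ's method: add one keyword that
    co-occurs with the least-represented aspect.

    Returns (expanded_terms, index_of_weakest_aspect_or_None)."""
    terms = list(query_terms)
    if not aspects:
        return terms, None
    coverage = aspect_coverage(aspects, docs)
    weakest = min(range(len(aspects)), key=lambda i: coverage[i])
    if coverage[weakest] >= threshold:
        return terms, None  # every aspect is adequately represented
    # Documents that do mention the weak aspect supply candidate keywords.
    supporting = [d for d in docs if aspects[weakest] & d]
    counts = Counter(t for d in supporting for t in d if t not in terms)
    if not counts:
        return terms, weakest  # no evidence to expand with
    # Highest co-occurrence count wins; ties are broken alphabetically.
    best = min(counts, key=lambda t: (-counts[t], t))
    return terms + [best], weakest


# Usage: the "habitat" aspect appears in only 1 of 4 top results.
docs = [
    {"jaguar", "car", "price"},
    {"jaguar", "car", "engine"},
    {"jaguar", "cat", "habitat", "rainforest"},
    {"jaguar", "car", "dealer"},
]
expanded, weak = expand_query(["jaguar", "habitat"], [{"jaguar"}, {"habitat"}], docs)
print(expanded, weak)
```

Here the "habitat" aspect is the underrepresented one (coverage 0.25), so the sketch appends a term drawn from the one result that does cover it, steering the query toward the animal sense. A real system would use a far richer scoring of candidate keywords than a raw co-occurrence count, for example one that also penalizes terms common across the whole result set.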
Index Terms
- Exploiting underrepresented query aspects for automatic query expansion