DOI: 10.1145/1281192.1281216
Article

Exploiting underrepresented query aspects for automatic query expansion

Published: 12 August 2007

ABSTRACT

Users attempt to express their search goals through web search queries. When a search goal has multiple components or aspects, documents that represent all the aspects are likely to be more relevant than those that represent only some of them. Current web search engines often produce result sets whose top-ranking documents represent only a subset of the query aspects. By expanding the query with the right keywords, the search engine can find documents that represent more query aspects, and performance improves. This paper describes AbraQ, an approach for automatically finding the right keywords to expand the query. AbraQ identifies the aspects in the query, determines which aspects are underrepresented in the result set of the original query, and, for any particularly underrepresented aspect, identifies keywords that would enhance that aspect's representation and automatically expands the query using the best one. The paper presents experiments showing that AbraQ significantly increases the precision of hard queries, whereas traditional automatic query expansion techniques have not improved precision. AbraQ also compared favourably against a range of interactive query expansion techniques that require user involvement, including clustering, web-log analysis, relevance feedback, and pseudo-relevance feedback.
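The abstract outlines AbraQ as a pipeline: segment the query into aspects, measure how well each aspect is represented in the top results, and, when an aspect is badly underrepresented, append the keyword that best strengthens it. The sketch below is a minimal, hypothetical rendering of that loop, not the authors' algorithm; the single-term aspect segmentation, the substring-based coverage measure, the co-occurrence keyword scoring, and the toy result set are all simplifying assumptions for illustration.

```python
# Hypothetical sketch of an AbraQ-style expansion step (not the published algorithm):
# find the query aspect least represented in the current result set and append
# one co-occurring keyword that should strengthen it.

from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "it", "its"}


def aspect_coverage(aspect: str, docs: list[str]) -> float:
    """Fraction of result documents whose text mentions the aspect term."""
    if not docs:
        return 0.0
    return sum(aspect in doc.lower() for doc in docs) / len(docs)


def candidate_keywords(aspect: str, docs: list[str], exclude: set[str]) -> Counter:
    """Terms co-occurring with the aspect in the documents that do cover it."""
    counts: Counter = Counter()
    for doc in docs:
        text = doc.lower()
        if aspect in text:
            counts.update(t for t in text.split()
                          if t not in exclude and t not in STOPWORDS)
    return counts


def expand_query(query: str, docs: list[str], threshold: float = 0.5) -> str:
    """Append one keyword targeting the most underrepresented aspect, if any."""
    aspects = query.lower().split()        # crude stand-in for aspect identification
    coverage = {a: aspect_coverage(a, docs) for a in aspects}
    weakest = min(coverage, key=coverage.get)
    if coverage[weakest] >= threshold:     # every aspect already well represented
        return query
    candidates = candidate_keywords(weakest, docs, set(aspects))
    if not candidates:
        return query
    best_keyword, _ = candidates.most_common(1)[0]
    return f"{query} {best_keyword}"


if __name__ == "__main__":
    top_results = [
        "jaguar wildlife profile the jaguar is a wild cat its top speed makes it a fast wildlife predator",
        "jaguar car dealership prices and jaguar xf review",
        "jaguar car parts and service manual",
    ]
    print(expand_query("jaguar speed", top_results))   # -> "jaguar speed wildlife"
```

In the toy run, the "speed" aspect is covered by only one of the three results, so the query is expanded with the keyword that co-occurs most with it in that document, nudging the result set toward the underrepresented sense. The real system, as described in the abstract, works over richer candidate keywords and checks that the expansion actually improves the aspect's representation.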


Published in

KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2007
1080 pages
ISBN: 9781595936097
DOI: 10.1145/1281192

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 August 2007


        Qualifiers

        • Article

        Acceptance Rates

KDD '07 Paper Acceptance Rate: 111 of 573 submissions, 19%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

