DOI: 10.1145/2484028.2484091

Search result diversification in resource selection for federated search

Published: 28 July 2013

ABSTRACT

Prior research on resource selection for federated search has mainly focused on selecting a small number of information sources that are most relevant to a user query. However, result novelty and diversification remain largely unexplored, which fails to reflect the varied information needs of users in real-world applications.

This paper proposes two general approaches that model both result relevance and diversification when selecting sources, in order to provide more comprehensive coverage of the multiple aspects of a user query. The first approach diversifies the document ranking on a centralized sample database before selecting information sources, within the framework of Relevant Document Distribution Estimation (ReDDE). The second approach first evaluates the relevance of information sources with respect to each aspect of the query, and then ranks the sources by the novelty and relevance they offer. Both approaches can be applied with a wide range of existing resource selection algorithms such as ReDDE, CRCS, CORI, and Big Document. Moreover, this paper proposes a learning-based approach that combines multiple resource selection algorithms for result diversification, which further improves performance. We also propose a set of new metrics for evaluating the diversification performance of resource selection in federated search. To the best of our knowledge, this is the first work to address the problem of search result diversification in federated search. The effectiveness of the proposed approaches is demonstrated by an extensive set of experiments on the federated search testbed of the ClueWeb dataset.
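To make the first approach concrete, the sketch below illustrates its general flavor under stated assumptions: documents sampled from each source into a centralized sample index are reranked for diversity with a simple MMR-style greedy procedure, and sources then receive ReDDE-style credit from the diversified top-ranked sample documents, scaled by estimated source size. All names (`SampleDoc`, `mmr_rerank`, `select_sources`) and the particular scoring choices are illustrative assumptions, not the paper's actual algorithm or code.

```python
# Minimal sketch: diversify the centralized sample ranking (MMR-style),
# then credit sources ReDDE-style for the diversified documents they
# contributed. Illustrative only; not the authors' implementation.

from dataclasses import dataclass
from collections import defaultdict

@dataclass
class SampleDoc:
    source: str        # information source the sampled document came from
    relevance: float   # query-document relevance score
    aspects: set       # query aspects (subtopics) the document covers

def mmr_rerank(docs, lam=0.5, k=100):
    """Greedy MMR-style reranking: trade off relevance against novelty,
    where novelty is the fraction of a document's aspects not yet covered."""
    covered, ranking, remaining = set(), [], list(docs)
    while remaining and len(ranking) < k:
        def gain(d):
            new = len(d.aspects - covered) / max(len(d.aspects), 1)
            return lam * d.relevance + (1 - lam) * new
        best = max(remaining, key=gain)
        remaining.remove(best)
        ranking.append(best)
        covered |= best.aspects
    return ranking

def select_sources(sample_results, source_sizes, sample_sizes, n_select=3):
    """ReDDE-style credit: each top-ranked (diversified) sample document
    votes for its source, scaled by the source's estimated collection size."""
    scores = defaultdict(float)
    for doc in mmr_rerank(sample_results):
        scale = source_sizes[doc.source] / sample_sizes[doc.source]
        scores[doc.source] += doc.relevance * scale
    return sorted(scores, key=scores.get, reverse=True)[:n_select]
```

A sketch of the second approach would differ mainly in where diversification is applied: source scores would be computed per query aspect first, and sources would then be chosen greedily according to the novelty of the aspects they add.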

References

  1. R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5--14, 2009.
  2. J. Arguello, J. Callan, and F. Diaz. Classification-based resource selection. In CIKM '09, pages 1277--1286, 2009.
  3. M. Baillie, M. Carman, and F. Crestani. A multi-collection latent topic model for federated search. Information Retrieval, 14(4):390--412, 2011.
  4. J. Callan. Distributed information retrieval. In Advances in Information Retrieval, pages 127--150, 2000.
  5. J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR '98, pages 335--336, 1998.
  6. B. Carterette and P. Chandar. Probabilistic models of ranking novel documents for faceted topic retrieval. In CIKM '09, pages 1287--1296, 2009.
  7. O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected reciprocal rank for graded relevance. In CIKM '09, pages 621--630, 2009.
  8. H. Chen and D. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR '06, pages 429--436, 2006.
  9. C. Clarke, N. Craswell, and I. Soboroff. Overview of the TREC 2009 Web Track. In TREC, pages 1--9, 2009.
  10. C. Clarke, N. Craswell, I. Soboroff, and G. V. Cormack. Overview of the TREC 2010 Web Track. In TREC, pages 1--9, 2010.
  11. C. Clarke, N. Craswell, I. Soboroff, and E. Voorhees. Overview of the TREC 2011 Web Track. In TREC, pages 1--9, 2011.
  12. C. Clarke, M. Kolla, and O. Vechtomova. An effectiveness measure for ambiguous and underspecified queries. In Advances in Information Retrieval Theory, pages 188--199, 2009.
  13. C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR '08, pages 659--666, 2008.
  14. N. Craswell. Methods for Distributed Information Retrieval. PhD thesis, The Australian National University, 2000.
  15. F. Crestani and I. Markov. Distributed information retrieval and applications. In Proceedings of ECIR, 2013.
  16. V. Dang and W. B. Croft. Diversity by proportionality: an election-based approach to search result diversification. In SIGIR '12, pages 65--74, 2012.
  17. N. Fuhr. Resource discovery in distributed digital libraries. In Digital Libraries '99: Advanced Methods and Technologies, Digital Collections, 1999.
  18. A. Genkin, D. D. Lewis, and D. Madigan. Large-scale Bayesian logistic regression for text categorization. Technometrics, 49(3):291--304, 2007.
  19. J. He, V. Hollink, and A. de Vries. Combining implicit and explicit topic representations for result diversification. In SIGIR '12, pages 851--860, 2012.
  20. D. Hong, L. Si, P. Bracke, M. Witt, and T. Juchcinski. A joint probabilistic classification model for resource selection. In SIGIR '10, pages 98--105, 2010.
  21. A. Kulkarni and J. Callan. Document allocation policies for selective searching of distributed indexes. In CIKM '10, pages 449--458, 2010.
  22. I. Markov, L. Azzopardi, and F. Crestani. Reducing the uncertainty in resource selection. In Proceedings of ECIR, 2013.
  23. D. Metzler and W. B. Croft. Combining the language model and inference network approaches to retrieval. Information Processing and Management, 40(5):735--750, 2004.
  24. D. Nguyen, T. Demeester, D. Trieschnigg, and D. Hiemstra. Federated search in the wild. In CIKM '12, pages 1874--1878, 2012.
  25. R. L. Santos, C. Macdonald, and I. Ounis. Aggregated search result diversification. In Advances in Information Retrieval Theory, pages 250--261, 2011.
  26. R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for web search result diversification. In Proceedings of the 19th International Conference on World Wide Web, pages 881--890, 2010.
  27. M. Shokouhi. Central-rank-based collection selection in uncooperative distributed information retrieval. In Advances in Information Retrieval, 2007.
  28. M. Shokouhi and L. Si. Federated search. Foundations and Trends in Information Retrieval, 2011.
  29. M. Shokouhi and J. Zobel. Federated text retrieval from uncooperative overlapped collections. In SIGIR '07, pages 789--790, 2007.
  30. M. Shokouhi and J. Zobel. Robust result merging using sample-based score estimates. ACM Transactions on Information Systems (TOIS), 27(3):1--29, 2009.
  31. L. Si and J. Callan. A semisupervised learning method to merge search engine results. ACM Transactions on Information Systems (TOIS), 21(4):457--491, 2003.
  32. L. Si and J. Callan. Relevant document distribution estimation method for resource selection. In SIGIR '03, pages 298--305, 2003.
  33. P. Thomas and M. Shokouhi. SUSHI: scoring scaled samples for server selection. In SIGIR '09, pages 419--426, 2009.
  34. D. Vallet and P. Castells. Personalized diversification of search results. In SIGIR '12, pages 841--850, 2012.
  35. S. Vargas, P. Castells, and D. Vallet. Explicit relevance models in intent-oriented information retrieval diversification. In SIGIR '12, pages 75--84, 2012.
  36. J. Xu and W. B. Croft. Cluster-based language models for distributed retrieval. In SIGIR '99, pages 254--261, 1999.
  37. B. Yuwono and D. L. Lee. Server ranking for distributed text retrieval systems on the internet. In Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA), pages 41--50, 1997.
  38. C. X. Zhai, W. Cohen, and J. Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In SIGIR '03, pages 10--17, 2003.
  39. K. Zhou, R. Cummins, M. Lalmas, and J. M. Jose. Evaluating aggregated search pages. In SIGIR '12, pages 115--124, 2012.

Published in

SIGIR '13: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2013, 1188 pages
ISBN: 9781450320344
DOI: 10.1145/2484028
Copyright © 2013 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGIR '13 paper acceptance rate: 73 of 366 submissions (20%). Overall acceptance rate: 792 of 3,983 submissions (20%).
