Search result diversification in resource selection for federated search

Authors:
Dzung Hong, Purdue University, West Lafayette, IN, USA
Luo Si, Purdue University, West Lafayette, IN, USA

SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, July 2013, Pages 613–622
https://doi.org/10.1145/2484028.2484091

Published: 28 July 2013

ABSTRACT

Prior research in resource selection for federated search has mainly focused on selecting a small number of information sources that are most relevant to a user query. Result novelty and diversification, however, remain largely unexplored, so existing methods do not reflect the varied information needs of users in real-world applications.

This paper proposes two general approaches that model both result relevance and diversification when selecting sources, in order to provide more comprehensive coverage of the multiple aspects of a user query. The first approach diversifies the document ranking on a centralized sample database before selecting information sources under the framework of Relevant Document Distribution Estimation (ReDDE). The second approach first evaluates the relevance of information sources with respect to each aspect of the query, and then ranks the sources by the novelty and relevance that they offer. Both approaches can be applied with a wide range of existing resource selection algorithms, such as ReDDE, CRCS, CORI, and Big Document. Moreover, this paper proposes a learning-based approach that combines multiple resource selection algorithms for result diversification, which can further improve performance. We also propose a set of new metrics for resource selection in federated search to evaluate the diversification performance of different approaches. To the best of our knowledge, this is the first work to address the problem of search result diversification in federated search. The effectiveness of the proposed approaches is demonstrated by an extensive set of experiments on the federated search testbed of the ClueWeb dataset.
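To make the second approach more concrete, the following is a minimal sketch of ranking sources by per-aspect relevance and novelty. It is not the paper's algorithm: the greedy scoring rule, the function name `diversified_source_ranking`, the `trade_off` parameter, and the toy scores are all assumptions made for illustration. The per-aspect scores are assumed to come from running some resource selection algorithm once per query aspect.

```python
from typing import Dict, List


def diversified_source_ranking(
    aspect_scores: Dict[str, Dict[str, float]],
    aspect_weights: Dict[str, float],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Greedily pick up to k sources, mixing relevance with novelty.

    aspect_scores[source][aspect] is a hypothetical relevance estimate
    in [0, 1]; aspect_weights[aspect] is the assumed importance of each
    query aspect. Neither structure is taken from the paper.
    """
    selected: List[str] = []
    # Fraction of each aspect not yet covered by the selected sources.
    uncovered = {aspect: 1.0 for aspect in aspect_weights}
    candidates = set(aspect_scores)

    def gain(source: str) -> float:
        scores = aspect_scores[source]
        relevance = sum(
            w * scores.get(a, 0.0) for a, w in aspect_weights.items()
        )
        novelty = sum(
            w * scores.get(a, 0.0) * uncovered[a]
            for a, w in aspect_weights.items()
        )
        return (1.0 - trade_off) * relevance + trade_off * novelty

    while candidates and len(selected) < k:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
        # Aspects the chosen source covers well become less novel.
        for aspect in uncovered:
            uncovered[aspect] *= 1.0 - aspect_scores[best].get(aspect, 0.0)
    return selected


# Toy input: sources A and B both cover aspect "x"; only C covers "y".
toy_scores = {
    "A": {"x": 0.9, "y": 0.0},
    "B": {"x": 0.8, "y": 0.0},
    "C": {"x": 0.0, "y": 0.6},
}
toy_weights = {"x": 0.5, "y": 0.5}

print(diversified_source_ranking(toy_scores, toy_weights, k=2))
# prints ['A', 'C']
```

With `trade_off=0.0` the same call reduces to a relevance-only ranking and returns the two sources that cover aspect "x", which illustrates the redundancy that diversification is meant to avoid.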

Index Terms

Information systems > Information retrieval > Retrieval models and ranking

Published in
SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
July 2013
1188 pages
ISBN: 9781450320344
DOI: 10.1145/2484028
General Chairs: Gareth J.F. Jones (Dublin City University, Ireland), Páraic Sheridan (Dublin City University, Ireland)
Program Chairs: Diane Kelly (University of North Carolina, Chapel Hill, USA), Maarten de Rijke (University of Amsterdam, The Netherlands), Tetsuya Sakai (Microsoft Research Asia, China)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery, New York, NY, United States
Author Tags
diversification
federated search
resource selection
Qualifiers
- research-article
Acceptance Rates
SIGIR '13 Paper Acceptance Rate: 73 of 366 submissions, 20%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%