ABSTRACT
Aggregator websites typically present documents in the form of representative clusters. In order for users to get a broader perspective, it is important to deliver a diversified set of representative documents in those clusters. One approach to diversification is to maximize the average dissimilarity among documents. Another way to capture diversity is to avoid showing several documents from the same category (e.g. from the same news channel). We combine the above two diversification concepts by modeling the latter approach as a (partition) matroid constraint, and study diversity maximization problems under matroid constraints. We present the first constant-factor approximation algorithm for this problem, using a new technique. Our local search 0.5-approximation algorithm is also the first constant-factor approximation for the max-dispersion problem under matroid constraints. Our combinatorial proof technique for maximizing diversity under matroid constraints uses the existence of a family of Latin squares which may also be of independent interest.
In order to apply these diversity maximization algorithms in the context of aggregator websites and as a preprocessing step for our diversity maximization tool, we develop greedy clustering algorithms that maximize weighted coverage of a predefined set of topics. Our algorithms are based on computing a set of cluster centers, where clusters are formed around them. We show the better performance of our algorithms for diversity and coverage maximization by running experiments on real (Twitter) and synthetic data in the context of real-time search over micro-posts. Finally we perform a user study validating our algorithms and diversity metrics.
- A. Angel and N. Koudas. Efficient diversity-aware search. In SIGMOD Conference, pages 781--792, 2011. Google ScholarDigital Library
- A. Angel and N. Koudas. Efficient diversity-aware search. In SIGMOD Conference, pages 781--792, 2011. Google ScholarDigital Library
- S. Bhattacharya, S. Gollapudi, and K. Munagala. Consideration set generation in commerce search. In WWW, pages 317--326, 2011. Google ScholarDigital Library
- J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998. Google ScholarDigital Library
- M. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, pages 380--388, 2002. Google ScholarDigital Library
- M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. In SODA, pages 642--651, 2001. Google ScholarDigital Library
- Z. Chen and T. Li. Addressing Diverse User Preferences in SQL-Query-Result Navigation. In SIGMOD, 2007. Google ScholarDigital Library
- Z. Cheng, B. Gao, and T.-Y. Liu. Actively predicting diverse search intent from user browsing behaviors. In WWW, pages 221--230, 2010. Google ScholarDigital Library
- K. El-Arini, G. Veda, D. Shahaf, and C. Guestrin. Turning down the noise in the blogosphere. In KDD, pages 289--298, 2009. Google ScholarDigital Library
- U. Feige. A threshold of ln for approximating set cover. J. ACM, 45(4):634--652, 1998. Google ScholarDigital Library
- S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In WWW, pages 381--390, 2009. Google ScholarDigital Library
- R. Hassin, S. Rubinstein, and A. Tamir. Approximation algorithms for maximum dispersion. Oper. Res. Lett., 21(3):133--137, 1997. Google ScholarDigital Library
- P. Heppner, B. Wampold, and D. Kivlighan. Research design in counseling. Brooks/Cole Pub Co, 2008.Google Scholar
- A. J. W. Hilton. On double diagonal and cross latin squares. Journal of London Mathematical Sociecy, 2--6:679--689, 1971.Google Scholar
- L. S. Kennedy and M. Naaman. Generating diverse and representative image search results for landmarks. In Proceedings of the 17th international conference on World Wide Web, pages 297--306. ACM, 2008. Google ScholarDigital Library
- J. A. Konstan. Introduction to recommender systems. In SIGIR, 2007.Google Scholar
- G. Koutrika, A. Simitsis, and Y. Ioannidis. Précis: The Essence of a Query Answer. In ICDE, 2006. Google ScholarDigital Library
- K. Kummamuru, R. Lotlikar, S. Roy, K. Singal, and R. Krishnapuram. A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results. In WWW, 2004. Google ScholarDigital Library
- W. Mason and S. Suri. Conducting behavioral research on amazon's mechanical turk. Behavior Research Methods, pages 1--23, 2010.Google Scholar
- S. McNee, J. Riedl, and J. A. Konstan. Being Accurate Is Not Enough: How Accuracy Metrics Have Hurt Recommender Systems. In CHI, 2006. Google ScholarDigital Library
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions, 1978.Google ScholarDigital Library
- D. Panigrahi, A. Das Sarma, G. Aggarwal, and A. Tomkins. Online selection of diverse results. In Proceedings of the fifth ACM international conference on Web search and data mining, pages 263--272. ACM, 2012. Google ScholarDigital Library
- L. Qin, J. X. Yu, and L. Chang. Diversifying top-k results. Proceedings of the VLDB Endowment, 5(11):1124--1135, 2012. Google ScholarDigital Library
- F. Radlinski and S. T. Dumais. Improving Personalized Web Search using Result Diversification. In SIGIR, 2006. Google ScholarDigital Library
- D. Rafiei, K. Bharat, and A. Shukla. Diversifying web search results. In WWW, pages 781--790, 2010. Google ScholarDigital Library
- R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for web search result diversification. In WWW, pages 881--890, 2010. Google ScholarDigital Library
- A. Schrijver. Combinatorial Optimization. Springer-Verlag, Berlin, 2003.Google Scholar
- M. J. Streeter, D. Golovin, and A. Krause. Online learning of assignments. In NIPS, pages 1794--1802, 2009.Google Scholar
- E. Vee, U. Srivastava, J. Shanmugasundaram, P. Bhat, and S. Amer-Yahia. Efficient Online Computation of Diverse Query Results. In ICDE, 2008. Google ScholarDigital Library
- M. R. Vieira, H. L. Razente, M. C. Barioni, M. Hadjieleftheriou, D. Srivastava, C. Traina, and V. J. Tsotras. On query result diversification. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 1163--1174. IEEE, 2011. Google ScholarDigital Library
- D. Xin, H. Cheng, X. Yan, and J. Han. Extracting Redundancy-Aware Top-k Patterns. In SIGKDD 2006, 2006. Google ScholarDigital Library
- C. Yu, L. Lakshmanan, and S. Amer-Yahia. Recommendation diversification using explanations. In ICDE, 2009. Google ScholarDigital Library
Index Terms
- Diversity maximization under matroid constraints
Recommendations
A General Coreset-Based Approach to Diversity Maximization under Matroid Constraints
Special Issue on KDD 2018, Regular Papers and Survey PaperDiversity maximization is a fundamental problem in web search and data mining. For a given dataset S of n elements, the problem requires to determine a subset of S containing k≪n “representatives” which maximize some diversity function expressed in ...
Fast Coreset-based Diversity Maximization under Matroid Constraints
WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data MiningMax-sum diversity is a fundamental primitive for web search and data mining. For a given set S of n elements, it returns a subset of k«l n representatives maximizing the sum of their pairwise distances, where distance models dissimilarity. An important ...
Non-monotone submodular maximization under matroid and knapsack constraints
STOC '09: Proceedings of the forty-first annual ACM symposium on Theory of computingSubmodular function maximization is a central problem in combinatorial optimization, generalizing many important problems including Max Cut in directed/undirected graphs and in hypergraphs, certain constraint satisfaction problems, maximum entropy ...
Comments