skip to main content
10.1145/2487575.2487636acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Diversity maximization under matroid constraints

Published:11 August 2013Publication History

ABSTRACT

Aggregator websites typically present documents in the form of representative clusters. In order for users to get a broader perspective, it is important to deliver a diversified set of representative documents in those clusters. One approach to diversification is to maximize the average dissimilarity among documents. Another way to capture diversity is to avoid showing several documents from the same category (e.g. from the same news channel). We combine the above two diversification concepts by modeling the latter approach as a (partition) matroid constraint, and study diversity maximization problems under matroid constraints. We present the first constant-factor approximation algorithm for this problem, using a new technique. Our local search 0.5-approximation algorithm is also the first constant-factor approximation for the max-dispersion problem under matroid constraints. Our combinatorial proof technique for maximizing diversity under matroid constraints uses the existence of a family of Latin squares which may also be of independent interest.

In order to apply these diversity maximization algorithms in the context of aggregator websites and as a preprocessing step for our diversity maximization tool, we develop greedy clustering algorithms that maximize weighted coverage of a predefined set of topics. Our algorithms are based on computing a set of cluster centers, where clusters are formed around them. We show the better performance of our algorithms for diversity and coverage maximization by running experiments on real (Twitter) and synthetic data in the context of real-time search over micro-posts. Finally we perform a user study validating our algorithms and diversity metrics.

References

  1. A. Angel and N. Koudas. Efficient diversity-aware search. In SIGMOD Conference, pages 781--792, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Angel and N. Koudas. Efficient diversity-aware search. In SIGMOD Conference, pages 781--792, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Bhattacharya, S. Gollapudi, and K. Munagala. Consideration set generation in commerce search. In WWW, pages 317--326, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Charikar. Similarity estimation techniques from rounding algorithms. In STOC, pages 380--388, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. In SODA, pages 642--651, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Z. Chen and T. Li. Addressing Diverse User Preferences in SQL-Query-Result Navigation. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Z. Cheng, B. Gao, and T.-Y. Liu. Actively predicting diverse search intent from user browsing behaviors. In WWW, pages 221--230, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. K. El-Arini, G. Veda, D. Shahaf, and C. Guestrin. Turning down the noise in the blogosphere. In KDD, pages 289--298, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. U. Feige. A threshold of ln for approximating set cover. J. ACM, 45(4):634--652, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In WWW, pages 381--390, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Hassin, S. Rubinstein, and A. Tamir. Approximation algorithms for maximum dispersion. Oper. Res. Lett., 21(3):133--137, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. Heppner, B. Wampold, and D. Kivlighan. Research design in counseling. Brooks/Cole Pub Co, 2008.Google ScholarGoogle Scholar
  14. A. J. W. Hilton. On double diagonal and cross latin squares. Journal of London Mathematical Sociecy, 2--6:679--689, 1971.Google ScholarGoogle Scholar
  15. L. S. Kennedy and M. Naaman. Generating diverse and representative image search results for landmarks. In Proceedings of the 17th international conference on World Wide Web, pages 297--306. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. A. Konstan. Introduction to recommender systems. In SIGIR, 2007.Google ScholarGoogle Scholar
  17. G. Koutrika, A. Simitsis, and Y. Ioannidis. Précis: The Essence of a Query Answer. In ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. K. Kummamuru, R. Lotlikar, S. Roy, K. Singal, and R. Krishnapuram. A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results. In WWW, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. W. Mason and S. Suri. Conducting behavioral research on amazon's mechanical turk. Behavior Research Methods, pages 1--23, 2010.Google ScholarGoogle Scholar
  20. S. McNee, J. Riedl, and J. A. Konstan. Being Accurate Is Not Enough: How Accuracy Metrics Have Hurt Recommender Systems. In CHI, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions, 1978.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. D. Panigrahi, A. Das Sarma, G. Aggarwal, and A. Tomkins. Online selection of diverse results. In Proceedings of the fifth ACM international conference on Web search and data mining, pages 263--272. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. L. Qin, J. X. Yu, and L. Chang. Diversifying top-k results. Proceedings of the VLDB Endowment, 5(11):1124--1135, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. F. Radlinski and S. T. Dumais. Improving Personalized Web Search using Result Diversification. In SIGIR, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. D. Rafiei, K. Bharat, and A. Shukla. Diversifying web search results. In WWW, pages 781--790, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for web search result diversification. In WWW, pages 881--890, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. Schrijver. Combinatorial Optimization. Springer-Verlag, Berlin, 2003.Google ScholarGoogle Scholar
  28. M. J. Streeter, D. Golovin, and A. Krause. Online learning of assignments. In NIPS, pages 1794--1802, 2009.Google ScholarGoogle Scholar
  29. E. Vee, U. Srivastava, J. Shanmugasundaram, P. Bhat, and S. Amer-Yahia. Efficient Online Computation of Diverse Query Results. In ICDE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. R. Vieira, H. L. Razente, M. C. Barioni, M. Hadjieleftheriou, D. Srivastava, C. Traina, and V. J. Tsotras. On query result diversification. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 1163--1174. IEEE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. D. Xin, H. Cheng, X. Yan, and J. Han. Extracting Redundancy-Aware Top-k Patterns. In SIGKDD 2006, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. C. Yu, L. Lakshmanan, and S. Amer-Yahia. Recommendation diversification using explanations. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Diversity maximization under matroid constraints

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader