skip to main content
10.1145/2594538.2594560acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article
Open Access

Composable core-sets for diversity and coverage maximization

Published:18 June 2014Publication History

ABSTRACT

In this paper we consider efficient construction of "composable core-sets" for basic diversity and coverage maximization problems. A core-set for a point-set in a metric space is a subset of the point-set with the property that an approximate solution to the whole point-set can be obtained given the core-set alone. A composable core-set has the property that for a collection of sets, the approximate solution to the union of the sets in the collection can be obtained given the union of the composable core-sets for the point sets in the collection. Using composable core-sets one can obtain efficient solutions to a wide variety of massive data processing applications, including nearest neighbor search, streaming algorithms and map-reduce computation.

Our main results are algorithms for constructing composable core-sets for several notions of "diversity objective functions", a topic that attracted a significant amount of research over the last few years. The composable core-sets we construct are small and accurate: their approximation factor almost matches that of the best "off-line" algorithms for the relevant optimization problems (up to a constant factor). Moreover, we also show applications of our results to diverse nearest neighbor search, streaming algorithms and map-reduce computation. Finally, we show that for an alternative notion of diversity maximization based on the maximum coverage problem small composable core-sets do not exist.

References

  1. S. Abbar, S. Amer-Yahia, P. Indyk, and S. Mahabadi. Real-time recommendation of diverse related articles. In WWW, pages 1--12, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Abbar, S. Amer-Yahia, P. Indyk, S. Mahabadi, and K. R. Varadarajan. Diverse near neighbor problem. In SoCG, pages 207--214, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Z. Abbassi, V. S. Mirrokni, and M. Thakur. Diversity maximization under matroid constraints. In KDD, pages 32--40, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. K. Agarwal, G. Cormode, Z. Huang, J. Phillips, Z. Wei, and K. Yi. Mergeable summaries. In Proceedings of the 31st symposium on Principles of Database Systems, pages 23--34. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Approximating extent measures of points. Journal of the ACM (JACM), 51(4):606--635, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Anagnostopoulos, A. Z. Broder, and D. Carmel. Sampling Search-Engine Results. In WWW, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Angel and N. Koudas. Efficient diversity-aware search. In Proceedings of the 2011 international conference on Management of data, SIGMOD '11, pages 781--792, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M.-F. Balcan, S. Ehrlich, and Y. Liang. Distributed clustering on graphs. In NIPS, page to appear, 2013.Google ScholarGoogle Scholar
  9. S. Bhattacharya, S. Gollapudi, and K. Munagala. Consideration set generation in commerce search. In WWW, pages 317--326, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Borodin, H. C. Lee, and Y. Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS, pages 155--166, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Charikar, C. Chekuri, T. Feder and R. Motwani.Incremental Clustering and Dynamic Information Retrieval. SIAM J. Comput. 33(6), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. B. Chandra and M. M. Halldórsson. Approximation algorithms for dispersion problems. Journal of algorithms, 38(2):438--465, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Z. Chen and T. Li. Addressing Diverse User Preferences in SQL-Query-Result Navigation. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in map-reduce. In WWW, pages 231--240, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Cormode, H. J. Karloff, and A. Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479--488, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Drosou and E. Pitoura. Search result diversification. SIGMOD Record, pages 41--47, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. U. Feige. A threshold of ln for approximating set cover. J. ACM, 45(4):634--652, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Gollapudi and A. Sharma. An axiomatic framework for result diversification. In WWW, pages 381--390, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, pages 293--306, 1985.Google ScholarGoogle Scholar
  22. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. STOC, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Guha, Tight results for clustering and summarizing data streams. ICDT, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. Hassin, S. Rubinstein, and A. Tamir. Approximation algorithms for maximum dispersion. Oper. Res. Lett., 21(3):133--137, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. K. Iyer and J. A. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In Advances in Neural Information Processing Systems, pages 2436--2444, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. A. Jain, P. Sarda, and J. R. Haritsa. Providing diversity in k-nearest neighbor query results. In PAKDD, pages 404--413, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  27. M. W. Jones, J. A. Baerentzen, and M. Sramek. 3d distance fields: A survey of techniques and applications. Visualization and Computer Graphics, IEEE Transactions on, 12(4):581--599, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938--948, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. Fast greedy algorithms in mapreduce and streaming. In SPAA, pages 1--10, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. K. Kummamuru, R. Lotlikar, S. Roy, K. Singal, and R. Krishnapuram. A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results. In WWW, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In SPAA, pages 85--94, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. H. Lin, J. Bilmes, and S. Xie. Graph-based submodular selection for extractive summarization.Google ScholarGoogle Scholar
  33. G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions, 1978.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. F. Radlinski and S. T. Dumais. Improving Personalized Web Search using Result Diversification. In SIGIR, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. D. Rafiei, K. Bharat, and A. Shukla. Diversifying web search results. In WWW, pages 781--790, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Facility dispersion problems: Heuristics and special cases. Algorithms and Data Structures, pages 355--366, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  37. R. Sibson. Slink: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1):30--34, 1973.Google ScholarGoogle ScholarCross RefCross Ref
  38. M. J. Welch, J. Cho, and C. Olston. Search result diversity for informational queries. In WWW, pages 237--246, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. D. Xin, H. Cheng, X. Yan, and J. Han. Extracting Redundancy-Aware Top-k Patterns. In SIGKDD 2006, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. C. Yu, L. V. S. Lakshmanan, and S. Amer-Yahia. Recommendation diversification using explanations. In ICDE, pages 1299--1302, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In WWW, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Composable core-sets for diversity and coverage maximization

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          PODS '14: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
          June 2014
          300 pages
          ISBN:9781450323758
          DOI:10.1145/2594538
          • General Chair:
          • Richard Hull,
          • Program Chair:
          • Martin Grohe

          Copyright © 2014 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 June 2014

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          PODS '14 Paper Acceptance Rate22of67submissions,33%Overall Acceptance Rate642of2,707submissions,24%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader