ABSTRACT
In this paper we consider efficient construction of "composable core-sets" for basic diversity and coverage maximization problems. A core-set for a point-set in a metric space is a subset of the point-set with the property that an approximate solution to the whole point-set can be obtained given the core-set alone. A composable core-set has the property that for a collection of sets, the approximate solution to the union of the sets in the collection can be obtained given the union of the composable core-sets for the point sets in the collection. Using composable core-sets one can obtain efficient solutions to a wide variety of massive data processing applications, including nearest neighbor search, streaming algorithms and map-reduce computation.
Our main results are algorithms for constructing composable core-sets for several notions of "diversity objective functions", a topic that attracted a significant amount of research over the last few years. The composable core-sets we construct are small and accurate: their approximation factor almost matches that of the best "off-line" algorithms for the relevant optimization problems (up to a constant factor). Moreover, we also show applications of our results to diverse nearest neighbor search, streaming algorithms and map-reduce computation. Finally, we show that for an alternative notion of diversity maximization based on the maximum coverage problem small composable core-sets do not exist.
- S. Abbar, S. Amer-Yahia, P. Indyk, and S. Mahabadi. Real-time recommendation of diverse related articles. In WWW, pages 1--12, 2013. Google ScholarDigital Library
- S. Abbar, S. Amer-Yahia, P. Indyk, S. Mahabadi, and K. R. Varadarajan. Diverse near neighbor problem. In SoCG, pages 207--214, 2013. Google ScholarDigital Library
- Z. Abbassi, V. S. Mirrokni, and M. Thakur. Diversity maximization under matroid constraints. In KDD, pages 32--40, 2013. Google ScholarDigital Library
- P. K. Agarwal, G. Cormode, Z. Huang, J. Phillips, Z. Wei, and K. Yi. Mergeable summaries. In Proceedings of the 31st symposium on Principles of Database Systems, pages 23--34. ACM, 2012. Google ScholarDigital Library
- P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Approximating extent measures of points. Journal of the ACM (JACM), 51(4):606--635, 2004. Google ScholarDigital Library
- A. Anagnostopoulos, A. Z. Broder, and D. Carmel. Sampling Search-Engine Results. In WWW, 2006. Google ScholarDigital Library
- A. Angel and N. Koudas. Efficient diversity-aware search. In Proceedings of the 2011 international conference on Management of data, SIGMOD '11, pages 781--792, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
- M.-F. Balcan, S. Ehrlich, and Y. Liang. Distributed clustering on graphs. In NIPS, page to appear, 2013.Google Scholar
- S. Bhattacharya, S. Gollapudi, and K. Munagala. Consideration set generation in commerce search. In WWW, pages 317--326, 2011. Google ScholarDigital Library
- A. Borodin, H. C. Lee, and Y. Ye. Max-sum diversification, monotone submodular functions and dynamic updates. In PODS, pages 155--166, 2012. Google ScholarDigital Library
- J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, 1998. Google ScholarDigital Library
- M. Charikar, C. Chekuri, T. Feder and R. Motwani.Incremental Clustering and Dynamic Information Retrieval. SIAM J. Comput. 33(6), 2004. Google ScholarDigital Library
- B. Chandra and M. M. Halldórsson. Approximation algorithms for dispersion problems. Journal of algorithms, 38(2):438--465, 2001. Google ScholarDigital Library
- Z. Chen and T. Li. Addressing Diverse User Preferences in SQL-Query-Result Navigation. In SIGMOD, 2007. Google ScholarDigital Library
- F. Chierichetti, R. Kumar, and A. Tomkins. Max-cover in map-reduce. In WWW, pages 231--240, 2010. Google ScholarDigital Library
- G. Cormode, H. J. Karloff, and A. Wirth. Set cover algorithms for very large datasets. In CIKM, pages 479--488, 2010. Google ScholarDigital Library
- J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004. Google ScholarDigital Library
- M. Drosou and E. Pitoura. Search result diversification. SIGMOD Record, pages 41--47, 2010. Google ScholarDigital Library
- U. Feige. A threshold of ln for approximating set cover. J. ACM, 45(4):634--652, 1998. Google ScholarDigital Library
- S. Gollapudi and A. Sharma. An axiomatic framework for result diversification. In WWW, pages 381--390, 2009. Google ScholarDigital Library
- T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, pages 293--306, 1985.Google Scholar
- S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. STOC, 2001.Google ScholarDigital Library
- S. Guha, Tight results for clustering and summarizing data streams. ICDT, 2009. Google ScholarDigital Library
- R. Hassin, S. Rubinstein, and A. Tamir. Approximation algorithms for maximum dispersion. Oper. Res. Lett., 21(3):133--137, 1997. Google ScholarDigital Library
- R. K. Iyer and J. A. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In Advances in Neural Information Processing Systems, pages 2436--2444, 2013.Google ScholarDigital Library
- A. Jain, P. Sarda, and J. R. Haritsa. Providing diversity in k-nearest neighbor query results. In PAKDD, pages 404--413, 2004.Google ScholarCross Ref
- M. W. Jones, J. A. Baerentzen, and M. Sramek. 3d distance fields: A survey of techniques and applications. Visualization and Computer Graphics, IEEE Transactions on, 12(4):581--599, 2006. Google ScholarDigital Library
- H. J. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938--948, 2010. Google ScholarDigital Library
- R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. Fast greedy algorithms in mapreduce and streaming. In SPAA, pages 1--10, 2013. Google ScholarDigital Library
- K. Kummamuru, R. Lotlikar, S. Roy, K. Singal, and R. Krishnapuram. A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results. In WWW, 2004. Google ScholarDigital Library
- S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In SPAA, pages 85--94, 2011. Google ScholarDigital Library
- H. Lin, J. Bilmes, and S. Xie. Graph-based submodular selection for extractive summarization.Google Scholar
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions, 1978.Google ScholarDigital Library
- F. Radlinski and S. T. Dumais. Improving Personalized Web Search using Result Diversification. In SIGIR, 2006. Google ScholarDigital Library
- D. Rafiei, K. Bharat, and A. Shukla. Diversifying web search results. In WWW, pages 781--790, 2010. Google ScholarDigital Library
- S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Facility dispersion problems: Heuristics and special cases. Algorithms and Data Structures, pages 355--366, 1991.Google ScholarCross Ref
- R. Sibson. Slink: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1):30--34, 1973.Google ScholarCross Ref
- M. J. Welch, J. Cho, and C. Olston. Search result diversity for informational queries. In WWW, pages 237--246, 2011. Google ScholarDigital Library
- D. Xin, H. Cheng, X. Yan, and J. Han. Extracting Redundancy-Aware Top-k Patterns. In SIGKDD 2006, 2006. Google ScholarDigital Library
- C. Yu, L. V. S. Lakshmanan, and S. Amer-Yahia. Recommendation diversification using explanations. In ICDE, pages 1299--1302, 2009. Google ScholarDigital Library
- C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In WWW, 2005. Google ScholarDigital Library
Index Terms
- Composable core-sets for diversity and coverage maximization
Recommendations
Diverse near neighbor problem
SoCG '13: Proceedings of the twenty-ninth annual symposium on Computational geometryMotivated by the recent research on diversity-aware search, we investigate the k-diverse near neighbor reporting problem. The problem is defined as follows: given a query point q, report the maximum diversity set S of k points in the ball of radius r ...
Scalable Diversity Maximization via Small-size Composable Core-sets (Brief Announcement)
SPAA '19: The 31st ACM Symposium on Parallelism in Algorithms and ArchitecturesIn this paper, we study the diversity maximization problem (a.k.a. maximum dispersion problem) in which given a set of n objects in a metric space, one wants to find a subset of k distinct objects with the maximum sum of pairwise distances. We address ...
Maximum coverage problem with group budget constraints
We study the maximum coverage problem with group budget constraints (MCG). The input consists of a ground set X, a collection $$\psi $$ź of subsets of X each of which is associated with a combinatorial structure such that for every set $$S_j\in \psi $$...
Comments