ABSTRACT
Hierarchical clustering is typically performed with discrete algorithms that search over the space of trees. While these methods are often effective, their discreteness cuts them off from many of the benefits of continuous optimization, such as scalable stochastic training and the joint optimization of multiple objectives or model components (e.g., end-to-end training). In this paper, we present an approach to hierarchical clustering that searches over continuous representations of trees in hyperbolic space via gradient descent. We compactly represent uncertainty over tree structures with vectors in the Poincaré ball, and show how these vectors can be optimized using an objective related to recently proposed cost functions for hierarchical clustering (Dasgupta, 2016; Wang and Wang, 2018). With a mini-batch stochastic gradient descent inference procedure, our method outperforms prior work on clustering millions of ImageNet images by 15 points of dendrogram purity. Further, our continuous tree representation can be jointly optimized in multi-task learning applications, offering a 9-point improvement over baseline methods.
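The core machinery the abstract describes, gradient descent over points in the Poincaré ball, can be illustrated with a minimal sketch. The code below is not the paper's method: it uses a toy attract/repel objective in place of the Dasgupta-style clustering cost, finite-difference gradients in place of autodiff, and the standard Riemannian SGD update (rescale the Euclidean gradient by the inverse Poincaré metric, then re-project into the ball); all function names are illustrative.

```python
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance between two points inside the Poincare ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / max(denom, eps))

def project(x, max_norm=1.0 - 1e-5):
    """Clip a point back inside the open unit ball after a gradient step."""
    n = np.linalg.norm(x)
    return x if n < max_norm else x * (max_norm / n)

def riemannian_sgd_step(x, euclidean_grad, lr=0.05):
    """Rescale the Euclidean gradient by the inverse Poincare metric,
    (1 - ||x||^2)^2 / 4, take a step, and re-project into the ball."""
    scale = ((1.0 - np.sum(x ** 2)) ** 2) / 4.0
    return project(x - lr * scale * euclidean_grad)

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference gradient; fine for a tiny example."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

# Toy objective: pull a and b together, gently push a and c apart.
rng = np.random.default_rng(0)
pts = {k: rng.uniform(-0.3, 0.3, size=2) for k in "abc"}

def loss_at(p):
    return poincare_dist(p["a"], p["b"]) - 0.1 * poincare_dist(p["a"], p["c"])

for _ in range(200):
    for key in "abc":
        def f(y, key=key):
            tmp = dict(pts)
            tmp[key] = y
            return loss_at(tmp)
        pts[key] = riemannian_sgd_step(pts[key], numerical_grad(f, pts[key]))
```

After optimization the attracted pair ends up much closer (in hyperbolic distance) than the repelled pair, while every point stays inside the unit ball; the paper's actual objective and its mini-batch treatment of tree structures are more involved than this sketch.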
- R. P. Adams, Z. Ghahramani, and M. I. Jordan. 2010. Tree-Structured Stick Breaking for Hierarchical Data. NeurIPS.
- O. Bachem, M. Lucic, H. Hassani, and A. Krause. 2016. Fast and provably good seedings for k-means. NeurIPS.
- M. F. Balcan, A. Blum, and S. Vempala. 2008. A discriminative framework for clustering via similarity functions. STOC.
- J. Bingham and S. Sudarsanam. 2000. Visualizing large hierarchical clusters in hyperbolic space. Bioinformatics.
- C. Blundell, Y. W. Teh, and K. A. Heller. 2010. Bayesian rose trees. UAI.
- M. Boguñá, F. Papadopoulos, and D. Krioukov. 2010. Sustaining the internet with hyperbolic mapping. Nature Communications.
- S. Bonnabel. 2013. Stochastic gradient descent on Riemannian manifolds. IEEE Trans. Automat. Control.
- P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics.
- M. Charikar and V. Chatziafratis. 2017. Approximate hierarchical clustering via sparsest cut and spreading metrics. SODA.
- M. Charikar, V. Chatziafratis, and R. Niazadeh. 2019a. Hierarchical Clustering better than Average-Linkage. SODA.
- M. Charikar, V. Chatziafratis, R. Niazadeh, and G. Yaroslavtsev. 2019b. Hierarchical Clustering for Euclidean Data. AISTATS.
- K. Clark and C. D. Manning. 2016. Improving coreference resolution by learning entity-level distributed representations. ACL.
- A. Culotta, P. Kanani, R. Hall, M. Wick, and A. McCallum. 2007. Author disambiguation using error-driven machine learning with a ranking loss function. IIWeb.
- S. Dasgupta. 2016. A cost function for similarity-based hierarchical clustering. STOC.
- E. Emamjomeh-Zadeh and D. Kempe. 2018. Adaptive hierarchical clustering using ordinal queries. SODA.
- H. Fichtenberger, M. Gillé, M. Schmidt, C. Schwiegelshohn, and C. Sohler. 2013. BICO: BIRCH meets coresets for k-means clustering. ESA.
- O.-E. Ganea, G. Bécigneul, and T. Hofmann. 2018. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. ICML.
- P. Goyal, Z. Hu, X. Liang, C. Wang, and E. P. Xing. 2017. Nonparametric variational auto-encoders for hierarchical representation learning. ICCV.
- V. Guillemin and A. Pollack. 2010. Differential Topology.
- K. A. Heller and Z. Ghahramani. 2005. Bayesian hierarchical clustering. ICML.
- J. Himberg, A. Hyvärinen, and F. Esposito. 2004. Validating the independent components of neuroimaging time series via clustering and visualization. NeuroImage.
- Y. Jernite, A. Choromanska, and D. Sontag. 2017. Simultaneous learning of trees and representations for extreme classification and density estimation. ICML.
- D. P. Kingma and J. Ba. 2015. Adam: A method for stochastic optimization. ICLR.
- R. Kleinberg. 2007. Geographic routing using hyperbolic space. INFOCOM.
- D. A. Knowles and Z. Ghahramani. 2011. Pitman-Yor diffusion trees. UAI.
- A. Kobren, N. Monath, A. Krishnamurthy, and A. McCallum. 2017. A hierarchical algorithm for extreme clustering. KDD.
- D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá. 2010. Hyperbolic geometry of complex networks. Physical Review E.
- A. Krishnamurthy, S. Balakrishnan, M. Xu, and A. Singh. 2012. Efficient active algorithms for hierarchical clustering. ICML.
- J. Lamping and R. Rao. 1994. Laying out and visualizing large trees using a hyperbolic space. UIST.
- H. Lee, M. Recasens, A. Chang, M. Surdeanu, and D. Jurafsky. 2012. Joint entity and event coreference resolution across documents. EMNLP.
- K. Lee, L. He, M. Lewis, and L. Zettlemoyer. 2017. End-to-end neural coreference resolution. EMNLP.
- B. Moseley and J. Wang. 2017. Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search. NeurIPS.
- M. Nickel and D. Kiela. 2017. Poincaré embeddings for learning hierarchical representations. NeurIPS.
- J. Pennington, R. Socher, and C. D. Manning. 2014. GloVe: Global Vectors for Word Representation. EMNLP.
- L. Ratinov and D. Roth. 2009. Design challenges and misconceptions in named entity recognition. ACL.
- S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. UAI.
- S. Roy and S. Pokutta. 2016. Hierarchical clustering via spreading metrics. NeurIPS.
- F. Sala, C. De Sa, A. Gu, and C. Ré. 2018. Representation tradeoffs for hyperbolic embeddings. ICML.
- R. Sarkar. 2011. Low distortion Delaunay embedding of trees in hyperbolic plane. GD.
- D. Sculley. 2010. Web-scale k-means clustering. WWW.
- J. Seo and B. Shneiderman. 2002. Interactively exploring hierarchical clustering results [gene identification]. Computer.
- T. Sørlie, C. M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie, M. B. Eisen, M. Van De Rijn, S. S. Jeffrey, et al. 2001. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. PNAS.
- M. Spivak. 1979. A Comprehensive Introduction to Differential Geometry. Publish or Perish.
- E. Strubell, P. Verga, D. Andor, D. Weiss, and A. McCallum. 2018. Linguistically-Informed Self-Attention for Semantic Role Labeling. EMNLP.
- A. Tifrea, G. Bécigneul, and O.-E. Ganea. 2019. Poincaré GloVe: Hyperbolic Word Embeddings. ICLR.
- T. D. Q. Vinh, Y. Tay, S. Zhang, G. Cong, and X.-L. Li. 2018. Hyperbolic Recommender Systems. arXiv.
- D. Wang and Y. Wang. 2018. An Improved Cost Function for Hierarchical Cluster Trees. arXiv.
- M. Wick, S. Singh, and A. McCallum. 2012. A discriminative hierarchical model for fast coreference at large scale. ACL.
- D. H. Widyantoro, T. R. Ioerger, and J. Yen. 2002. An incremental approach to building a cluster hierarchy. ICDM.
- A. R. Zamir, A. Sax, W. Shen, L. J. Guibas, J. Malik, and S. Savarese. 2018. Taskonomy: Disentangling task transfer learning. CVPR.
- H. Zhang, S. J. Reddi, and S. Sra. 2016. Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds. NeurIPS.
- T. Zhang, R. Ramakrishnan, and M. Livny. 1996. BIRCH: A new data clustering algorithm and its applications. SIGMOD.
- Y. Zhang, A. Ahmed, V. Josifovski, and A. Smola. 2014. Taxonomy discovery for personalized recommendation. ICDM.
- Y. Zhang and D.-Y. Yeung. 2010. A Convex Formulation for Learning Task Relationships in Multi-Task Learning. UAI.
- Y. Zhao and G. Karypis. 2002. Evaluation of hierarchical clustering algorithms for document datasets. CIKM.
Index Terms
- Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space