ABSTRACT
Hierarchical clustering is typically performed with discrete algorithms that search over the space of trees. While these methods are often effective, their discreteness cuts them off from many of the benefits of continuous optimization, such as scalable stochastic training and the joint optimization of multiple objectives or model components (e.g., end-to-end training). In this paper, we present an approach to hierarchical clustering that searches over continuous representations of trees in hyperbolic space via gradient descent. We compactly represent uncertainty over tree structures with vectors in the Poincaré ball, and show how these vectors can be optimized using an objective related to recently proposed cost functions for hierarchical clustering (Dasgupta, 2016; Wang and Wang, 2018). With a mini-batch stochastic gradient descent inference procedure, our method outperforms prior work on clustering millions of ImageNet images by 15 points of dendrogram purity. Further, our continuous tree representation can be jointly optimized in multi-task learning applications, offering a 9-point improvement over baseline methods.
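The core machinery the abstract describes, gradient descent over points in the Poincaré ball, can be illustrated with a minimal sketch. The code below is not the paper's method: it uses a toy attract/repel objective in place of the Dasgupta-style clustering cost, finite-difference gradients in place of autodiff, and the standard Riemannian SGD update (rescale the Euclidean gradient by the inverse Poincaré metric, then re-project into the ball); all function names are illustrative.

```python
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance between two points inside the Poincare ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / max(denom, eps))

def project(x, max_norm=1.0 - 1e-5):
    """Clip a point back inside the open unit ball after a gradient step."""
    n = np.linalg.norm(x)
    return x if n < max_norm else x * (max_norm / n)

def riemannian_sgd_step(x, euclidean_grad, lr=0.05):
    """Rescale the Euclidean gradient by the inverse Poincare metric,
    (1 - ||x||^2)^2 / 4, take a step, and re-project into the ball."""
    scale = ((1.0 - np.sum(x ** 2)) ** 2) / 4.0
    return project(x - lr * scale * euclidean_grad)

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference gradient; fine for a tiny example."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

# Toy objective: pull a and b together, gently push a and c apart.
rng = np.random.default_rng(0)
pts = {k: rng.uniform(-0.3, 0.3, size=2) for k in "abc"}

def loss_at(p):
    return poincare_dist(p["a"], p["b"]) - 0.1 * poincare_dist(p["a"], p["c"])

for _ in range(200):
    for key in "abc":
        def f(y, key=key):
            tmp = dict(pts)
            tmp[key] = y
            return loss_at(tmp)
        pts[key] = riemannian_sgd_step(pts[key], numerical_grad(f, pts[key]))
```

After optimization the attracted pair ends up much closer (in hyperbolic distance) than the repelled pair, while every point stays inside the unit ball; the paper's actual objective and its mini-batch treatment of tree structures are more involved than this sketch.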
- R. P. Adams, Z. Ghahramani, and M. I. Jordan. 2010. Tree-Structured Stick Breaking for Hierarchical Data. NeurIPS.
- O. Bachem, M. Lucic, H. Hassani, and A. Krause. 2016. Fast and provably good seedings for k-means. NeurIPS.
- M. F. Balcan, A. Blum, and S. Vempala. 2008. A discriminative framework for clustering via similarity functions. STOC.
- J. Bingham and S. Sudarsanam. 2000. Visualizing large hierarchical clusters in hyperbolic space. Bioinformatics.
- C. Blundell, Y. W. Teh, and K. A. Heller. 2010. Bayesian rose trees. UAI.
- M. Boguñá, F. Papadopoulos, and D. Krioukov. 2010. Sustaining the internet with hyperbolic mapping. Nature Communications.
- S. Bonnabel. 2013. Stochastic gradient descent on Riemannian manifolds. IEEE Trans. Automat. Control.
- P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics.
- M. Charikar and V. Chatziafratis. 2017. Approximate hierarchical clustering via sparsest cut and spreading metrics. SODA.
- M. Charikar, V. Chatziafratis, and R. Niazadeh. 2019a. Hierarchical Clustering better than Average-Linkage. SODA.
- M. Charikar, V. Chatziafratis, R. Niazadeh, and G. Yaroslavtsev. 2019b. Hierarchical Clustering for Euclidean Data. AISTATS.
- K. Clark and C. D. Manning. 2016. Improving coreference resolution by learning entity-level distributed representations. ACL.
- A. Culotta, P. Kanani, R. Hall, M. Wick, and A. McCallum. 2007. Author disambiguation using error-driven machine learning with a ranking loss function. IIWeb.
- S. Dasgupta. 2016. A cost function for similarity-based hierarchical clustering. STOC.
- E. Emamjomeh-Zadeh and D. Kempe. 2018. Adaptive hierarchical clustering using ordinal queries. SODA.
- H. Fichtenberger, M. Gillé, M. Schmidt, C. Schwiegelshohn, and C. Sohler. 2013. BICO: BIRCH meets coresets for k-means clustering. ESA.
- O.-E. Ganea, G. Bécigneul, and T. Hofmann. 2018. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. ICML.
- P. Goyal, Z. Hu, X. Liang, C. Wang, and E. P. Xing. 2017. Nonparametric variational auto-encoders for hierarchical representation learning. ICCV.
- V. Guillemin and A. Pollack. 2010. Differential Topology.
- K. A. Heller and Z. Ghahramani. 2005. Bayesian hierarchical clustering. ICML.
- J. Himberg, A. Hyvärinen, and F. Esposito. 2004. Validating the independent components of neuroimaging time series via clustering and visualization. NeuroImage.
- Y. Jernite, A. Choromanska, and D. Sontag. 2017. Simultaneous learning of trees and representations for extreme classification and density estimation. ICML.
- D. P. Kingma and J. Ba. 2015. Adam: A method for stochastic optimization. ICLR.
- R. Kleinberg. 2007. Geographic routing using hyperbolic space. INFOCOM.
- D. A. Knowles and Z. Ghahramani. 2011. Pitman-Yor diffusion trees. UAI.
- A. Kobren, N. Monath, A. Krishnamurthy, and A. McCallum. 2017. A hierarchical algorithm for extreme clustering. KDD.
- D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá. 2010. Hyperbolic geometry of complex networks. Physical Review E.
- A. Krishnamurthy, S. Balakrishnan, M. Xu, and A. Singh. 2012. Efficient active algorithms for hierarchical clustering. ICML.
- J. Lamping and R. Rao. 1994. Laying out and visualizing large trees using a hyperbolic space. UIST.
- H. Lee, M. Recasens, A. Chang, M. Surdeanu, and D. Jurafsky. 2012. Joint entity and event coreference resolution across documents. EMNLP.
- K. Lee, L. He, M. Lewis, and L. Zettlemoyer. 2017. End-to-end neural coreference resolution. EMNLP.
- B. Moseley and J. Wang. 2017. Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search. NeurIPS.
- M. Nickel and D. Kiela. 2017. Poincaré embeddings for learning hierarchical representations. NeurIPS.
- J. Pennington, R. Socher, and C. D. Manning. 2014. GloVe: Global Vectors for Word Representation. EMNLP.
- L. Ratinov and D. Roth. 2009. Design challenges and misconceptions in named entity recognition. ACL.
- S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. UAI.
- S. Roy and S. Pokutta. 2016. Hierarchical clustering via spreading metrics. NeurIPS.
- F. Sala, C. De Sa, A. Gu, and C. Ré. 2018. Representation tradeoffs for hyperbolic embeddings. ICML.
- R. Sarkar. 2011. Low distortion Delaunay embedding of trees in hyperbolic plane. GD.
- D. Sculley. 2010. Web-scale k-means clustering. WWW.
- J. Seo and B. Shneiderman. 2002. Interactively exploring hierarchical clustering results [gene identification]. Computer.
- T. Sørlie, C. M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie, M. B. Eisen, M. Van De Rijn, S. S. Jeffrey, et al. 2001. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. PNAS.
- M. Spivak. 1979. A Comprehensive Introduction to Differential Geometry. Publish or Perish.
- E. Strubell, P. Verga, D. Andor, D. Weiss, and A. McCallum. 2018. Linguistically-Informed Self-Attention for Semantic Role Labeling. EMNLP.
- A. Tifrea, G. Bécigneul, and O.-E. Ganea. 2019. Poincaré GloVe: Hyperbolic Word Embeddings. ICLR.
- T. D. Q. Vinh, Y. Tay, S. Zhang, G. Cong, and X.-L. Li. 2018. Hyperbolic Recommender Systems. arXiv.
- D. Wang and Y. Wang. 2018. An Improved Cost Function for Hierarchical Cluster Trees. arXiv.
- M. Wick, S. Singh, and A. McCallum. 2012. A discriminative hierarchical model for fast coreference at large scale. ACL.
- D. H. Widyantoro, T. R. Ioerger, and J. Yen. 2002. An incremental approach to building a cluster hierarchy. ICDM.
- A. R. Zamir, A. Sax, W. Shen, L. J. Guibas, J. Malik, and S. Savarese. 2018. Taskonomy: Disentangling task transfer learning. CVPR.
- H. Zhang, S. J. Reddi, and S. Sra. 2016. Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds. NeurIPS.
- T. Zhang, R. Ramakrishnan, and M. Livny. 1996. BIRCH: A new data clustering algorithm and its applications. SIGMOD.
- Y. Zhang, A. Ahmed, V. Josifovski, and A. Smola. 2014. Taxonomy discovery for personalized recommendation. ICDM.
- Y. Zhang and D.-Y. Yeung. 2010. A Convex Formulation for Learning Task Relationships in Multi-Task Learning. UAI.
- Y. Zhao and G. Karypis. 2002. Evaluation of hierarchical clustering algorithms for document datasets. CIKM.
Index Terms
- Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space