Research Article | Open Access

Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space

Published: 25 July 2019

ABSTRACT

Hierarchical clustering is typically performed by algorithmic search over the discrete space of trees. While these discrete optimization methods are often effective, their discreteness denies them many of the benefits of continuous counterparts, such as scalable stochastic optimization and the joint optimization of multiple objectives or model components (e.g., end-to-end training). In this paper, we present an approach to hierarchical clustering that searches over continuous representations of trees in hyperbolic space using gradient descent. We compactly represent uncertainty over tree structures with vectors in the Poincaré ball. We show how these vectors can be optimized using an objective related to recently proposed cost functions for hierarchical clustering (Dasgupta, 2016; Wang and Wang, 2018). Using our method with a mini-batch stochastic gradient descent inference procedure, we outperform prior work on clustering millions of ImageNet images by 15 points of dendrogram purity. Further, our continuous tree representation can be jointly optimized in multi-task learning applications, offering a 9-point improvement over baseline methods.
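To make the idea concrete, below is a minimal sketch (in PyTorch) of the two ingredients the abstract describes: embedding points in the Poincaré ball with the hyperbolic geodesic distance, and updating those embeddings by mini-batch gradient descent on a soft, triplet-style surrogate of a hierarchical clustering cost in the spirit of Dasgupta (2016). This is not the authors' implementation; the loss, the projection step (a simplification of true Riemannian SGD, cf. Bonnabel, 2013), and all hyperparameters are illustrative assumptions.

# A minimal, illustrative sketch (not the paper's implementation) of gradient-based
# optimization of embeddings in the Poincare ball with a triplet-style surrogate
# objective. Assumes PyTorch; names and hyperparameters are illustrative.
import torch

EPS = 1e-6

def poincare_distance(x, y):
    # Geodesic distance in the Poincare ball, batched over the leading dimension.
    sq_dist = torch.sum((x - y) ** 2, dim=-1)
    nx = torch.clamp(1.0 - torch.sum(x ** 2, dim=-1), min=EPS)
    ny = torch.clamp(1.0 - torch.sum(y ** 2, dim=-1), min=EPS)
    # Clamp the acosh argument away from 1 to keep gradients finite when x == y.
    return torch.acosh(torch.clamp(1.0 + 2.0 * sq_dist / (nx * ny), min=1.0 + EPS))

def project_to_ball(p, max_norm=1.0 - 1e-3):
    # Retract points back inside the unit ball after a Euclidean gradient step
    # (a simplification of true Riemannian SGD, cf. Bonnabel, 2013).
    norms = p.norm(dim=-1, keepdim=True).clamp(min=EPS)
    p.mul_(torch.clamp(max_norm / norms, max=1.0))

def triplet_loss(emb, anchor, pos, neg, margin=0.2):
    # Soft surrogate for a hierarchical clustering cost: pull the more similar
    # pair (anchor, pos) hyperbolically closer than (anchor, neg), so that
    # similar leaves end up under deeper common ancestors.
    d_pos = poincare_distance(emb[anchor], emb[pos])
    d_neg = poincare_distance(emb[anchor], emb[neg])
    return torch.relu(d_pos - d_neg + margin).mean()

# Toy usage: 100 points in a 2-D Poincare ball, optimized with mini-batch SGD.
torch.manual_seed(0)
emb = torch.nn.Parameter(0.01 * torch.randn(100, 2))
opt = torch.optim.SGD([emb], lr=0.05)
for step in range(200):
    # Placeholder triplets; in practice these come from pairwise similarities.
    anc, pos, neg = (torch.randint(0, 100, (64,)) for _ in range(3))
    loss = triplet_loss(emb, anc, pos, neg)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        project_to_ball(emb)

In this toy usage the triplets are random placeholders; in practice they would be sampled so that the positive is more similar to the anchor than the negative, and a discrete tree would be decoded from the optimized embeddings afterwards.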

References

  1. R. P. Adams, Z. Ghahramani, and M. I. Jordan. 2010. Tree-Structured Stick Breaking for Hierarchical Data. NeurIPS.
  2. O. Bachem, M. Lucic, H. Hassani, and A. Krause. 2016. Fast and provably good seedings for k-means. NeurIPS.
  3. M. F. Balcan, A. Blum, and S. Vempala. 2008. A discriminative framework for clustering via similarity functions. STOC.
  4. J. Bingham and S. Sudarsanam. 2000. Visualizing large hierarchical clusters in hyperbolic space. Bioinformatics.
  5. C. Blundell, Y. W. Teh, and K. A. Heller. 2010. Bayesian rose trees. UAI.
  6. M. Boguñá, F. Papadopoulos, and D. Krioukov. 2010. Sustaining the Internet with hyperbolic mapping. Nature Communications.
  7. S. Bonnabel. 2013. Stochastic gradient descent on Riemannian manifolds. IEEE Trans. Automat. Control.
  8. P. F. Brown, P. V. Desouza, R. L. Mercer, V. J. D. Pietra, and J. C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics.
  9. M. Charikar and V. Chatziafratis. 2017. Approximate hierarchical clustering via sparsest cut and spreading metrics. SODA.
  10. M. Charikar, V. Chatziafratis, and R. Niazadeh. 2019a. Hierarchical Clustering better than Average-Linkage. SODA.
  11. M. Charikar, V. Chatziafratis, R. Niazadeh, and G. Yaroslavtsev. 2019b. Hierarchical Clustering for Euclidean Data. AISTATS.
  12. K. Clark and C. D. Manning. 2016. Improving coreference resolution by learning entity-level distributed representations. ACL.
  13. V. Cohen-Addad, V. Kanade, and F. Mallmann-Trenn. 2017. Hierarchical clustering beyond the worst-case. NeurIPS.
  14. V. Cohen-Addad, V. Kanade, F. Mallmann-Trenn, and C. Mathieu. 2018. Hierarchical clustering: Objective functions and algorithms. SODA.
  15. A. Culotta, P. Kanani, R. Hall, M. Wick, and A. McCallum. 2007. Author disambiguation using error-driven machine learning with a ranking loss function. IIWeb.
  16. S. Dasgupta. 2016. A cost function for similarity-based hierarchical clustering. STOC.
  17. E. Emamjomeh-Zadeh and D. Kempe. 2018. Adaptive hierarchical clustering using ordinal queries. SODA.
  18. H. Fichtenberger, M. Gillé, M. Schmidt, C. Schwiegelshohn, and C. Sohler. 2013. BICO: BIRCH meets coresets for k-means clustering. ESA.
  19. O.-E. Ganea, G. Bécigneul, and T. Hofmann. 2018. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. ICML.
  20. P. Goyal, Z. Hu, X. Liang, C. Wang, and E. P. Xing. 2017. Nonparametric variational auto-encoders for hierarchical representation learning. ICCV.
  21. V. Guillemin and A. Pollack. 2010. Differential Topology.
  22. K. A. Heller and Z. Ghahramani. 2005. Bayesian hierarchical clustering. ICML.
  23. J. Himberg, A. Hyvärinen, and F. Esposito. 2004. Validating the independent components of neuroimaging time series via clustering and visualization. NeuroImage.
  24. Y. Jernite, A. Choromanska, and D. Sontag. 2017. Simultaneous learning of trees and representations for extreme classification and density estimation. ICML.
  25. D. P. Kingma and J. Ba. 2015. Adam: A method for stochastic optimization. ICLR.
  26. R. Kleinberg. 2007. Geographic routing using hyperbolic space. INFOCOM.
  27. D. A. Knowles and Z. Ghahramani. 2011. Pitman-Yor diffusion trees. UAI.
  28. A. Kobren, N. Monath, A. Krishnamurthy, and A. McCallum. 2017. A hierarchical algorithm for extreme clustering. KDD.
  29. D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá. 2010. Hyperbolic geometry of complex networks. Physical Review E.
  30. A. Krishnamurthy, S. Balakrishnan, M. Xu, and A. Singh. 2012. Efficient active algorithms for hierarchical clustering. ICML.
  31. J. Lamping and R. Rao. 1994. Laying out and visualizing large trees using a hyperbolic space. UIST.
  32. H. Lee, M. Recasens, A. Chang, M. Surdeanu, and D. Jurafsky. 2012. Joint entity and event coreference resolution across documents. EMNLP.
  33. K. Lee, L. He, M. Lewis, and L. Zettlemoyer. 2017. End-to-end neural coreference resolution. EMNLP.
  34. B. Moseley and J. Wang. 2017. Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search. NeurIPS.
  35. M. Nickel and D. Kiela. 2017. Poincaré embeddings for learning hierarchical representations. NeurIPS.
  36. J. Pennington, R. Socher, and C. D. Manning. 2014. GloVe: Global Vectors for Word Representation. EMNLP.
  37. L. Ratinov and D. Roth. 2009. Design challenges and misconceptions in named entity recognition. ACL.
  38. S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. UAI.
  39. S. Roy and S. Pokutta. 2016. Hierarchical clustering via spreading metrics. NeurIPS.
  40. F. Sala, C. De Sa, A. Gu, and C. Ré. 2018. Representation tradeoffs for hyperbolic embeddings. ICML.
  41. R. Sarkar. 2011. Low distortion Delaunay embedding of trees in hyperbolic plane. GD.
  42. D. Sculley. 2010. Web-scale k-means clustering. WWW.
  43. J. Seo and B. Shneiderman. 2002. Interactively exploring hierarchical clustering results [gene identification]. Computer.
  44. T. Sørlie, C. M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie, M. B. Eisen, M. Van De Rijn, S. S. Jeffrey, et al. 2001. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. PNAS.
  45. M. Spivak. 1979. A Comprehensive Introduction to Differential Geometry. Publish or Perish.
  46. E. Strubell, P. Verga, D. Andor, D. Weiss, and A. McCallum. 2018. Linguistically-Informed Self-Attention for Semantic Role Labeling. EMNLP.
  47. A. Tifrea, G. Bécigneul, and O.-E. Ganea. 2019. Poincaré GloVe: Hyperbolic Word Embeddings. ICLR.
  48. T. D. Q. Vinh, Y. Tay, S. Zhang, G. Cong, and X.-L. Li. 2018. Hyperbolic Recommender Systems. arXiv.
  49. D. Wang and Y. Wang. 2018. An Improved Cost Function for Hierarchical Cluster Trees. arXiv.
  50. M. Wick, S. Singh, and A. McCallum. 2012. A discriminative hierarchical model for fast coreference at large scale. ACL.
  51. D. H. Widyantoro, T. R. Ioerger, and J. Yen. 2002. An incremental approach to building a cluster hierarchy. ICDM.
  52. A. R. Zamir, A. Sax, W. Shen, L. J. Guibas, J. Malik, and S. Savarese. 2018. Taskonomy: Disentangling task transfer learning. CVPR.
  53. H. Zhang, S. J. Reddi, and S. Sra. 2016. Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds. NeurIPS.
  54. T. Zhang, R. Ramakrishnan, and M. Livny. 1996. BIRCH: A new data clustering algorithm and its applications. SIGMOD.
  55. Y. Zhang, A. Ahmed, V. Josifovski, and A. Smola. 2014. Taxonomy discovery for personalized recommendation. ICDM.
  56. Y. Zhang and D.-Y. Yeung. 2010. A Convex Formulation for Learning Task Relationships in Multi-Task Learning. UAI.
  57. Y. Zhao and G. Karypis. 2002. Evaluation of hierarchical clustering algorithms for document datasets. CIKM.
