skip to main content
10.1145/2783258.2783341acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Learning with Similarity Functions on Graphs using Matchings of Geometric Embeddings

Published:10 August 2015Publication History

ABSTRACT

We develop and apply the Balcan-Blum-Srebro (BBS) theory of classification via similarity functions (which are not necessarily kernels) to the problem of graph classification. First we place the BBS theory into the unifying framework of optimal transport theory. This also opens the way to exploit coupling methods for establishing properties required of a good similarity function as per their definition. Next, we use the approach to the problem of graph classification via geometric embeddings such as the Laplacian, pseudo-inverse Laplacian and the Lovász orthogonal labellings. We consider the similarity function given by optimal and near--optimal matchings with respect to Euclidean distance of the corresponding embeddings of the graphs in high dimensions. We use optimal couplings to rigorously establish that this yields a "good" similarity measure in the BBS sense for two well known families of graphs. Further, we show that the similarity yields better classification accuracy in practice, on these families, than matchings of other well-known graph embeddings. Finally we perform an extensive empirical evaluation on benchmark data sets where we show that classifying graphs using matchings of geometric embeddings outperforms the previous state-of-the-art methods.

Skip Supplemental Material Section

Supplemental Material

p467.mp4

mp4

167 MB

References

  1. Pankaj K Agarwal and R Sharathkumar. Approximation algorithms for bipartite matching with metric and geometric costs. In STOC 14, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Maria-Florina Balcan, Avrim Blum, and Nathan Srebro. A theory of learning with similarity functions. Machine Learning, 72(1--2):89--112, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Aurelien Bellet, Amaury Habrard, and Marc Sebban. Similarity learning for provably accurate sparse linear classification. In John Langford and Joelle Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML '12, pages 1871--1878, New York, NY, USA, July 2012. Omnipress.Google ScholarGoogle Scholar
  4. Karsten M Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. In Proceedings of ICDM, pages 74--81, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Karsten M Borgwardt, Cheng Soon Ong, Stefan Schönauer, SVN Vishwanathan, Alex J Smola, and Hans-Peter Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(suppl 1):i47--i56, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  7. T.-H. Hubert Chan, Kevin L. Chang, and Rajiv Raman. An sdp primal-dual algorithm for approximating the lovász-theta function. In Proceedings of ISIT, pages 2808--2812, Piscataway, NJ, USA, 2009. IEEE Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Asim Kumar Debnath, Rosa L. Lopez de Compadre, Gargi Debnath, Alan J. Shusterman, and Corwin Hansch. Structureactivity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry, 34:786--797, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  9. Ran Duan and Seth Pettie. Linear-time approximation for maximum weight matching. Journal of the ACM (JACM), 61(1):1, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Devdatt P Dubhashi and Alessandro Panconesi. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, 2009. Google ScholarGoogle ScholarCross RefCross Ref
  11. Uriel Feige and Robert Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Structures & Algorithms, 16(2):195--208, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, and Karsten M. Borgwardt. Scalable kernels for graphs with continuous attributes. In NIPS, pages 216--224, 2013.Google ScholarGoogle Scholar
  13. Holger Fröhlich, Jörg K. Wegner, Florian Sieker, and Andreas Zell. Optimal assignment kernels for attributed molecular graphs. In Luc de Raedt and Stefan Wrobel, editors, Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), pages 225--232, Bonn, Germany, August 2005. ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Thomas Gärtner. A survey of kernels for structured data. SIGKDD Explor. Newsl., 5(1):49--58, July 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Thomas Gärtner, Peter Flach, and Stefan Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129--143, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  16. Michel X. Goemans. Semidefinite programming in combinatorial optimization. Math. Program., 79:143--161, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. John C Gower and Garmt B Dijksterhuis. Procrustes problems, volume 3. Oxford University Press Oxford, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  18. Xiao W. Gutman, I. Generalized inverse of the laplacian matrix and some applications. Bulletin. Classe des Sciences Mathématiques et Naturelles. Sciences Mathématiques, 129(29):15--23, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  19. David Haussler. Convolution kernels on discrete structures. Technical report, Technical report, Department of Computer Science, University of California at Santa Cruz, 1999.Google ScholarGoogle Scholar
  20. C. Helma, R. D. King, S. Kramer, and A. Srinivasan. The predictive toxicology challenge 2000--2001. Bioinformatics, 17(1):107--108, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  21. Linus Hermansson, Tommi Kerola, Fredrik Johansson, Vinay Jethava, and Devdatt Dubhashi. Entity disambiguation in anonymized graphs using graph kernels. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), pages 1037--1046. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, and Devdatt Dubhashi. Lovasz theta function, svms and finding dense subgraphs. Journal of Machine Learning Research, 14:3495--3536, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Fredrik D. Johansson, Vinay Jethava, Devdatt Dubhashi, and Chiranjib Bhattacharyya. Global graph kernels using geometric embeddings. In Tony Jebara and Eric P. Xing, editors, Proceedings of the 31st International Conference on Machine Learning (ICML), pages 694--702. JMLR Workshop and Conference Proceedings, 2014.Google ScholarGoogle Scholar
  24. Purushottam Kar and Prateek Jain. Similarity-based learning via data driven embeddings. In Advances in neural information processing systems, pages 1998--2006, 2011.Google ScholarGoogle Scholar
  25. Purushottam Kar and Prateek Jain. Supervised learning with similarity functions. In Advances in neural information processing systems, pages 215--223, 2012.Google ScholarGoogle Scholar
  26. Michel Ledoux. The concentration of measure phenomenon, volume 89. American Mathematical Soc., 2005.Google ScholarGoogle ScholarCross RefCross Ref
  27. Torgny Lindvall. Lectures on the coupling method. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, 1992. A Wiley-Interscience Publication.Google ScholarGoogle Scholar
  28. László Lovász. On the shannon capacity of a graph. IEEE Transactions on Information Theory, 25(1):1--7, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Carlos J. Luz and Alexander Schrijver. A convex quadratic characterization of the lovász theta number. SIAM J. Discrete Math., 19(2):382--387, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Elzbieta Pękekalska and Robert PW Duin. Dissimilarity representations allow for building good classifiers. Pattern Recognition Letters, 23(8):943--956, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Seth Pettie and Peter Sanders. A simpler linear time 2/3-epsilon approximation for maximum weight matching. Inf. Process. Lett., 91(6):271--276, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Bernhard Schölkopf and Alexander J Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT press, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: Primal estimated sub-gradient solver for svm. Mathematical programming, 127(1):3--30, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Blake Shaw and Tony Jebara. Structure preserving embedding. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 937--944. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Nino Shervashidze and Karsten Borgwardt. Fast subtree kernels on graphs. In Proceedings of NIPS, pages 1660--1668. 2009.Google ScholarGoogle Scholar
  36. Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12:2539--2561, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M Borgwardt. Efficient graphlet kernels for large graph comparison. In Proceedings of AISTATS, 2009.Google ScholarGoogle Scholar
  38. Hermann Thorisson. Coupling, stationarity, and regeneration. Probability and its Applications (New York). Springer-Verlag, New York, 2000.Google ScholarGoogle Scholar
  39. Jean-Philippe Vert. The optimal assignment kernel is not positive definite. CoRR, abs/0801.4061, 2008.Google ScholarGoogle Scholar
  40. Cédric Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.Google ScholarGoogle Scholar
  41. SVN Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. Graph kernels. Journal of Machine Learning Research, 11:1201--1242, 2010. Google ScholarGoogle ScholarCross RefCross Ref
  42. Nikil Wale, IanA. Watson, and George Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems, 14(3):347--375, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Liwei Wang, Cheng Yang, and Jufu Feng. On learning with dissimilarity functions. In Proceedings of the 24th international conference on Machine learning, pages 991--998. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Adam Woznica, Alexandros Kalousis, and Melanie Hilario. Matching based kernels for labeled graphs. In ECML/PKDD Workshop on Mining and Learning with Graphs, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Learning with Similarity Functions on Graphs using Matchings of Geometric Embeddings

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
        August 2015
        2378 pages
        ISBN:9781450336642
        DOI:10.1145/2783258

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 10 August 2015

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        KDD '15 Paper Acceptance Rate160of819submissions,20%Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader