ABSTRACT
We develop and apply the Balcan-Blum-Srebro (BBS) theory of classification via similarity functions (which are not necessarily kernels) to the problem of graph classification. First we place the BBS theory into the unifying framework of optimal transport theory. This also opens the way to exploit coupling methods for establishing properties required of a good similarity function as per their definition. Next, we use the approach to the problem of graph classification via geometric embeddings such as the Laplacian, pseudo-inverse Laplacian and the Lovász orthogonal labellings. We consider the similarity function given by optimal and near--optimal matchings with respect to Euclidean distance of the corresponding embeddings of the graphs in high dimensions. We use optimal couplings to rigorously establish that this yields a "good" similarity measure in the BBS sense for two well known families of graphs. Further, we show that the similarity yields better classification accuracy in practice, on these families, than matchings of other well-known graph embeddings. Finally we perform an extensive empirical evaluation on benchmark data sets where we show that classifying graphs using matchings of geometric embeddings outperforms the previous state-of-the-art methods.
Supplemental Material
- Pankaj K Agarwal and R Sharathkumar. Approximation algorithms for bipartite matching with metric and geometric costs. In STOC 14, 2014. Google ScholarDigital Library
- Maria-Florina Balcan, Avrim Blum, and Nathan Srebro. A theory of learning with similarity functions. Machine Learning, 72(1--2):89--112, 2008. Google ScholarDigital Library
- Aurelien Bellet, Amaury Habrard, and Marc Sebban. Similarity learning for provably accurate sparse linear classification. In John Langford and Joelle Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML '12, pages 1871--1878, New York, NY, USA, July 2012. Omnipress.Google Scholar
- Karsten M Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. In Proceedings of ICDM, pages 74--81, 2005. Google ScholarDigital Library
- Karsten M Borgwardt, Cheng Soon Ong, Stefan Schönauer, SVN Vishwanathan, Alex J Smola, and Hans-Peter Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21(suppl 1):i47--i56, 2005. Google ScholarDigital Library
- Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.Google ScholarCross Ref
- T.-H. Hubert Chan, Kevin L. Chang, and Rajiv Raman. An sdp primal-dual algorithm for approximating the lovász-theta function. In Proceedings of ISIT, pages 2808--2812, Piscataway, NJ, USA, 2009. IEEE Press. Google ScholarDigital Library
- Asim Kumar Debnath, Rosa L. Lopez de Compadre, Gargi Debnath, Alan J. Shusterman, and Corwin Hansch. Structureactivity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry, 34:786--797, 1991.Google ScholarCross Ref
- Ran Duan and Seth Pettie. Linear-time approximation for maximum weight matching. Journal of the ACM (JACM), 61(1):1, 2014. Google ScholarDigital Library
- Devdatt P Dubhashi and Alessandro Panconesi. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, 2009. Google ScholarCross Ref
- Uriel Feige and Robert Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Structures & Algorithms, 16(2):195--208, 2000. Google ScholarDigital Library
- Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, and Karsten M. Borgwardt. Scalable kernels for graphs with continuous attributes. In NIPS, pages 216--224, 2013.Google Scholar
- Holger Fröhlich, Jörg K. Wegner, Florian Sieker, and Andreas Zell. Optimal assignment kernels for attributed molecular graphs. In Luc de Raedt and Stefan Wrobel, editors, Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), pages 225--232, Bonn, Germany, August 2005. ACM Press. Google ScholarDigital Library
- Thomas Gärtner. A survey of kernels for structured data. SIGKDD Explor. Newsl., 5(1):49--58, July 2003. Google ScholarDigital Library
- Thomas Gärtner, Peter Flach, and Stefan Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129--143, 2003.Google ScholarCross Ref
- Michel X. Goemans. Semidefinite programming in combinatorial optimization. Math. Program., 79:143--161, 1997. Google ScholarDigital Library
- John C Gower and Garmt B Dijksterhuis. Procrustes problems, volume 3. Oxford University Press Oxford, 2004.Google ScholarCross Ref
- Xiao W. Gutman, I. Generalized inverse of the laplacian matrix and some applications. Bulletin. Classe des Sciences Mathématiques et Naturelles. Sciences Mathématiques, 129(29):15--23, 2004.Google ScholarCross Ref
- David Haussler. Convolution kernels on discrete structures. Technical report, Technical report, Department of Computer Science, University of California at Santa Cruz, 1999.Google Scholar
- C. Helma, R. D. King, S. Kramer, and A. Srinivasan. The predictive toxicology challenge 2000--2001. Bioinformatics, 17(1):107--108, 2001.Google ScholarCross Ref
- Linus Hermansson, Tommi Kerola, Fredrik Johansson, Vinay Jethava, and Devdatt Dubhashi. Entity disambiguation in anonymized graphs using graph kernels. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), pages 1037--1046. ACM, 2013. Google ScholarDigital Library
- Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, and Devdatt Dubhashi. Lovasz theta function, svms and finding dense subgraphs. Journal of Machine Learning Research, 14:3495--3536, 2014. Google ScholarDigital Library
- Fredrik D. Johansson, Vinay Jethava, Devdatt Dubhashi, and Chiranjib Bhattacharyya. Global graph kernels using geometric embeddings. In Tony Jebara and Eric P. Xing, editors, Proceedings of the 31st International Conference on Machine Learning (ICML), pages 694--702. JMLR Workshop and Conference Proceedings, 2014.Google Scholar
- Purushottam Kar and Prateek Jain. Similarity-based learning via data driven embeddings. In Advances in neural information processing systems, pages 1998--2006, 2011.Google Scholar
- Purushottam Kar and Prateek Jain. Supervised learning with similarity functions. In Advances in neural information processing systems, pages 215--223, 2012.Google Scholar
- Michel Ledoux. The concentration of measure phenomenon, volume 89. American Mathematical Soc., 2005.Google ScholarCross Ref
- Torgny Lindvall. Lectures on the coupling method. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, 1992. A Wiley-Interscience Publication.Google Scholar
- László Lovász. On the shannon capacity of a graph. IEEE Transactions on Information Theory, 25(1):1--7, 1979. Google ScholarDigital Library
- Carlos J. Luz and Alexander Schrijver. A convex quadratic characterization of the lovász theta number. SIAM J. Discrete Math., 19(2):382--387, 2005. Google ScholarDigital Library
- Elzbieta Pękekalska and Robert PW Duin. Dissimilarity representations allow for building good classifiers. Pattern Recognition Letters, 23(8):943--956, 2002. Google ScholarDigital Library
- Seth Pettie and Peter Sanders. A simpler linear time 2/3-epsilon approximation for maximum weight matching. Inf. Process. Lett., 91(6):271--276, 2004. Google ScholarDigital Library
- Bernhard Schölkopf and Alexander J Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT press, 2001. Google ScholarDigital Library
- Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: Primal estimated sub-gradient solver for svm. Mathematical programming, 127(1):3--30, 2011. Google ScholarDigital Library
- Blake Shaw and Tony Jebara. Structure preserving embedding. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 937--944. ACM, 2009. Google ScholarDigital Library
- Nino Shervashidze and Karsten Borgwardt. Fast subtree kernels on graphs. In Proceedings of NIPS, pages 1660--1668. 2009.Google Scholar
- Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12:2539--2561, 2011. Google ScholarDigital Library
- Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M Borgwardt. Efficient graphlet kernels for large graph comparison. In Proceedings of AISTATS, 2009.Google Scholar
- Hermann Thorisson. Coupling, stationarity, and regeneration. Probability and its Applications (New York). Springer-Verlag, New York, 2000.Google Scholar
- Jean-Philippe Vert. The optimal assignment kernel is not positive definite. CoRR, abs/0801.4061, 2008.Google Scholar
- Cédric Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.Google Scholar
- SVN Vishwanathan, Nicol N Schraudolph, Risi Kondor, and Karsten M Borgwardt. Graph kernels. Journal of Machine Learning Research, 11:1201--1242, 2010. Google ScholarCross Ref
- Nikil Wale, IanA. Watson, and George Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems, 14(3):347--375, 2008. Google ScholarDigital Library
- Liwei Wang, Cheng Yang, and Jufu Feng. On learning with dissimilarity functions. In Proceedings of the 24th international conference on Machine learning, pages 991--998. ACM, 2007. Google ScholarDigital Library
- Adam Woznica, Alexandros Kalousis, and Melanie Hilario. Matching based kernels for labeled graphs. In ECML/PKDD Workshop on Mining and Learning with Graphs, 2006. Google ScholarDigital Library
Index Terms
- Learning with Similarity Functions on Graphs using Matchings of Geometric Embeddings
Recommendations
Deep Graph Kernels
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data MiningIn this paper, we present Deep Graph Kernels, a unified framework to learn latent representations of sub-structures for graphs, inspired by latest advancements in language modeling and deep learning. Our framework leverages the dependency information ...
DeepWalk: online learning of social representations
KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data miningWe present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes ...
Optimal assignment kernels for attributed molecular graphs
ICML '05: Proceedings of the 22nd international conference on Machine learningWe propose a new kernel function for attributed molecular graphs, which is based on the idea of computing an optimal assignment from the atoms of one molecule to those of another one, including information on neighborhood, membership to a certain ...
Comments