skip to main content
10.1145/2623330.2623732acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

DeepWalk: online learning of social representations

Published:24 August 2014Publication History

ABSTRACT

We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs.

DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data.

DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.

Skip Supplemental Material Section

Supplemental Material

p701-sidebyside.mp4

mp4

242.7 MB

References

  1. R. Al-Rfou, B. Perozzi, and S. Skiena. Polyglot: Distributed word representations for multilingual nlp. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 183--192, Sofia, Bulgaria, August 2013. ACL.Google ScholarGoogle Scholar
  2. R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on, pages 475--486. IEEE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. 2013.Google ScholarGoogle Scholar
  4. Y. Bengio, R. Ducharme, and P. Vincent. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137--1155, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. L. Bottou. Stochastic gradient learning in neural networks. In Proceedings of Neuro-Nîmes 91, Nimes, France, 1991. EC2.Google ScholarGoogle Scholar
  6. V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):15, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th ICML, ICML '08, pages 160--167. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. Audio, Speech, and Language Processing, IEEE Transactions on, 20(1):30--42, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1232--1240. 2012.Google ScholarGoogle Scholar
  10. D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pre-training help deep learning? The Journal of Machine Learning Research, 11:625--660, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871--1874, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. Knowledge and Data Engineering, IEEE Transactions on, 19(3):355--369, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. B. Gallagher and T. Eliassi-Rad. Leveraging label-independent features for classification in sparsely labeled networks: An empirical study. In Advances in Social Network Mining and Analysis, pages 1--19. Springer, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. B. Gallagher, H. Tong, T. Eliassi-Rad, and C. Faloutsos. Using ghost edges for classification in sparsely labeled networks. In Proceedings of the 14th ACM SIGKDD, KDD '08, pages 256--264, New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Geman and D. Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (6):721--741, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. L. Getoor and B. Taskar. Introduction to statistical relational learning. MIT press, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. It's who you know: Graph mining using recursive structural features. In Proceedings of the 17th ACM SIGKDD, KDD '11, pages 663--671, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. G. E. Hinton. Learning distributed representations of concepts. In Proceedings of the eighth annual conference of the cognitive science society, pages 1--12. Amherst, MA, 1986.Google ScholarGoogle Scholar
  19. R. A. Hummel and S. W. Zucker. On the foundations of relaxation labeling processes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, :267--287, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. U. Kang, H. Tong, and J. Sun. Fast random walk graph kernel. In SDM, pages 828--838, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  21. R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input spaces. In ICML, volume 2, pages 315--322, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, volume 1, page 4, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American society for information science and technology, 58(7):1019--1031, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. F. Lin and W. Cohen. Semi-supervised classification of network data using very few labels. In Advances in Social Networks Analysis and Mining (ASONAM), 2010 International Conference on, pages 192--199, Aug 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. A. Macskassy and F. Provost. A simple relational classifier. In Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM-2003) at KDD-2003, pages 64--76, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  26. S. A. Macskassy and F. Provost. Classification in networked data: A toolkit and a univariate case study. The Journal of Machine Learning Research, 8:935--983, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.Google ScholarGoogle Scholar
  28. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111--3119. 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. T. Mikolov, W.-t. Yih, and G. Zweig. Linguistic regularities in continuous space word representations. In Proceedings of NAACL-HLT, pages 746--751, 2013.Google ScholarGoogle Scholar
  30. A. Mnih and G. E. Hinton. A scalable hierarchical distributed language model. Advances in neural information processing systems, 21:1081--1088, 2009.Google ScholarGoogle Scholar
  31. F. Morin and Y. Bengio. Hierarchical probabilistic neural network language model. In Proceedings of the international workshop on artificial intelligence and statistics, pages 246--252, 2005.Google ScholarGoogle Scholar
  32. J. Neville and D. Jensen. Iterative classification in relational data. In Proc. AAAI-2000 Workshop on Learning Statistical Models from Relational Data, pages 13--20, 2000.Google ScholarGoogle Scholar
  33. J. Neville and D. Jensen. Leveraging relational autocorrelation with latent group models. In Proceedings of the 4th International Workshop on Multi-relational Mining, MRDM '05, pages 49--55, New York, NY, USA, 2005. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Neville and D. Jensen. A bias/variance decomposition for models using collective inference. Machine Learning, 73(1):87--106, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. M. E. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577--8582, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  36. B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems 24, pages 693--701. 2011.Google ScholarGoogle Scholar
  37. P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 81--90. ACM, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. L. Tang and H. Liu. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD, KDD '09, pages 817--826, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. L. Tang and H. Liu. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1107--1116. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. L. Tang and H. Liu. Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3):44--478, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. S. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt. Graph kernels. The Journal of Machine Learning Research, 99:1201--1242, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. X. Wang and G. Sukthankar. Multi-label relational neighbor classification using social context features. In Proceedings of the 19th ACM SIGKDD, pages 464--472. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. W. Zachary. An information flow model for conflict and fission in small groups1. Journal of anthropological research, 33(4):452--473, 1977.Google ScholarGoogle Scholar

Index Terms

  1. DeepWalk: online learning of social representations

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2014
        2028 pages
        ISBN:9781450329569
        DOI:10.1145/2623330

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 August 2014

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        KDD '14 Paper Acceptance Rate151of1,036submissions,15%Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader