code2vec: learning distributed representations of code

Authors:
Uri Alon (Technion, Israel)
Meital Zilberstein (Technion, Israel)
Omer Levy (Facebook AI Research, USA)
Eran Yahav (Technion, Israel)

Proceedings of the ACM on Programming Languages, Volume 3, Issue POPL, Article No. 40, pp. 1–29. https://doi.org/10.1145/3290353

Published: 02 January 2019

Related Artifact: Implementation, data and a trained model for the code2vec paper (software, November 2018). https://doi.org/10.1145/3291636

Abstract

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict semantic properties of the snippet. To this end, code is first decomposed into a collection of paths in its abstract syntax tree. Then, the network learns the atomic representation of each path while simultaneously learning how to aggregate a set of these paths.
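
As a rough illustration of the two steps described above, the following Python sketch extracts paths from an abstract syntax tree and combines them into one fixed-length vector. It is not the authors' implementation. It uses Python's built-in `ast` module on Python source, the choice of leaf-to-leaf paths is an assumption, the function names (`leaf_paths`, `path_contexts`, `embed`, `aggregate`) and the `^`/`_` path notation are invented for this example, the vectors are random rather than learned, and the softmax-weighted sum is one plausible aggregation, since the abstract does not say how paths are combined.

```python
import ast
import math
import random
from itertools import combinations


def leaf_paths(tree):
    """Collect (token, root-to-leaf list of AST nodes) for each terminal."""
    leaves = []

    def visit(node, trail):
        trail = trail + [node]
        if isinstance(node, ast.Name):
            leaves.append((node.id, trail))
        elif isinstance(node, ast.arg):
            leaves.append((node.arg, trail))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), trail))
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, trail)

    visit(tree, [])
    return leaves


def path_contexts(source):
    """Return (start_token, path, end_token) for every pair of terminals."""
    contexts = []
    for (tok_a, trail_a), (tok_b, trail_b) in combinations(leaf_paths(ast.parse(source)), 2):
        # Walk down from the root while both trails share the same node object;
        # the last shared node is the lowest common ancestor.
        k = 0
        while k < min(len(trail_a), len(trail_b)) and trail_a[k] is trail_b[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(trail_a[k:])]
        top = type(trail_a[k - 1]).__name__
        down = [type(n).__name__ for n in trail_b[k:]]
        # "^" marks a step up the tree, "_" a step down (arbitrary notation).
        path = "^".join(up + [top]) + "_" + "_".join(down)
        contexts.append((tok_a, path, tok_b))
    return contexts


def embed(contexts, dim=8, seed=0):
    """Assign each distinct context a random vector (a stand-in for learned ones)."""
    rng = random.Random(seed)
    table = {}
    for ctx in contexts:
        if ctx not in table:
            table[ctx] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return [table[ctx] for ctx in contexts]


def aggregate(vectors, attention):
    """Softmax-weighted sum of context vectors; returns (code_vector, weights)."""
    scores = [sum(x * a for x, a in zip(v, attention)) for v in vectors]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    code_vector = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(attention))
    ]
    return code_vector, weights


if __name__ == "__main__":
    snippet = "def f(x):\n    return x + 1\n"
    ctxs = path_contexts(snippet)
    vecs = embed(ctxs)
    attention = [0.5] * 8  # hypothetical attention vector, not a trained parameter
    code_vector, weights = aggregate(vecs, attention)
    for ctx, w in zip(ctxs, weights):
        print(ctx, round(w, 3))
```

In a trained model the context vectors and the attention vector would be learned jointly from data; here they are placeholders so that the data flow from snippet to paths to a single vector can be followed end to end.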

We demonstrate the effectiveness of our approach by using it to predict a method's name from the vector representation of its body. We evaluate our approach by training a model on a dataset of 12M methods. We show that code vectors trained on this dataset can predict method names from files that were unobserved during training. Furthermore, we show that our model learns useful method name vectors that capture semantic similarities, combinations, and analogies.

A comparison of our approach to previous techniques over the same dataset shows an improvement of more than 75%, making it the first to successfully predict method names based on a large, cross-project corpus. Our trained model, visualizations and vector similarities are available as an interactive online demo at http://code2vec.org. The code, data and trained models are available at https://github.com/tech-srl/code2vec.

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Learning latent representations
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages

Published in

Proceedings of the ACM on Programming Languages, Volume 3, Issue POPL
January 2019
2275 pages
EISSN: 2475-1421
DOI: 10.1145/3302515

Copyright © 2019 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher
Association for Computing Machinery
New York, NY, United States

Badges
- Artifacts Available
- Artifacts Evaluated & Reusable

Author Tags
- Big Code
- Distributed Representations
- Machine Learning