ABSTRACT
Descriptive names are a vital part of readable, and hence maintainable, code. Recent progress on automatically suggesting names for local variables tantalizes with the prospect of replicating that success with method and class names. However, suggesting names for methods and classes is much more difficult, because good method and class names need to be functionally descriptive, and suggesting such names requires the model to go beyond local context. We introduce a neural probabilistic language model for source code that is specifically designed for the method naming problem. Our model learns which names are semantically similar by assigning them to locations, called embeddings, in a high-dimensional continuous space, in such a way that names with similar embeddings tend to be used in similar contexts. These embeddings seem to contain semantic information about tokens, even though they are learned only from statistical co-occurrences of tokens. Furthermore, we introduce a variant of our model that is, to our knowledge, the first that can propose neologisms, that is, names that have not appeared in the training corpus. We obtain state-of-the-art results on the method and class naming tasks, and even on the simpler variable naming task. More broadly, the continuous embeddings learned by our model have the potential for wide application within software engineering.
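To make the embedding idea concrete, here is a minimal sketch of how names might be ranked against a context once embeddings exist. It is not the model described in the abstract: the three-dimensional vectors are hand-picked rather than learned, the context is summarized by a plain average, and every identifier (`NAME_EMBEDDINGS`, `CONTEXT_EMBEDDINGS`, `name_distribution`) is hypothetical. It covers only names already in the vocabulary, so it says nothing about how neologisms are proposed.

```python
import math

# Hypothetical embeddings, hand-picked for illustration only.
# A real model would learn these vectors from a training corpus.
NAME_EMBEDDINGS = {
    "getLength": [0.9, 0.1, 0.0],
    "size": [0.8, 0.2, 0.1],
    "close": [0.0, 0.1, 0.9],
}
CONTEXT_EMBEDDINGS = {
    "return": [0.5, 0.5, 0.0],
    "count": [0.9, 0.0, 0.1],
    "items": [0.7, 0.3, 0.0],
}


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: near 1.0 for vectors pointing the same way."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def context_vector(tokens):
    """Summarize a context as the average of its token embeddings (an assumption)."""
    vectors = [CONTEXT_EMBEDDINGS[t] for t in tokens]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def name_distribution(tokens):
    """Score each candidate name against the context, then normalize with a softmax."""
    ctx = context_vector(tokens)
    scores = {name: dot(vec, ctx) for name, vec in NAME_EMBEDDINGS.items()}
    normalizer = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / normalizer for name, s in scores.items()}


# Rank candidate names for a method body containing these tokens.
probs = name_distribution(["return", "count", "items"])
ranked = sorted(probs, key=probs.get, reverse=True)
```

With these toy vectors, `getLength` and `size` sit close together in the space and both score well in a counting context, while `close` points in a different direction and ranks last. That is the sense in which names with similar embeddings tend to be suggested for similar contexts.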