research-article

Suggesting accurate method and class names

Authors:
Miltiadis Allamanis, University of Edinburgh, UK
Earl T. Barr, University College London, UK
Christian Bird, Microsoft Research, USA
Charles Sutton, University of Edinburgh, UK

ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, August 2015, pages 38–49
https://doi.org/10.1145/2786805.2786849

Published: 30 August 2015

ABSTRACT

Descriptive names are a vital part of readable, and hence maintainable, code. Recent progress on automatically suggesting names for local variables tantalizes with the prospect of replicating that success with method and class names. However, suggesting names for methods and classes is much more difficult. This is because good method and class names need to be functionally descriptive, but suggesting such names requires that the model go beyond local context. We introduce a neural probabilistic language model for source code that is specifically designed for the method naming problem. Our model learns which names are semantically similar by assigning them to locations, called embeddings, in a high-dimensional continuous space, in such a way that names with similar embeddings tend to be used in similar contexts. These embeddings seem to contain semantic information about tokens, even though they are learned only from statistical co-occurrences of tokens. Furthermore, we introduce a variant of our model that is, to our knowledge, the first that can propose neologisms: names that have not appeared in the training corpus. We obtain state-of-the-art results on the method, class, and even the simpler variable naming tasks. More broadly, the continuous embeddings that are learned by our model have the potential for wide application within software engineering.
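To make the idea of name embeddings more concrete, here is a minimal Python sketch built on stated assumptions. It is not the authors' model or training procedure. The tiny 4-dimensional vectors are hand-set, whereas a trained model would learn them from token co-occurrences. The sketch sums the embeddings of the tokens in a method body to form a context vector and composes each candidate name's embedding as the sum of its subtoken embeddings. It then ranks candidates with a softmax over dot-product scores; this scoring rule is itself an assumption chosen for illustration. Because candidates are assembled from subtokens, the sketch can rank a name absent from the hypothetical `KNOWN_NAMES` set, which loosely illustrates how a subtoken-level variant could propose neologisms. Every identifier here (`CONTEXT_EMB`, `SUBTOKEN_EMB`, `KNOWN_NAMES`, `suggest`, and the axis meanings) is hypothetical.

```python
import math
from itertools import product

# Hypothetical, hand-set 4-dimensional embeddings for illustration only.
# Assumed axes: 0 = accessor, 1 = size/quantity, 2 = boolean test, 3 = mutation.
# A trained model would learn larger vectors from token co-occurrences.
DIM = 4
CONTEXT_EMB = {  # vectors for tokens that appear in a method body
    "return": [1.0, 0.0, 0.0, 0.0],
    "count": [0.0, 1.0, 0.0, 0.0],
    "==": [0.0, 0.0, 2.0, 0.0],
    "=": [0.0, 0.0, 0.0, 2.0],
    "this": [0.0, 0.0, 0.0, 0.0],
}
SUBTOKEN_EMB = {  # vectors for name subtokens (the prediction targets)
    "get": [1.0, 0.0, 0.0, 0.0],
    "set": [0.0, 0.0, 0.0, 1.0],
    "is": [0.0, 0.0, 1.0, 0.0],
    "size": [0.0, 1.0, 0.0, 0.0],
    "empty": [0.0, 0.5, 0.5, 0.0],
    "count": [0.0, 1.2, 0.0, 0.0],
}
# Whole names assumed to occur in a hypothetical training corpus.
KNOWN_NAMES = {"getSize", "isEmpty", "setSize"}


def context_vector(tokens):
    """Sum the vectors of the body tokens; unknown tokens contribute nothing."""
    vec = [0.0] * DIM
    for tok in tokens:
        for i, x in enumerate(CONTEXT_EMB.get(tok, [0.0] * DIM)):
            vec[i] += x
    return vec


def name_vector(subtokens):
    """Compose a name's vector as the sum of its subtoken vectors."""
    return [sum(col) for col in zip(*(SUBTOKEN_EMB[s] for s in subtokens))]


def camel(subtokens):
    """Join subtokens into a camelCase identifier: ['get', 'size'] -> 'getSize'."""
    return subtokens[0] + "".join(s.capitalize() for s in subtokens[1:])


def suggest(body_tokens, verbs=("get", "set", "is"), nouns=("size", "empty", "count")):
    """Return (name, probability) pairs for every verb+noun name, best first."""
    ctx = context_vector(body_tokens)
    candidates = [list(pair) for pair in product(verbs, nouns)]
    scores = [sum(c * q for c, q in zip(ctx, name_vector(sub))) for sub in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by the max for numerical stability
    total = sum(exps)
    pairs = [(camel(sub), e / total) for sub, e in zip(candidates, exps)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for body in (["return", "count", "==", "0"], ["this", "count", "=", "count"]):
        name, prob = suggest(body)[0]
        note = "" if name in KNOWN_NAMES else " (neologism)"
        print(" ".join(body), "->", f"{name} p={prob:.2f}{note}")
```

With these toy vectors, the body `return count == 0` ranks `isEmpty` first. The body `this.count = count` ranks `setCount` first, even though `setCount` never appears as a whole name in `KNOWN_NAMES`. Summing subtoken vectors is the simplest possible composition rule, and it ignores subtoken order, so a practical model would likely use something richer.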


Index Terms

- Software and its engineering
  - Software notations and tools


Published in
ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering
August 2015
1068 pages
ISBN: 9781450336758
DOI: 10.1145/2786805
General Chair: Elisabetta Di Nitto, Politecnico di Milano, Italy
Program Chairs: Mark Harman, University College London, UK; Patrick Heymans, University of Namur, Belgium
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
- Coding conventions
- Naturalness of software
Acceptance Rates
Overall Acceptance Rate: 112 of 543 submissions, 21%
Article Metrics
- 4 Total Citations
- 1,755 Total Downloads
- Downloads (last 12 months): 137
- Downloads (last 6 weeks): 16
