DOI: 10.1145/2786805.2786849

Suggesting accurate method and class names

Published: 30 August 2015

ABSTRACT

Descriptive names are a vital part of readable, and hence maintainable, code. Recent progress on automatically suggesting names for local variables tantalizes with the prospect of replicating that success with method and class names. However, suggesting names for methods and classes is much more difficult, because good method and class names need to be functionally descriptive, and suggesting such names requires that the model go beyond local context. We introduce a neural probabilistic language model for source code that is specifically designed for the method naming problem. Our model learns which names are semantically similar by assigning them to locations, called embeddings, in a high-dimensional continuous space, in such a way that names with similar embeddings tend to be used in similar contexts. These embeddings seem to contain semantic information about tokens, even though they are learned only from statistical co-occurrences of tokens. Furthermore, we introduce a variant of our model that is, to our knowledge, the first that can propose neologisms, names that have not appeared in the training corpus. We obtain state-of-the-art results on the method, class, and even the simpler variable naming tasks. More broadly, the continuous embeddings that are learned by our model have the potential for wide application within software engineering.
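The abstract describes the model only at a high level. The sketch below is a minimal, illustrative reconstruction of the core idea, not the authors' implementation: a candidate name is scored by combining the embeddings of tokens around the naming site into a context vector and comparing it with that name's embedding. The identifiers, the toy vocabulary, and the randomly initialised embeddings and combination matrix are assumptions for illustration only; in the actual neural probabilistic language model these quantities are learned from token co-occurrences in a source-code corpus.

```python
# Minimal illustrative sketch (NOT the paper's implementation): suggest a
# method name by comparing a context-derived vector with candidate name
# embeddings. Embeddings here are random stand-ins for learned ones.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8                                        # embedding dimension (toy value)
vocab = ["getSize", "size", "count", "addItem", "removeItem", "clear"]
name_emb = {n: rng.normal(size=DIM) for n in vocab}        # per-name embeddings
ctx_emb = {t: rng.normal(size=DIM) for t in
           ["return", "items", "length", "list", "int"]}   # context-token embeddings
combine = rng.normal(size=(DIM, DIM))          # context-combination matrix

def score(name, context_tokens):
    """Unnormalised log-score of a candidate name given surrounding tokens."""
    context_vec = np.mean([ctx_emb[t] for t in context_tokens], axis=0)
    predicted = combine @ context_vec          # predicted target embedding
    return float(predicted @ name_emb[name])   # similarity to the candidate

def suggest(context_tokens, k=3):
    """Rank the known names for a naming site; the paper's neologism variant
    would instead compose scores over subtokens such as 'get' + 'size'."""
    return sorted(vocab, key=lambda n: score(n, context_tokens), reverse=True)[:k]

print(suggest(["return", "items", "length"]))
```

Because the learned embeddings place names that occur in similar contexts close together, a ranking of this kind tends to surface semantically appropriate names, which is what the abstract's claim about semantic information in the embeddings refers to.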


    • Published in

      ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering
      August 2015
      1068 pages
      ISBN: 9781450336758
      DOI: 10.1145/2786805

      Copyright © 2015 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 August 2015


      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate: 112 of 543 submissions, 21%
