skip to main content
research-article
Public Access

A large-scale study of programming languages and code quality in GitHub

Published:25 September 2017Publication History
Skip Abstract Section

Abstract

What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (728 projects, 63 million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static versus dynamic typing and allowing versus disallowing type confusion on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. Most notably, it does appear that disallowing type confusion is modestly better than allowing it, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, for example, the preference of certain personality types for functional, static languages that disallow type confusion.

References

  1. Bhattacharya, P., Neamtiu, I. Assessing programming language impact on development and maintenance: A study on C and C++. In Proceedings of the 33rd International Conference on Software Engineering, ICSE'11 (New York, NY USA, 2011). ACM, 171--180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bird, C., Nagappan, N., Murphy, B., Gall, H., Devanbu, P. Don't touch my code! Examining the effects of ownership on software quality. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (2011). ACM, 4--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Blei, D.M. Probabilistic topic models. Commun. ACM 55, 4 (2012), 77--84. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Cohen, J. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Lawrence Erlbaum, 2003.Google ScholarGoogle Scholar
  5. Easterbrook, S., Singer, J., Storey, M.-A., Damian, D. Selecting empirical methods for software engineering research. In Guide to Advanced Empirical Software Engineering (2008). Springer, 285--311. Google ScholarGoogle ScholarCross RefCross Ref
  6. El Emam, K., Benlarbi, S., Goel, N., Rai, S.N. The confounding effect of class size on the validity of object-oriented metrics. IEEE Trans. Softw. Eng. 27, 7 (2001), 630--650. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Hanenberg, S. An experiment about static and dynamic type systems: Doubts about the positive impact of static type systems on development time. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA'10 (New York, NY, USA, 2010). ACM, 22--35. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Harrison, R., Smaraweera, L., Dobie, M., Lewis, P. Comparing programming paradigms: An evaluation of functional and object-oriented programs. Softw. Eng. J. 11, 4 (1996), 247--254. Google ScholarGoogle ScholarCross RefCross Ref
  9. Harter, D.E., Krishnan, M.S., Slaughter, S.A. Effects of process maturity on quality, cycle time, and effort in software product development. Manage. Sci. 46 4 (2000), 451--466. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Hindley, R. The principal type-scheme of an object in combinatory logic. Trans. Am. Math. Soc. (1969), 29--60. Google ScholarGoogle ScholarCross RefCross Ref
  11. Jump, M., McKinley, K.S. Cork: Dynamic memory leak detection for garbage-collected languages. In ACM SIGPLAN Notices, Volume 42 (2007). ACM, 31--38. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Kleinschmager, S., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A. Do static type systems improve the maintainability of software systems? An empirical study. In 2012 IEEE 20th International Conference on Program Comprehension (ICPC) (2012). IEEE, 153--162. Google ScholarGoogle ScholarCross RefCross Ref
  13. Li, Z., Tan, L., Wang, X., Lu, S., Zhou, Y., Zhai, C. Have things changed now? An empirical study of bug characteristics in modern open source software. In ASID'06: Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability (October 2006).Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Marques De Sá, J.P. Applied Statistics Using SPSS, Statistica and Matlab, 2003. Google ScholarGoogle ScholarCross RefCross Ref
  15. Mayer, C., Hanenberg, S., Robbes, R., Tanter, É., Stefik, A. An empirical study of the influence of static type systems on the usability of undocumented software. In ACM SIGPLAN Notices, Volume 47 (2012). ACM, 683--702. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Meyerovich, L.A., Rabkin, A.S. Empirical analysis of programming language adoption. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (2013). ACM, 1--18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Milner, R. A theory of type polymorphism in programming. J. Comput. Syst. Sci. 17, 3 (1978), 348--375. Google ScholarGoogle ScholarCross RefCross Ref
  18. Mockus, A., Votta, L.G. Identifying reasons for software changes using historic databases. In ICSM'00. Proceedings of the International Conference on Software Maintenance (2000). IEEE Computer Society, 120. Google ScholarGoogle ScholarCross RefCross Ref
  19. Odersky, M., Spoon, L., Venners, B. Programming in Scala. Artima Inc, 2008.Google ScholarGoogle Scholar
  20. Pankratius, V., Schmidt, F., Garretón, G. Combining functional and imperative programming for multicore software: An empirical study evaluating scala and java. In Proceedings of the 2012 International Conference on Software Engineering (2012). IEEE Press, 123--133. Google ScholarGoogle ScholarCross RefCross Ref
  21. Petricek, T., Skeet, J. Real World Functional Programming: With Examples in F# and C#. Manning Publications Co., 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Pierce, B.C. Types and Programming Languages. MIT Press, 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Posnett, D., Bird, C., Dévanbu, P. An empirical study on the influence of pattern roles on change-proneness. Emp. Softw. Eng. 16, 3 (2011), 396--423. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Tan, L., Liu, C., Li, Z., Wang, X., Zhou, Y., Zhai, C. Bug characteristics in open source software. Emp. Softw. Eng. (2013).Google ScholarGoogle Scholar

Index Terms

  1. A large-scale study of programming languages and code quality in GitHub

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Communications of the ACM
          Communications of the ACM  Volume 60, Issue 10
          October 2017
          90 pages
          ISSN:0001-0782
          EISSN:1557-7317
          DOI:10.1145/3144574
          Issue’s Table of Contents

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 25 September 2017

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        HTML Format

        View this article in HTML Format .

        View HTML Format