research-article

Predicting program properties from 'big code'

Published: 21 February 2019

Abstract

We present a new approach for predicting program properties from large codebases (also known as "Big Code"). Our approach learns a probabilistic model from "Big Code" and uses this model to predict properties of new, unseen programs.

The key idea of our work is to transform the program into a representation that allows us to formulate the problem of inferring program properties as structured prediction in machine learning. This enables us to leverage powerful probabilistic models such as Conditional Random Fields (CRFs) and perform joint prediction of program properties.
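To make the idea of joint prediction concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy dependency graph for a loop such as `for (a = 0; a < b; a += c)` where `b` is assigned from a known `length` property. The candidate names, the relations, and the weights are all hypothetical values invented for illustration; in a trained CRF the weights would be learned from a corpus. Inference is done by brute-force enumeration, which only works for tiny examples and stands in for whatever scalable inference a real system would use.

```python
from itertools import product

# Hypothetical candidate names for three unknown (minified) identifiers.
UNKNOWN_CANDIDATES = {
    "a": ["i", "j"],
    "b": ["len", "n"],
    "c": ["step", "k"],
}

# Hypothetical pairwise feature weights: (name_1, relation, name_2) -> score.
# These numbers are made up; a trained model would learn them from data.
WEIGHTS = {
    ("i", "<", "len"): 0.7,
    ("i", "+=", "step"): 0.5,
    ("len", "=", "length"): 0.6,
    ("j", "<", "len"): 0.4,
    ("j", "+=", "step"): 0.2,
}

# Edges of the toy dependency graph: (node_1, relation, node_2).
# "length" is a known property, so it keeps its own name as its label.
EDGES = [
    ("a", "<", "b"),
    ("a", "+=", "c"),
    ("b", "=", "length"),
]


def score(assignment):
    """Sum the weights of all edge features fired by a joint assignment."""
    total = 0.0
    for left, relation, right in EDGES:
        left_label = assignment.get(left, left)     # known nodes label themselves
        right_label = assignment.get(right, right)
        total += WEIGHTS.get((left_label, relation, right_label), 0.0)
    return total


def map_inference():
    """Return the highest-scoring joint assignment by exhaustive search."""
    nodes = list(UNKNOWN_CANDIDATES)
    best, best_score = None, float("-inf")
    for labels in product(*(UNKNOWN_CANDIDATES[n] for n in nodes)):
        assignment = dict(zip(nodes, labels))
        current = score(assignment)
        if current > best_score:
            best, best_score = assignment, current
    return best, best_score


if __name__ == "__main__":
    print(map_inference())
```

The point of the sketch is that the names are chosen together: the score of `a` depends on what is chosen for `b` and `c`, so the best label for one identifier cannot be picked in isolation.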

As an example of our approach, we built a scalable prediction engine called JSNice for solving two kinds of tasks in the context of JavaScript: predicting (syntactic) names of identifiers and predicting (semantic) type annotations of variables. Experimentally, JSNice predicts correct names for 63% of name identifiers and its type annotation predictions are correct in 81% of cases. Since its public release at http://jsnice.org, JSNice has become a popular system with hundreds of thousands of uses.

By formulating the problem of inferring program properties as structured prediction, our work opens up the possibility for a range of new "Big Code" applications such as de-obfuscators, decompilers, invariant generators, and others.


Published in Communications of the ACM, Volume 62, Issue 3, March 2019, 109 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3314328

      Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed
