DOI: 10.1145/2908080.2908126
research-article

Statistical similarity of binaries

Published: 02 June 2016

ABSTRACT

We address the problem of finding similar procedures in stripped binaries. We present a new statistical approach for measuring the similarity between two procedures. Our notion of similarity allows us to find similar code even when it has been compiled using different compilers or has been modified. The main idea is to use similarity by composition: decompose the code into smaller comparable fragments, define semantic similarity between fragments, and use statistical reasoning to lift fragment similarity into similarity between procedures. We have implemented our approach in a tool called Esh, and applied it to find various prominent vulnerabilities across compilers and versions, including Heartbleed, Shellshock and Venom. We show that Esh produces high-accuracy results, with few to no false positives, a crucial factor in the scenario of vulnerability search in stripped binaries.
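To make the three steps in the abstract concrete (decompose, compare fragments, lift to procedures), here is a minimal Python sketch. It is an illustration under stated assumptions, not Esh's implementation. Every name in it is hypothetical. Fragments are modeled as token sets, the fragment similarity is a Jaccard overlap standing in for the semantic similarity the abstract mentions, and the lifting step is a simple rarity-weighted sum chosen for illustration rather than the paper's actual statistical model.

```python
import math

# Assumption: a procedure has already been decomposed into fragments,
# and each fragment is modeled as a frozenset of instruction tokens.

def fragment_similarity(a, b):
    """Toy stand-in for semantic similarity: Jaccard overlap of token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def best_match(fragment, procedure):
    """Highest similarity between one fragment and any fragment of a procedure."""
    return max((fragment_similarity(fragment, f) for f in procedure), default=0.0)

def procedure_similarity(query, target, corpus, eps=1e-6):
    """Lift fragment similarity to procedure similarity.

    Each query fragment contributes its best match in the target, weighted
    by how rare such a match is across the corpus: a match on a fragment
    that appears almost everywhere is weak evidence of similarity.
    """
    score = 0.0
    for frag in query:
        in_target = best_match(frag, target)
        background = sum(best_match(frag, p) for p in corpus) / len(corpus)
        score += in_target * -math.log(max(background, eps))
    return score

# Hypothetical data: the query shares a rare fragment with target_similar
# and only a common fragment with target_other.
query = [frozenset({"load", "add", "store"}), frozenset({"cmp", "jmp"})]
target_similar = [frozenset({"load", "add", "store"}), frozenset({"xor"})]
target_other = [frozenset({"cmp", "jmp"}), frozenset({"mul"})]
corpus = [
    target_similar,
    target_other,
    [frozenset({"cmp", "jmp"})],
    [frozenset({"cmp", "jmp"}), frozenset({"call"})],
]

score_similar = procedure_similarity(query, target_similar, corpus)
score_other = procedure_similarity(query, target_other, corpus)
```

In this toy corpus the `cmp`/`jmp` fragment occurs in three of four procedures, so matching it counts for little, while the `load`/`add`/`store` fragment occurs in only one, so `score_similar` comes out higher than `score_other`. The weighting shows why statistical reasoning over fragments can suppress false positives caused by common code patterns.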


          • Published in

            PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
            June 2016, 726 pages
            ISBN: 9781450342612
            DOI: 10.1145/2908080
            • General Chair: Chandra Krintz
            • Program Chair: Emery Berger

            ACM SIGPLAN Notices, Volume 51, Issue 6 (PLDI '16)
            June 2016, 726 pages
            ISSN: 0362-1340
            EISSN: 1558-1160
            DOI: 10.1145/2980983
            • Editor: Andy Gill

            Copyright © 2016 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            Overall Acceptance Rate: 406 of 2,067 submissions, 20%
