ABSTRACT
We address the problem of finding similar procedures in stripped binaries. We present a new statistical approach for measuring the similarity between two procedures. Our notion of similarity allows us to find similar code even when it has been compiled using different compilers, or has been modified. The main idea is to use similarity by composition: decompose the code into smaller comparable fragments, define semantic similarity between fragments, and use statistical reasoning to lift fragment similarity into similarity between procedures. We have implemented our approach in a tool called Esh, and applied it to find various prominent vulnerabilities across compilers and versions, including Heartbleed, Shellshock and Venom. We show that Esh produces high-accuracy results, with few to no false positives: a crucial property when searching for vulnerabilities in stripped binaries.
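The three steps of similarity by composition (decompose, compare fragments, lift statistically) can be illustrated with a small sketch. This is not Esh's implementation. It assumes, hypothetically, that each fragment is a set of normalized instruction features, uses the Jaccard index as a stand-in for the semantic fragment similarity the abstract describes, and lifts fragment scores with an illustrative log-likelihood ratio: how well the target explains each query fragment, compared with how well an average procedure from a corpus does. The names `fragment_sim`, `p_match`, `p_random` and `procedure_score` are made up for this example.

```python
import math
from typing import FrozenSet, List, Sequence

Fragment = FrozenSet[str]   # hypothetical: a fragment as a set of normalized features
Procedure = List[Fragment]  # hypothetical: a procedure decomposed into fragments

EPS = 1e-6  # floor that keeps probabilities strictly positive before taking logs


def fragment_sim(a: Fragment, b: Fragment) -> float:
    """Stand-in fragment similarity in [0, 1] (Jaccard index), not a semantic check."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def p_match(s: Fragment, target: Procedure) -> float:
    """Approximate Pr(s | target): how well the best fragment of target explains s."""
    best = max((fragment_sim(s, t) for t in target), default=0.0)
    return max(best, EPS)


def p_random(s: Fragment, corpus: Sequence[Procedure]) -> float:
    """Approximate Pr(s | H0): average match of s against every corpus procedure."""
    return sum(p_match(s, p) for p in corpus) / len(corpus)


def procedure_score(query: Procedure, target: Procedure,
                    corpus: Sequence[Procedure]) -> float:
    """Lift fragment similarity to procedure similarity by summing log-likelihood ratios."""
    return sum(math.log(p_match(s, target) / p_random(s, corpus)) for s in query)


# Toy example with made-up features.
query = [frozenset({"load", "cmp", "jz"}), frozenset({"memcpy", "len"})]
similar = [frozenset({"load", "cmp", "jz"}), frozenset({"memcpy", "len", "add"})]
unrelated = [frozenset({"push", "pop"})]
corpus = [similar, unrelated]

print(procedure_score(query, similar, corpus) > procedure_score(query, unrelated, corpus))  # True
```

The ratio is what makes the lifting step statistical in this sketch: a fragment that appears in almost every corpus procedure (for instance, a standard prologue) contributes close to zero, while a rare fragment that the target matches well contributes strongly. The sketch assumes a non-empty corpus.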
Recommendations
Similarity of binaries through re-optimization
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. We present a scalable approach for establishing similarity between stripped binaries (with no debug information). The main challenge in binary similarity is to establish similarity even when the code has been compiled using different compilers, with ...
Statistical similarity of binaries
PLDI '16. We address the problem of finding similar procedures in stripped binaries. We present a new statistical approach for measuring the similarity between two procedures. Our notion of similarity allows us to find similar code even when it has been compiled ...