skip to main content
10.1145/2931037.2931047acmconferencesArticle/Chapter ViewAbstractPublication PagesisstaConference Proceedingsconference-collections
research-article
Public Access

Binary code is not easy

Published:18 July 2016Publication History

ABSTRACT

Binary code analysis is an enabling technique for many applications. Modern compilers and run-time libraries have introduced significant complexities to binary code, which negatively affect the capabilities of binary analysis tool kits to analyze binary code, and may cause tools to report inaccurate information about binary code. Analysts may hence be confused and applications based on these tool kits may have degrading quality. We examine the problem of constructing control flow graphs from binary code and labeling the graphs with accurate function boundary annotations. We identified several challenging code constructs that represent hard-to-analyze aspects of binary code, and show code examples for each code construct. As part of this discussion, we present new code parsing algorithms in our open source Dyninst tool kit that support these constructs, including a new model for describing jump tables that improves our ability to precisely determine the control flow targets, a new interprocedural analysis to determine when a function is non-returning, and techniques for handling tail calls. We evaluated how various tool kits fare when handling these code constructs with real software as well as test binaries patterned after each challenging code construct we found in real software.

References

  1. L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. HPCTOOLKIT: Tools for Performance Analysis of Optimized Parallel Programs. Concurrency and Computation: Practice and Experience, 22(6):685–701, Apr. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. C. Arnold, D. H. Ahn, B. R. de Supinski, G. L. Lee, B. P. Miller, and M. Schulz. Stack trace analysis for large scale debugging. In 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 1–10, Long Beach, California, USA, March 2007.Google ScholarGoogle ScholarCross RefCross Ref
  3. G. Balakrishnan, R. Gruian, T. Reps, and T. Teitelbaum. CodeSurfer/x86: A Platform for Analyzing x86 Executables. In 14th International Conference on Compiler Construction (CC), pages 250–254, Edinburgh, UK, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Balakrishnan and T. Reps. WYSINWYX: What You See is Not What You eXecute. ACM Transactions on Programming Languages and Systems, 32(6):23:1–23:84, Aug. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley. BYTEWEIGHT: Learning to Recognize Functions in Binary Code. In 23rd USENIX Conference on Security Symposium (SEC), pages 845–860, San Diego, CA, Aug. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Bardin, P. Herrmann, and F. Védrine. Refinement-based cfg reconstruction from unstructured programs. In 12th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), pages 54–69, Austin, TX, USA, Jan. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. R. Bernat and B. P. Miller. Anywhere, any-time binary instrumentation. In 10th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools (PASTE), pages 9–16, Szeged, Hungary, Sep. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. R. Bernat and B. P. Miller. Structured Binary Editing with a CFG Transformation Algebra. In 2012 19th Working Conference on Reverse Engineering (WCRE), Kingston, Ontario, Canada, October 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. R. Bernat, K. Roundy, and B. P. Miller. Efficient, sensitivity resistant binary instrumentation. In the 2011 International Symposium on Software Testing and Analysis (ISSTA), pages 89–99, Toronto, Ontario, Canada, July 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Bourquin, A. King, and E. Robbins. Binslayer: Accurate comparison of binary executables. In 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop (PPREW), Rome, Italy, Jan. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Brumley, I. Jager, T. Avgerinos, and E. J. Schwartz. BAP: A Binary Analysis Platform. In 23rd International Conference on Computer Aided Verification (CAV), pages 463–469, Cliff Lodge, Snowbird, Utah, July 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Caballero, N. M. Johnson, S. McCamant, and D. Song. Binary code extraction and interface identification for security applications. In 17th Network and Distributed System Security Symposium (NDSS), San Diego, California, USA, Feb. 2010.Google ScholarGoogle Scholar
  13. C. Cifuentes and M. Van Emmerik. Recovery of jump table case statements from binary code. In 7th International Workshop on Program Comprehension (IWPC), Pittsburgh, PA, USA, May 1999. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. W. D. Clinger. Proper tail recursion and space efficiency. In 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 174–185, Montreal, Canada, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. ACM Press.Google ScholarGoogle Scholar
  16. C. ¸ Tăpu¸s, I.-H. Chung, and J. K. Hollingsworth. Active harmony: Towards automated performance tuning. In 2002 ACM/IEEE Conference on Supercomputing (SC), pages 1–11, Baltimore, Maryland, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. B. De Sutter, B. De Bus, K. De Bosschere, P. Keyngnaert, and B. Demoen. On the static analysis of indirect control transfers in binaries. In 2000 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, Nevada, USA, Jun. 2000.Google ScholarGoogle Scholar
  18. F. C. Eigler and Red Hat, Inc. Problem solving with SystemTap. In Proc. of the Ottawa Linux Symposium, Ottawa, Ontario, July 2006. Citeseer.Google ScholarGoogle Scholar
  19. K. ElWazeer, K. Anand, A. Kotha, M. Smithson, and R. Barua. Scalable variable and data type detection in a binary rewriter. In 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 51–60, Seattle, Washington, USA, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. W. Fang, B. P. Miller, and J. A. Kupsch. Automated tracing and visualization of software security structure and properties. In Ninth International Symposium on Visualization for Cyber Security (VizSec), pages 9–16, Seattle, Washington, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. GNU Project. GNU Binutils, http://www.gnu.org/software/binutils.Google ScholarGoogle Scholar
  22. L. C. Harris and B. P. Miller. Practical analysis of stripped binary code. ACM SIGARCH Computer Architecture News, 33(5):63–68, Dec. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Hex-Rays. IDA, https://www.hex-rays.com/products/ida/.Google ScholarGoogle Scholar
  24. E. R. Jacobson, A. R. Bernat, W. R. Williams, and B. P. Miller. Detecting code reuse attacks with a model of conformant program execution. In International Symposium on Engineering Secure Software and Systems (ESSoS), Munich, Germany, Feb. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. E. R. Jacobson, N. Rosenblum, and B. P. Miller. Labeling library functions in stripped binaries. In 10th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools (PASTE), Szeged, Hungary, Sep. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Jakstab. http://www.jakstab.org/home.Google ScholarGoogle Scholar
  27. D. Kästner and S. Wilhelm. Generic control flow reconstruction from assembly code. In Joint Conference on Languages, Compilers and Tools for Embedded Systems: Software and Compilers for Embedded Systems (LCTES/SCOPES), pages 46–55, Berlin, Germany, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. W. M. Khoo, A. Mycroft, and R. Anderson. Rendezvous: A search engine for binary code. In 10th Working Conference on Mining Software Repositories (MSR), San Francisco, CA, USA, May 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. Kinder and D. Kravchenko. Alternating control flow reconstruction. In 13th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), Philadelphia, PA, Jan. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Kinder and H. Veith. Jakstab: A static analysis platform for binaries. In 20th International Conference on Computer Aided Verification (CAV), pages 423–427, Princeton, NJ, USA, July 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. G. Llort and H. Servat. Extrae. Barcelona Supercomputer Center, 2015.Google ScholarGoogle Scholar
  32. F. Long, S. Sidiroglou-Douskos, and M. Rinard. Automatic runtime error repair and containment via recovery shepherding. In 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Edinburgh, United Kingdom, June 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 190–200, Chicago, IL, USA, June 2005. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall. The paradyn parallel performance measurement tool. Computer, 28(11):37–46, Nov. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. Mußler, D. Lorenz, and F. Wolf. Reducing the overhead of direct application instrumentation using prior static analysis. In Proceedings of the 17th international conference on Parallel processing-Volume Part I (Euro-Par 2011), Bordeaux, France, sep 2011. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. OllyDbg. http://www.ollydbg.de.Google ScholarGoogle Scholar
  37. P. O’Sullivan, K. Anand, A. Kotha, M. Smithson, R. Barua, and A. D. Keromytis. Retrofitting Security in COTS Software with Binary Rewriting. In 26th IFIP TC-11 International Information Security Conference (IFIP SEC), pages 154–172, Hamburg, Germany, June 2011.Google ScholarGoogle Scholar
  38. Paradyn Project. Dyninst: Putting the Performance in High Performance Computing, http://www.dyninst.org.Google ScholarGoogle Scholar
  39. F. Peng, Z. Deng, X. Zhang, D. Xu, Z. Lin, and Z. Su. X-force: Force-executing binary programs for security applications. In 23rd USENIX Conference on Security Symposium (SEC), San Diego, CA, Aug. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. C. Reffett and D. Fleck. Securing applications with dyninst. In 2015 IEEE International Symposium on Technologies for Homeland Security (HST), pages 1–6, Waltham, MA, USA, April 2015.Google ScholarGoogle ScholarCross RefCross Ref
  41. T. Reinbacher and J. Brauer. Precise control flow reconstruction using boolean logic. In Ninth ACM International Conference on Embedded Software (EMSOFT), pages 117–126, Taipei, Taiwan, Oct. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. N. Rosenblum, B. P. Miller, and X. Zhu. Recovering the toolchain provenance of binary code. In 2011 International Symposium on Software Testing and Analysis (ISSTA), pages 100–110, Toronto, Ontario, Canada, July 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. N. Rosenblum, X. Zhu, and B. P. Miller. Who wrote this code? identifying the authors of program binaries. In 16th European Conference on Research in Computer Security (ESORICS), Leuven, Belgium, Sep. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. N. Rosenblum, X. Zhu, B. P. Miller, and K. Hunt. Learning to analyze binary computer code. In 23rd National Conference on Artificial Intelligence (AAAI), pages 798–804, Chicago, Illinois, July 2008. AAAI Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. K. A. Roundy. Hybrid analysis and control of malicious code, doctoral disstertation, University of Wisconsin-Madison, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. M. Schulz, J. Galarowicz, D. Maghrak, W. Hachfeld, D. Montoya, and S. Cranford. Open|SpeedShop: An open source infrastructure for parallel performance analysis. Scientific Programming, 16(2-3):105–121, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. B. Schwarz, S. Debray, and G. Andrews. Disassembly of executable code revisited. In Ninth Working Conference on Reverse Engineering (WCRE), Richmond, VA, USA, Oct 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. M. Smithson, K. Elwazeer, K. Anand, A. Kotha, and R. Barua. Static binary rewriting without supplemental information: Overcoming the tradeoff between coverage and correctness. In 20th Working Conference on Reverse Engineering WCRE, pages 52–61, Koblenz, Germany, October 2013.Google ScholarGoogle ScholarCross RefCross Ref
  49. B. D. Sutter, B. D. Bus, K. D. Bosschere, P. Keyngnaert, and B. Demoen. On the static analysis of indirect control transfers in binaries. In International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), pages 1013–1019, Las Vegas, Nevada, USA, June 2000.Google ScholarGoogle Scholar
  50. H. Theiling. Extracting safe and precise control flow from binaries. In the Seventh International Conference on Real-Time Systems and Applications (RTCSA), pages 23–30, Cheju Island, South Korea, Dec. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. V. van der Veen, D. Andriesse, E. Gökta¸s, B. Gras, L. Sambuc, A. Slowinska, H. Bos, and C. Giuffrida. Practical context-sensitive cfi. In 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS), Denver, Colorado, USA, Oct. 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. L. Xu, F. Sun, and Z. Su. Constructing precise control flow graphs from binaries. Technical report, Technical Report CSE-2009-27, Department of Computer Science, UC Davis., 2009.Google ScholarGoogle Scholar
  53. M. Zhang and R. Sekar. Control flow integrity for cots binaries. In 22nd USENIX Conference on Security (USENIX), pages 337–352, Washington, D.C., Aug. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. J. Zhou and G. Vigna. Detecting attacks that exploit application-logic errors through application-level auditing. In 20th Annual Computer Security Applications Conference (ACSAC), pages 168–178, Tucson, AZ, USA, Dec. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Binary code is not easy

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        ISSTA 2016: Proceedings of the 25th International Symposium on Software Testing and Analysis
        July 2016
        452 pages
        ISBN:9781450343909
        DOI:10.1145/2931037

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 18 July 2016

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate58of213submissions,27%

        Upcoming Conference

        ISSTA '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader