ABSTRACT
Binary code analysis is an enabling technique for many applications. Modern compilers and run-time libraries have introduced significant complexities to binary code, which negatively affect the capabilities of binary analysis tool kits to analyze binary code, and may cause tools to report inaccurate information about binary code. Analysts may hence be confused and applications based on these tool kits may have degrading quality. We examine the problem of constructing control flow graphs from binary code and labeling the graphs with accurate function boundary annotations. We identified several challenging code constructs that represent hard-to-analyze aspects of binary code, and show code examples for each code construct. As part of this discussion, we present new code parsing algorithms in our open source Dyninst tool kit that support these constructs, including a new model for describing jump tables that improves our ability to precisely determine the control flow targets, a new interprocedural analysis to determine when a function is non-returning, and techniques for handling tail calls. We evaluated how various tool kits fare when handling these code constructs with real software as well as test binaries patterned after each challenging code construct we found in real software.
- L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. HPCTOOLKIT: Tools for Performance Analysis of Optimized Parallel Programs. Concurrency and Computation: Practice and Experience, 22(6):685–701, Apr. 2010. Google ScholarDigital Library
- D. C. Arnold, D. H. Ahn, B. R. de Supinski, G. L. Lee, B. P. Miller, and M. Schulz. Stack trace analysis for large scale debugging. In 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 1–10, Long Beach, California, USA, March 2007.Google ScholarCross Ref
- G. Balakrishnan, R. Gruian, T. Reps, and T. Teitelbaum. CodeSurfer/x86: A Platform for Analyzing x86 Executables. In 14th International Conference on Compiler Construction (CC), pages 250–254, Edinburgh, UK, 2005. Google ScholarDigital Library
- G. Balakrishnan and T. Reps. WYSINWYX: What You See is Not What You eXecute. ACM Transactions on Programming Languages and Systems, 32(6):23:1–23:84, Aug. 2010. Google ScholarDigital Library
- T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley. BYTEWEIGHT: Learning to Recognize Functions in Binary Code. In 23rd USENIX Conference on Security Symposium (SEC), pages 845–860, San Diego, CA, Aug. 2014. Google ScholarDigital Library
- S. Bardin, P. Herrmann, and F. Védrine. Refinement-based cfg reconstruction from unstructured programs. In 12th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), pages 54–69, Austin, TX, USA, Jan. 2011. Google ScholarDigital Library
- A. R. Bernat and B. P. Miller. Anywhere, any-time binary instrumentation. In 10th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools (PASTE), pages 9–16, Szeged, Hungary, Sep. 2011. Google ScholarDigital Library
- A. R. Bernat and B. P. Miller. Structured Binary Editing with a CFG Transformation Algebra. In 2012 19th Working Conference on Reverse Engineering (WCRE), Kingston, Ontario, Canada, October 2012. Google ScholarDigital Library
- A. R. Bernat, K. Roundy, and B. P. Miller. Efficient, sensitivity resistant binary instrumentation. In the 2011 International Symposium on Software Testing and Analysis (ISSTA), pages 89–99, Toronto, Ontario, Canada, July 2011. Google ScholarDigital Library
- M. Bourquin, A. King, and E. Robbins. Binslayer: Accurate comparison of binary executables. In 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop (PPREW), Rome, Italy, Jan. 2013. Google ScholarDigital Library
- D. Brumley, I. Jager, T. Avgerinos, and E. J. Schwartz. BAP: A Binary Analysis Platform. In 23rd International Conference on Computer Aided Verification (CAV), pages 463–469, Cliff Lodge, Snowbird, Utah, July 2011. Google ScholarDigital Library
- J. Caballero, N. M. Johnson, S. McCamant, and D. Song. Binary code extraction and interface identification for security applications. In 17th Network and Distributed System Security Symposium (NDSS), San Diego, California, USA, Feb. 2010.Google Scholar
- C. Cifuentes and M. Van Emmerik. Recovery of jump table case statements from binary code. In 7th International Workshop on Program Comprehension (IWPC), Pittsburgh, PA, USA, May 1999. IEEE Computer Society. Google ScholarDigital Library
- W. D. Clinger. Proper tail recursion and space efficiency. In 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 174–185, Montreal, Canada, June 1998. Google ScholarDigital Library
- ACM Press.Google Scholar
- C. ¸ Tăpu¸s, I.-H. Chung, and J. K. Hollingsworth. Active harmony: Towards automated performance tuning. In 2002 ACM/IEEE Conference on Supercomputing (SC), pages 1–11, Baltimore, Maryland, 2002. Google ScholarDigital Library
- B. De Sutter, B. De Bus, K. De Bosschere, P. Keyngnaert, and B. Demoen. On the static analysis of indirect control transfers in binaries. In 2000 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, Nevada, USA, Jun. 2000.Google Scholar
- F. C. Eigler and Red Hat, Inc. Problem solving with SystemTap. In Proc. of the Ottawa Linux Symposium, Ottawa, Ontario, July 2006. Citeseer.Google Scholar
- K. ElWazeer, K. Anand, A. Kotha, M. Smithson, and R. Barua. Scalable variable and data type detection in a binary rewriter. In 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 51–60, Seattle, Washington, USA, 2013. ACM. Google ScholarDigital Library
- W. Fang, B. P. Miller, and J. A. Kupsch. Automated tracing and visualization of software security structure and properties. In Ninth International Symposium on Visualization for Cyber Security (VizSec), pages 9–16, Seattle, Washington, 2012. ACM. Google ScholarDigital Library
- GNU Project. GNU Binutils, http://www.gnu.org/software/binutils.Google Scholar
- L. C. Harris and B. P. Miller. Practical analysis of stripped binary code. ACM SIGARCH Computer Architecture News, 33(5):63–68, Dec. 2005. Google ScholarDigital Library
- Hex-Rays. IDA, https://www.hex-rays.com/products/ida/.Google Scholar
- E. R. Jacobson, A. R. Bernat, W. R. Williams, and B. P. Miller. Detecting code reuse attacks with a model of conformant program execution. In International Symposium on Engineering Secure Software and Systems (ESSoS), Munich, Germany, Feb. 2014. Google ScholarDigital Library
- E. R. Jacobson, N. Rosenblum, and B. P. Miller. Labeling library functions in stripped binaries. In 10th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools (PASTE), Szeged, Hungary, Sep. 2011. Google ScholarDigital Library
- Jakstab. http://www.jakstab.org/home.Google Scholar
- D. Kästner and S. Wilhelm. Generic control flow reconstruction from assembly code. In Joint Conference on Languages, Compilers and Tools for Embedded Systems: Software and Compilers for Embedded Systems (LCTES/SCOPES), pages 46–55, Berlin, Germany, 2002. Google ScholarDigital Library
- W. M. Khoo, A. Mycroft, and R. Anderson. Rendezvous: A search engine for binary code. In 10th Working Conference on Mining Software Repositories (MSR), San Francisco, CA, USA, May 2013. Google ScholarDigital Library
- J. Kinder and D. Kravchenko. Alternating control flow reconstruction. In 13th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), Philadelphia, PA, Jan. 2012. Google ScholarDigital Library
- J. Kinder and H. Veith. Jakstab: A static analysis platform for binaries. In 20th International Conference on Computer Aided Verification (CAV), pages 423–427, Princeton, NJ, USA, July 2008. Google ScholarDigital Library
- G. Llort and H. Servat. Extrae. Barcelona Supercomputer Center, 2015.Google Scholar
- F. Long, S. Sidiroglou-Douskos, and M. Rinard. Automatic runtime error repair and containment via recovery shepherding. In 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Edinburgh, United Kingdom, June 2014. Google ScholarDigital Library
- C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 190–200, Chicago, IL, USA, June 2005. ACM. Google ScholarDigital Library
- B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall. The paradyn parallel performance measurement tool. Computer, 28(11):37–46, Nov. 1995. Google ScholarDigital Library
- J. Mußler, D. Lorenz, and F. Wolf. Reducing the overhead of direct application instrumentation using prior static analysis. In Proceedings of the 17th international conference on Parallel processing-Volume Part I (Euro-Par 2011), Bordeaux, France, sep 2011. Springer-Verlag. Google ScholarDigital Library
- OllyDbg. http://www.ollydbg.de.Google Scholar
- P. O’Sullivan, K. Anand, A. Kotha, M. Smithson, R. Barua, and A. D. Keromytis. Retrofitting Security in COTS Software with Binary Rewriting. In 26th IFIP TC-11 International Information Security Conference (IFIP SEC), pages 154–172, Hamburg, Germany, June 2011.Google Scholar
- Paradyn Project. Dyninst: Putting the Performance in High Performance Computing, http://www.dyninst.org.Google Scholar
- F. Peng, Z. Deng, X. Zhang, D. Xu, Z. Lin, and Z. Su. X-force: Force-executing binary programs for security applications. In 23rd USENIX Conference on Security Symposium (SEC), San Diego, CA, Aug. 2014. Google ScholarDigital Library
- C. Reffett and D. Fleck. Securing applications with dyninst. In 2015 IEEE International Symposium on Technologies for Homeland Security (HST), pages 1–6, Waltham, MA, USA, April 2015.Google ScholarCross Ref
- T. Reinbacher and J. Brauer. Precise control flow reconstruction using boolean logic. In Ninth ACM International Conference on Embedded Software (EMSOFT), pages 117–126, Taipei, Taiwan, Oct. 2011. Google ScholarDigital Library
- N. Rosenblum, B. P. Miller, and X. Zhu. Recovering the toolchain provenance of binary code. In 2011 International Symposium on Software Testing and Analysis (ISSTA), pages 100–110, Toronto, Ontario, Canada, July 2011. Google ScholarDigital Library
- N. Rosenblum, X. Zhu, and B. P. Miller. Who wrote this code? identifying the authors of program binaries. In 16th European Conference on Research in Computer Security (ESORICS), Leuven, Belgium, Sep. 2011. Google ScholarDigital Library
- N. Rosenblum, X. Zhu, B. P. Miller, and K. Hunt. Learning to analyze binary computer code. In 23rd National Conference on Artificial Intelligence (AAAI), pages 798–804, Chicago, Illinois, July 2008. AAAI Press. Google ScholarDigital Library
- K. A. Roundy. Hybrid analysis and control of malicious code, doctoral disstertation, University of Wisconsin-Madison, 2012. Google ScholarDigital Library
- M. Schulz, J. Galarowicz, D. Maghrak, W. Hachfeld, D. Montoya, and S. Cranford. Open|SpeedShop: An open source infrastructure for parallel performance analysis. Scientific Programming, 16(2-3):105–121, 2008. Google ScholarDigital Library
- B. Schwarz, S. Debray, and G. Andrews. Disassembly of executable code revisited. In Ninth Working Conference on Reverse Engineering (WCRE), Richmond, VA, USA, Oct 2002. Google ScholarDigital Library
- M. Smithson, K. Elwazeer, K. Anand, A. Kotha, and R. Barua. Static binary rewriting without supplemental information: Overcoming the tradeoff between coverage and correctness. In 20th Working Conference on Reverse Engineering WCRE, pages 52–61, Koblenz, Germany, October 2013.Google ScholarCross Ref
- B. D. Sutter, B. D. Bus, K. D. Bosschere, P. Keyngnaert, and B. Demoen. On the static analysis of indirect control transfers in binaries. In International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), pages 1013–1019, Las Vegas, Nevada, USA, June 2000.Google Scholar
- H. Theiling. Extracting safe and precise control flow from binaries. In the Seventh International Conference on Real-Time Systems and Applications (RTCSA), pages 23–30, Cheju Island, South Korea, Dec. 2000. Google ScholarDigital Library
- V. van der Veen, D. Andriesse, E. Gökta¸s, B. Gras, L. Sambuc, A. Slowinska, H. Bos, and C. Giuffrida. Practical context-sensitive cfi. In 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS), Denver, Colorado, USA, Oct. 2015. Google ScholarDigital Library
- L. Xu, F. Sun, and Z. Su. Constructing precise control flow graphs from binaries. Technical report, Technical Report CSE-2009-27, Department of Computer Science, UC Davis., 2009.Google Scholar
- M. Zhang and R. Sekar. Control flow integrity for cots binaries. In 22nd USENIX Conference on Security (USENIX), pages 337–352, Washington, D.C., Aug. 2013. Google ScholarDigital Library
- J. Zhou and G. Vigna. Detecting attacks that exploit application-logic errors through application-level auditing. In 20th Annual Computer Security Applications Conference (ACSAC), pages 168–178, Tucson, AZ, USA, Dec. 2004. Google ScholarDigital Library
Index Terms
- Binary code is not easy
Recommendations
The poset structures admitting the extended binary Golay code to be a perfect code
Brualdi et al. [Codes with a poset metric, Discrete Math. 147 (1995) 57-72] introduced the concept of poset codes, and gave an example of poset structure which admits the extended binary Golay code to be a 4-error-correcting perfect P-code. In this ...
Detecting code clones in binary executables
ISSTA '09: Proceedings of the eighteenth international symposium on Software testing and analysisLarge software projects contain significant code duplication, mainly due to copying and pasting code. Many techniques have been developed to identify duplicated code to enable applications such as refactoring, detecting bugs, and protecting intellectual ...
The poset structures admitting the extended binary Hamming code to be a perfect code
Brualdi et al. introduced the concept of poset codes, and gave an example of poset structure which admits the extended binary Hamming code to be a double-error-correcting perfect P-code. Our study is motivated by this example. In this paper we classify ...
Comments