research-article
Open Access

Processor-Tracing Guided Region Formation in Dynamic Binary Translation

Published: 16 November 2018

Abstract

Region formation is an important step in dynamic binary translation: it selects the hot code regions to be translated and optimized. The quality of the formed regions determines the scope of optimization and thus the final execution performance. Overall performance is also very sensitive to formation overhead, because region formation itself can have a non-trivial cost. To address both region quality and formation overhead, this article presents a lightweight region formation method guided by processor tracing, such as Intel PT. We use the branch history recorded by the processor to reconstruct the program's execution profile and form high-quality regions at low cost. We further present the design of lightweight hardware performance monitoring sampling and of a branch instruction decode cache, both of which minimize region formation overhead. For ARM64-to-x86-64 translation, the experimental results show that our method achieves a speedup of up to 1.53× (1.16× on average) on SPEC CPU2006 benchmarks with reference inputs, compared to Next Executing Tail (NET), a well-known software-based trace formation method. For x86-64-to-ARM64 translation, our method also achieves a speedup of up to 1.25× over NET on CINT2006 benchmarks with reference inputs. A comparison with a relaxed NETPlus region formation method further demonstrates that our method achieves the best performance and the lowest compilation overhead.
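The idea of replaying recorded branch history over the static code to recover an execution profile, and then forming a region from it, can be illustrated with a deliberately simplified sketch. It assumes a toy guest program (the hypothetical `PROGRAM` table of `Insn` entries) and a trace reduced to one taken/not-taken bit per conditional branch, loosely modeled on the TNT information Intel PT emits; real PT traces also carry packets for indirect-branch targets and timing, which are omitted here. The hypothetical `decode_block` walks the code from a block start to the next branch, and `functools.lru_cache` memoizes it as a stand-in for a branch instruction decode cache, so each block is decoded only once. The hypothetical `form_region` then grows a region from the hottest block by following its most frequent successor, stopping when that successor is too cold (below the assumed `min_ratio`) or closes a cycle. This hottest-successor heuristic is an illustrative assumption, not the article's algorithm.

```python
from collections import Counter
from dataclasses import dataclass
from functools import lru_cache
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Insn:
    size: int                     # instruction length in bytes
    kind: str = "op"              # "op", "cond", "jmp", or "halt"
    target: Optional[int] = None  # branch target for "cond" and "jmp"


# Hypothetical guest code: a loop at 0x10 with an if/else inside.
PROGRAM: Dict[int, Insn] = {
    0x00: Insn(4),
    0x04: Insn(4, "jmp", 0x10),
    0x10: Insn(4),
    0x14: Insn(4, "cond", 0x20),  # taken -> else path at 0x20
    0x18: Insn(4),                # then path
    0x1C: Insn(4, "jmp", 0x24),
    0x20: Insn(4),                # else path
    0x24: Insn(4, "cond", 0x10),  # loop back-edge, taken -> 0x10
    0x28: Insn(4, "halt"),
}


@lru_cache(maxsize=None)
def decode_block(start: int) -> Tuple[int, Insn]:
    """Decode from `start` to the next branch; memoization plays the decode cache."""
    pc = start
    while PROGRAM[pc].kind == "op":
        pc += PROGRAM[pc].size
    return pc, PROGRAM[pc]


def reconstruct_blocks(entry: int, tnt: List[bool]) -> List[int]:
    """Replay taken/not-taken bits over the static code to recover executed blocks."""
    blocks: List[int] = []
    pc, bits = entry, iter(tnt)
    while True:
        blocks.append(pc)
        addr, br = decode_block(pc)
        if br.kind == "jmp":              # direct jump: no trace bit needed
            pc = br.target
        elif br.kind == "cond":           # conditional: consume one trace bit
            taken = next(bits, None)
            if taken is None:             # trace exhausted
                return blocks
            pc = br.target if taken else addr + br.size
        else:                             # "halt"
            return blocks


def form_region(blocks: List[int], min_ratio: float = 0.5,
                max_blocks: int = 16) -> List[int]:
    """Grow a region from the hottest block along its most frequent successors."""
    block_count = Counter(blocks)
    edge_count = Counter(zip(blocks, blocks[1:]))
    cur = block_count.most_common(1)[0][0]
    region = [cur]
    while len(region) < max_blocks:
        succs = [(dst, n) for (src, dst), n in edge_count.items() if src == cur]
        if not succs:
            break
        dst, n = max(succs, key=lambda e: e[1])
        # Stop at a cold successor, or when the path loops back into the region.
        if n / block_count[cur] < min_ratio or dst in region:
            break
        region.append(dst)
        cur = dst
    return region


# Ten iterations: else path on iterations 3 and 7, loop exits after iteration 9.
tnt: List[bool] = []
for i in range(10):
    tnt.append(i in (3, 7))  # bit for the branch at 0x14 (taken = else path)
    tnt.append(i < 9)        # bit for the back-edge at 0x24 (taken = loop again)

blocks = reconstruct_blocks(0x00, tnt)
region = form_region(blocks)
print([hex(a) for a in region])  # ['0x10', '0x18', '0x24']
print(decode_block.cache_info())
```

In this trace the branch at 0x14 is taken in only 2 of 10 iterations, so the sketch keeps the then-path blocks and leaves the cold else-path block at 0x20 outside the region. Each distinct block start is decoded once (6 cache misses), while the remaining 24 block visits hit the cache.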

        • Published in

          ACM Transactions on Architecture and Code Optimization, Volume 15, Issue 4
          December 2018
          706 pages
          ISSN: 1544-3566
          EISSN: 1544-3973
          DOI: 10.1145/3284745

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 16 November 2018
          • Accepted: 1 September 2018
          • Revised: 1 August 2018
          • Received: 1 June 2018
          Published in TACO Volume 15, Issue 4

          Qualifiers

          • research-article
          • Research
          • Refereed
