ABSTRACT
Software-based Dynamic Binary Translation (DBT) and Dynamic Binary Optimization (DBO) are widely used for several reasons, including performance, design simplification, and virtualization. However, the software layer in such systems introduces non-negligible overheads that affect performance and user experience, so reducing DBT/DBO overheads is of paramount importance. Reduced overheads also have useful side effects in the rest of the software layer, such as allowing optimizations to be applied earlier. A cost-effective solution is to provide hardware support that speeds up the primitives of the software layer, automating the DBT/DBO mechanisms in hardware while leaving the heuristics to the software, which is more flexible. In this work, we characterize the overheads of a DBO system built on DynamoRIO that implements several basic optimizations. We find that computing the Data Dependence Graph (DDG) accounts for 5%-10% of the execution time. For this reason, we propose hardware support for this task in the form of a new functional unit, called DDGacc, which is integrated into a conventional pipelined processor and operated through new ISA instructions. Our evaluation shows that DDGacc reduces the cost of computing the DDG by 32x, which reduces overall execution time by 5%-10% on average and by up to 18% for applications where the DBO optimizes large code footprints.
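To make the accelerated task concrete, the following is a minimal sketch of how a software layer might build a DDG for a straight-line instruction sequence. It is a generic textbook-style construction, not the algorithm used in this work and not DynamoRIO's API. The `Instr` record, the `build_ddg` function, and the example block are hypothetical names introduced only for illustration. The sketch tracks register dependences only and ignores memory dependences.

```python
from collections import namedtuple

# Hypothetical instruction record (an assumption for illustration):
# the registers an instruction writes and the registers it reads.
Instr = namedtuple("Instr", ["text", "writes", "reads"])


def build_ddg(block):
    """Return a set of (earlier_index, later_index, kind) edges for a
    straight-line instruction sequence. Only register dependences are
    tracked; memory dependences are deliberately left out."""
    edges = set()
    last_writer = {}  # register -> index of the most recent writer
    readers = {}      # register -> indices that read it since that write

    for i, ins in enumerate(block):
        # RAW: a read depends on the most recent write of the register.
        for reg in ins.reads:
            if reg in last_writer:
                edges.add((last_writer[reg], i, "RAW"))

        for reg in ins.writes:
            # WAR: this write must stay after earlier reads of the old value.
            for j in readers.get(reg, []):
                edges.add((j, i, "WAR"))
            # WAW: this write must stay after the previous write.
            if reg in last_writer:
                edges.add((last_writer[reg], i, "WAW"))

        # Update the tracking state only after all edges for i are recorded,
        # so an instruction that reads and writes the same register is safe.
        for reg in ins.reads:
            readers.setdefault(reg, []).append(i)
        for reg in ins.writes:
            last_writer[reg] = i
            readers[reg] = []

    return edges


# Hypothetical four-instruction block.
block = [
    Instr("r1 = load [r0]", writes=("r1",), reads=("r0",)),
    Instr("r2 = r1 + r3", writes=("r2",), reads=("r1", "r3")),
    Instr("r1 = r4 * 2", writes=("r1",), reads=("r4",)),
    Instr("r2 = r1 + r2", writes=("r2",), reads=("r1", "r2")),
]

edges = build_ddg(block)
for src, dst, kind in sorted(edges):
    print(f"{src} -> {dst} ({kind})")
```

Even in this simplified form, every instruction triggers several table lookups and updates per operand. This kind of repetitive bookkeeping is a plausible candidate for a dedicated functional unit, with the decision of what to do with the resulting graph left to software.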