Abstract
A dynamic translator emulates an instruction set architecture by translating source instructions to native code during execution. On statically scheduled hardware, higher performance can potentially be achieved by reordering the translated instructions; however, this transformation is challenging if the source architecture supports precise exception semantics and the user-level program is allowed to register exception handlers. This paper presents a software technique that allows a translator to achieve out-of-order execution of user-level programs while preserving all sequential semantics. The design combines a translator, an interpreter, and a set of operating system services. Using the proposed techniques, a dynamic translator can optimistically reorder instructions and speculate them across branch boundaries. If a misspeculated operation causes an exception, the recovery algorithm reverts the application state to a safe point, then retranslates the faulting code without reordering to prevent further such exceptions.
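The recovery idea described in the abstract can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the paper's implementation: the register dictionary, the `BlockExit` helper, the three example operations, and `run_block` are all hypothetical names invented for illustration. The "safe point" is modeled as a copy of the state taken at block entry, and "retranslation without reordering" is modeled by re-running the operations in source order.

```python
import copy


class BlockExit(Exception):
    """Hypothetical helper: raised by a branch op to leave the block early."""


def set_flag(state):
    # Source instruction 0: an ordinary, non-faulting write.
    state["r3"] = 1


def guard(state):
    # Source instruction 1: branch out of the block when the divisor is zero.
    if state["r0"] == 0:
        raise BlockExit


def divide(state):
    # Source instruction 2: may fault (ZeroDivisionError).
    state["r2"] = 100 // state["r0"]


SOURCE_ORDER = [set_flag, guard, divide]
# Assumed schedule: the division is speculated above the guarding branch.
REORDERED = [divide, set_flag, guard]


def execute(ops, state):
    """Run ops in the given order; a taken branch simply ends the block."""
    try:
        for op in ops:
            op(state)
    except BlockExit:
        pass


def run_block(state, log):
    """Try the reordered schedule; on any exception, roll back and run in order."""
    checkpoint = copy.deepcopy(state)  # safe point at block entry
    try:
        execute(REORDERED, state)
        log.append("reordered")
    except Exception:
        # Revert to the safe point, then re-execute without reordering.
        state.clear()
        state.update(checkpoint)
        log.append("rollback")
        # A genuine fault would propagate from here with in-order state.
        execute(SOURCE_ORDER, state)
        log.append("in-order")
    return state
```

With `{"r0": 4}` the reordered schedule completes normally. With `{"r0": 0}` the hoisted division faults even though the source program would have branched around it, so the sketch rolls back and the in-order run exits through the branch without raising. A fault that also occurs in source order is raised again by the in-order run, at which point the state reflects only the instructions that precede it in source order.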
An out-of-order execution technique for runtime binary translators
Published in: ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems