Abstract
This paper presents the concept of an Instruction Path Coprocessor (I-COP), which is a programmable on-chip coprocessor, with its own mini-instruction set, that operates on the core processor's instructions to transform them into an internal format that can be more efficiently executed. It is located off the critical path of the core processor to ensure that it does not negatively impact the core processor's cycle time or pipeline depth. An I-COP is highly versatile and can be used to implement different types of instruction transformations to enhance the IPC of the core processor. We study four potential applications of the I-COP to demonstrate the feasibility of this concept and investigate the design issues of such a coprocessor. A prototype instruction set for the I-COP is presented along with an implementation framework that facilitates achieving high I-COP performance. Initial results indicate that the I-COP is able to efficiently implement the trace cache fill unit as well as the register move, stride data prefetching and linked data structure prefetching trace optimizations.
Instruction path coprocessors. ISCA '00: Proceedings of the 27th Annual International Symposium on Computer Architecture.