Instruction path coprocessors

Authors:
Yuan Chou
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA

John Paul Shen
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA

ACM SIGARCH Computer Architecture News, Volume 28, Issue 2, May 2000, pp 270–281, https://doi.org/10.1145/342001.339694

Published: 01 May 2000

Abstract

This paper presents the concept of an Instruction Path Coprocessor (I-COP), which is a programmable on-chip coprocessor, with its own mini-instruction set, that operates on the core processor's instructions to transform them into an internal format that can be more efficiently executed. It is located off the critical path of the core processor to ensure that it does not negatively impact the core processor's cycle time or pipeline depth. An I-COP is highly versatile and can be used to implement different types of instruction transformations to enhance the IPC of the core processor. We study four potential applications of the I-COP to demonstrate the feasibility of this concept and investigate the design issues of such a coprocessor. A prototype instruction set for the I-COP is presented along with an implementation framework that facilitates achieving high I-COP performance. Initial results indicate that the I-COP is able to efficiently implement the trace cache fill unit as well as the register move, stride data prefetching and linked data structure prefetching trace optimizations.
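
To make the idea of an instruction transformation concrete, here is a minimal sketch of one of the trace optimizations named in the abstract, the register move optimization, expressed as copy propagation over a single trace. The instruction format (`Instr` with `op`, `dst`, `srcs`), the `mov` opcode name, and the function `forward_register_moves` are hypothetical and chosen only for illustration; this is not the I-COP instruction set or the authors' implementation.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical, simplified instruction: opcode, destination, sources."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]


def forward_register_moves(trace: List[Instr]) -> List[Instr]:
    """Rewrite consumers of a register move to read the move's source.

    This is plain copy propagation within one trace. The move itself is
    kept so that the architectural register state stays correct.
    """
    alias: Dict[str, str] = {}  # register -> register it currently copies
    out: List[Instr] = []
    for ins in trace:
        # Redirect each source through any live alias.
        srcs = tuple(alias.get(s, s) for s in ins.srcs)
        if ins.dst is not None:
            # A write to dst kills every alias that mentions dst.
            alias = {d: s for d, s in alias.items()
                     if d != ins.dst and s != ins.dst}
            # A move creates a new alias from dst to its (rewritten) source.
            if ins.op == "mov" and srcs[0] != ins.dst:
                alias[ins.dst] = srcs[0]
        out.append(Instr(ins.op, ins.dst, srcs))
    return out


trace = [
    Instr("mov", "r2", ("r1",)),       # r2 <- r1
    Instr("add", "r3", ("r2", "r4")),  # can read r1 instead of r2
    Instr("add", "r1", ("r1", "r5")),  # overwrites r1, so the alias dies
    Instr("sub", "r6", ("r2", "r3")),  # must keep reading r2
]
optimized = forward_register_moves(trace)
```

In this sketch the second instruction no longer depends on the move, which shortens the dependence chain, while the last instruction is left untouched because `r1` was redefined in between.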

Index Terms

Instruction path coprocessors
1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Hardware
  1. Robustness
    1. Fault tolerance
    2. Hardware reliability

Published in
ACM SIGARCH Computer Architecture News, Volume 28, Issue 2
Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
May 2000
325 pages
ISSN: 0163-5964
DOI: 10.1145/342001
Chairmen: Alan Berenbaum (Lucent Technologies, Berkeley Heights, NJ), Joel Emer (Compaq Computer Corp., Palo Alto, CA)

ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture
June 2000
327 pages
ISBN: 1581132328
DOI: 10.1145/339647
Chairmen: Alan Berenbaum (Lucent Technologies), Joel Emer (Compaq Computer Corp.)
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- article