Tolerating memory latency is essential to achieving high performance in scalable shared-memory multiprocessors. In addition, tolerating instruction latency (from pipeline dependencies) is essential to maximizing the performance of individual processors. Multiple-context processors have been proposed as a universal mechanism for mitigating the negative effects of latency. These processors tolerate latency by switching to a concurrent thread of execution whenever one of the threads blocks on a high-latency operation. Multiple-context processors built so far, however, either incur a context-switch cost too high to tolerate short latencies (e.g., those due to pipeline dependencies) or require excessive concurrency from the software.
We propose a multiple-context architecture that combines full single-thread support with cycle-by-cycle context interleaving, providing low switch costs and the ability to tolerate short latencies. We compare the performance of our proposal with that of earlier approaches, showing that it offers substantially better performance for parallel applications. We also explore using our approach in uniprocessor workstations, an important environment for commodity microprocessors, and show that it offers much better performance for multiprogrammed uniprocessor workloads as well.
Finally, we explore the implementation issues for both our proposed and existing multiple-context architectures. One of the larger costs of a multiple-context processor lies in providing a cache capable of handling multiple outstanding requests, and we propose a lockup-free cache that provides high performance at a reasonable cost. We also show that the amount of processor state that must be replicated to support multiple contexts is modest, and that the extra complexity required to control the multiple contexts under both our proposed and existing approaches is manageable. The performance benefits and reasonable implementation cost of our approach make it a promising candidate for inclusion in future microprocessors.
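The lockup-free cache mentioned above is the standard mechanism for letting a processor continue issuing requests past a miss. The sketch below is an illustrative toy model, not the design from this work: misses are recorded in a fixed pool of miss status holding registers (MSHRs), later requests to the same line merge into the existing entry, and the processor stalls only when the MSHRs are exhausted. All names and the MSHR count are assumptions for illustration.

```python
class LockupFreeCache:
    """Toy lockup-free cache: outstanding misses live in MSHRs so the
    processor (or other contexts) can keep issuing accesses."""

    def __init__(self, n_mshrs=4):
        self.n_mshrs = n_mshrs
        self.lines = set()   # cache-line addresses currently resident
        self.mshrs = {}      # line address -> contexts waiting on the fill

    def access(self, line, context):
        if line in self.lines:
            return "hit"
        if line in self.mshrs:
            # A miss to this line is already outstanding: merge, don't
            # issue a second memory request.
            self.mshrs[line].append(context)
            return "merged"
        if len(self.mshrs) >= self.n_mshrs:
            return "stall"   # structural hazard: no free MSHR
        self.mshrs[line] = [context]
        return "miss"        # new request sent to memory

    def fill(self, line):
        """Memory returns the line: install it and release every
        context that was waiting on this MSHR."""
        self.lines.add(line)
        return self.mshrs.pop(line, [])
```

In a multiple-context processor, the contexts returned by `fill` are the ones that can be marked ready again, which is what allows several contexts to have misses in flight simultaneously.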