Abstract
There is an increasing trend to use commodity microprocessors as the compute engines in large-scale multiprocessors. However, because most microprocessors are sold in the workstation market rather than the multiprocessor market, architectural features that benefit only multiprocessors are less likely to be adopted in commodity microprocessors. In this paper, we explore multiple-context processors, an architectural technique proposed to hide the large memory latency in multiprocessors. We show that while current multiple-context designs work reasonably well for multiprocessors, they are ineffective in hiding the much shorter uniprocessor latencies using the limited parallelism found in workstation environments. We propose an alternative design that combines the best features of two existing approaches, and present simulation results that show it yields better performance for both multiprogrammed workloads on a workstation and parallel applications on a multiprocessor. By addressing the needs of the workstation environment, our proposal makes multiple contexts more attractive for commodity microprocessors.