ABSTRACT
Shared memory provides a uniform and attractive mechanism for communication. For efficiency, it is often implemented with a layer of interpretive hardware on top of a message-passing communications network. This interpretive layer is responsible for data location, data movement, and cache coherence. It uses communication patterns that benefit common programming styles, but these patterns are only heuristics. This suggests that certain styles of communication may benefit from direct access to the underlying communications substrate. The Alewife machine, a shared-memory multiprocessor being built at MIT, provides such an interface. The interface is an integral part of the shared-memory implementation: it affords direct, user-level access to the network queues, supports an efficient DMA mechanism, and includes fast trap handling for message reception. This paper discusses the design and implementation of the Alewife message-passing interface and addresses the issues and advantages of using such an interface to complement hardware-synthesized shared memory.
Anatomy of a message in the Alewife multiprocessor
ACM International Conference on Supercomputing 25th Anniversary Volume