Abstract
The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) to allow communication to overlap computation, and (3) to coordinate the two without sacrificing processor cost/performance. We show that existing message-passing multiprocessors have unnecessarily high communication costs. Research prototypes of message-driven machines demonstrate low communication overhead but poor processor cost/performance. We introduce a simple communication mechanism, Active Messages, and show that it is intrinsic to both architectures, allows cost-effective use of the hardware, and offers tremendous flexibility. Implementations on the nCUBE/2 and CM-5 are described and evaluated using Split-C, a split-phase shared-memory extension to C. We further show that active messages are sufficient to implement the dynamically scheduled languages for which message-driven machines were designed. With this mechanism, latency tolerance becomes a programming and compiling concern. Hardware support for active messages is desirable, and we outline a range of enhancements to mainstream processors.
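The abstract names the mechanism without spelling it out. In the full paper, each message carries a reference to a user-level handler that runs when the message arrives and folds the payload into the ongoing computation. The sketch below simulates that idea for a split-phase remote read in the spirit of Split-C. It is an illustration only: `Node`, `split_phase_get`, `read_request`, `read_reply`, and `demo` are hypothetical names, and the in-process queue stands in for a real network interface.

```python
from collections import deque


class Node:
    """A simulated processor with local memory and a network input queue."""

    def __init__(self, node_id, network):
        self.id = node_id
        self.network = network   # hypothetical: dict of node_id -> Node
        self.memory = {}
        self.pending = 0         # outstanding split-phase reads
        self.inbox = deque()

    def send(self, dest, handler, *args):
        # The message names the handler to run on arrival, plus its arguments.
        self.network[dest].inbox.append((handler, args))

    def poll(self):
        # Run each arrived message's handler right away; there is no
        # separate receive call that must later match the message.
        while self.inbox:
            handler, args = self.inbox.popleft()
            handler(self, *args)


def read_request(node, requester, addr, dest_key):
    # Runs on the node that owns the data: reply with the value.
    node.send(requester, read_reply, dest_key, node.memory[addr])


def read_reply(node, dest_key, value):
    # Runs on the requester: store the value and mark the read complete.
    node.memory[dest_key] = value
    node.pending -= 1


def split_phase_get(node, owner, addr, dest_key):
    # Issue the read and return immediately so computation can continue.
    node.pending += 1
    node.send(owner, read_request, node.id, addr, dest_key)


def demo():
    network = {}
    network[0] = Node(0, network)
    network[1] = Node(1, network)
    network[1].memory["x"] = 42

    split_phase_get(network[0], owner=1, addr="x", dest_key="x_copy")
    local_work = sum(range(10))  # computation issued while the read is in flight

    # Drive the simulated network until the read completes.
    while network[0].pending:
        network[1].poll()
        network[0].poll()
    return local_work, network[0].memory["x_copy"]
```

Calling `demo()` issues the read, does unrelated local work, and only then waits on the `pending` counter. This is the sense in which latency tolerance moves into the program or compiler rather than the hardware.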
Index Terms
- Active messages: a mechanism for integrated communication and computation
ISCA '92: Proceedings of the 19th Annual International Symposium on Computer Architecture