THE MIT ALEWIFE MACHINE: A LARGE-SCALE DISTRIBUTED-MEMORY MULTIPROCESSOR
June 1991
1991 Technical Report
Publisher:
  • Massachusetts Institute of Technology
  • 201 Vassar Street, W59-200 Cambridge, MA
  • United States
Published: 01 June 1991
Abstract

The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low-dimensional direct interconnection network to provide scalable communication bandwidth, while allowing the exploitation of locality. Despite its distributed-memory architecture, Alewife allows efficient shared-memory programming through a multilayered approach to locality management. A new scalable cache coherence scheme called LimitLESS directories allows the use of caches for reducing communication latency and network bandwidth requirements. Alewife also employs run-time and compile-time methods for partitioning and placement of data and processes to enhance communication locality. While the above methods attempt to minimize communication latency, remote communication with distant processors cannot be completely avoided. Alewife's processor, Sparcle, is designed to tolerate these latencies by rapidly switching between threads of computation. This paper describes the Alewife architecture and concentrates on the novel hardware features of the machine, including LimitLESS directories and the rapid context-switching processor.

Contributors
  • Tilera Corporation
  • Hewlett-Packard Inc.
  • Cisco Systems
  • Yale University
  • University of California, Berkeley
  • Massachusetts Institute of Technology
  • VMware, Inc
  • Massachusetts Institute of Technology
  • Sun Microsystems
  • Oberlin College and Conservatory
  • University of Maryland, College Park
