The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low-dimension direct interconnection network to provide scalable communication bandwidth, while allowing the exploitation of locality. Despite its distributed memory architecture, Alewife allows efficient shared memory programming through a multilayered approach to locality management. A new scalable cache coherence scheme called LimitLESS directories allows the use of caches for reducing communication latency and network bandwidth requirements. Alewife also employs run-time and compile-time methods for partitioning and placement of data and processes to enhance communication locality. While the above methods attempt to minimize communication latency, remote communication with distant processors cannot be completely avoided. Alewife's processor, Sparcle, is designed to tolerate these latencies by rapidly switching between threads of computation. This paper describes the Alewife architecture and concentrates on the novel hardware features of the machine, including LimitLESS directories and the rapid context-switching processor.