Scaling high-performance interconnect architectures to many-core systems

January 2012

Author:
Korey Lamar Sewell, University of Michigan

Adviser:
Trevor Mudge, University of Michigan

Publisher:
University of Michigan
Dept. 72 Ann Arbor, MI
United States

ISBN: 978-1-267-70245-6

Order Number: AAI3530662

Pages:

Abstract

The ever-increasing demand for performance scaling has made multi-core (2-8 cores) chips prevalent in today's computing systems and foreshadows the shift toward many-core (10s-100s of cores) chips in the near future. Although the potential performance gains from many-core systems remain appealing, the widespread adoption of these systems hinges on their ability to scale performance while simultaneously satisfying Quality-of-Service (QoS) and energy-efficiency constraints.

This work makes the case that the interconnect of these many-core systems has a significant impact on the aforementioned scalability issues. The impact is illustrated by observing that the degree of the interconnect has a significant effect on system scalability and by demonstrating that high-radix, many-core architectures are feasible, energy-efficient, and high-performance.

The feasibility of high-radix crossbars for many-core systems is first shown through a new circuit-level building block called the Swizzle-Switch. A 32nm Swizzle-Switch uses integrated arbitration techniques to provide an energy- and area-efficient switching element that improves the scalability of crossbars to high radices. The Swizzle-Switch is shown to operate at frequencies up to 1.5GHz for 128-bit, radix-64 crossbars and to support many arbitration policies, such as Least-Recently Granted (LRG) and Round-Robin (RR). Results show that the Swizzle-Switch's LRG arbitration policy reduces the worst-case request access latency by 1.83× and 2.03× on average over round-robin and random arbitration schemes, respectively.
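The two arbitration policies named above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit-level mechanism the thesis describes: the class names `LRGArbiter` and `RoundRobinArbiter`, the `grant` method, and the representation of requests as a set of port indices are all hypothetical choices made for illustration.

```python
class LRGArbiter:
    """Behavioral model of Least-Recently Granted arbitration (illustrative only)."""

    def __init__(self, num_ports):
        # Front of the list = highest priority = least recently granted.
        self.order = list(range(num_ports))

    def grant(self, requests):
        """Grant the requesting port that was granted longest ago, or None."""
        for port in self.order:
            if port in requests:
                # The winner drops to the lowest priority.
                self.order.remove(port)
                self.order.append(port)
                return port
        return None


class RoundRobinArbiter:
    """Behavioral model of Round-Robin arbitration (illustrative only)."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.next_port = 0  # port that is checked first on the next grant

    def grant(self, requests):
        """Grant the first requesting port at or after the rotating pointer."""
        for offset in range(self.num_ports):
            port = (self.next_port + offset) % self.num_ports
            if port in requests:
                self.next_port = (port + 1) % self.num_ports
                return port
        return None


if __name__ == "__main__":
    lrg = LRGArbiter(4)
    # Ports 0 and 1 contend repeatedly; port 3 joins on the last cycle.
    for reqs in ({0, 1}, {0, 1}, {0, 1}, {0, 3}):
        print(sorted(reqs), "->", lrg.grant(reqs))
```

In this model, a port that has waited longest always wins under LRG, whereas round-robin priority depends only on position relative to the pointer. The sketch makes no claim about the latency figures reported in the abstract.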

This work then shows how a many-core system called the Swizzle-Switch Network can use the Swizzle-Switch as the central building block for a flat crossbar interconnect. The Swizzle-Switch Network is shown to be advantageous over a traditional Network-on-Chip (NoC) for systems of up to 64 cores. The Swizzle-Switch Network improves system performance by 21%, reduces L1 on-chip average miss latency by 2.2×, and decreases the standard deviation of that L1 miss latency by 3.0× relative to a Mesh NoC topology. Additionally, all of these performance benefits are obtained while providing a 25% energy savings over the Mesh.

The Swizzle-Switch is also leveraged as a building block for high-radix NoC topologies that can support many-core architectures. The Swizzle-Switch-based Flattened Butterfly topology is demonstrated to provide a 15% speedup, 1.76× smaller L1 on-chip average miss latency, a 2.5× reduction in miss latency standard deviation, and 10% energy savings over the Mesh topology.

Finally, the impact of 3D stacking technology on many-core scalability is evaluated and shown to help crossbar and bus interconnects scale past their traditional limitations. A 3D-optimized Swizzle-Switch Network is able to leverage frequency gains to achieve a 15-28% speedup over a 2D Swizzle-Switch Network when using memory-intensive benchmarks. Additionally, a bus-based 64-core architecture is shown to provide an average speedup of 49× over a baseline uniprocessor system when using 3D technology.

Recommendations

Adapting the Hyper-Ring Interconnect for Many-Core Processors
ISPA '08: Proceedings of the 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications

This paper makes the case for the Hyper-Ring as the interconnect or NoC for many-cores. While other prominent candidates for many-core interconnect such as the torus and mesh have superior bisection bandwidth to the HR, their cost, number of links and ...

Architectural integration of rf-interconnect to enhance on-chip communication for many-core chip multiprocessors

Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures

Multiprocessor system-on-chip (MP-SoC) platforms are emerging as an important trend for SoC design. Power and wire design constraints are forcing the adoption of new design methodologies for system-on-chip (SoC), namely, those that incorporate ...
