Scaling high-performance interconnect architectures to many-core systems
Publisher:
  • University of Michigan
  • Dept. 72 Ann Arbor, MI
  • United States
ISBN:978-1-267-70245-6
Order Number:AAI3530662
Pages:95
Abstract

The ever-increasing demand for performance scaling has made multi-core (2-8 cores) chips prevalent in today's computing systems and foreshadows the shift toward many-core (10s-100s of cores) chips in the near future. Although the potential performance gains from many-core systems remain appealing, the widespread adoption of these systems hinges on their ability to scale performance while simultaneously satisfying Quality-of-Service (QoS) and energy-efficiency constraints.

This work makes the case that the interconnect for these many-core systems has a significant impact on the aforementioned scalability issues. This impact is illustrated by observing that the degree of the interconnect has a significant effect on system scalability and by demonstrating that high-radix, many-core system architectures are feasible, energy-efficient, and high-performance.

The feasibility of high-radix crossbars for many-core systems is first shown through a new circuit-level building block called the Swizzle-Switch. A 32nm Swizzle-Switch utilizes integrated arbitration techniques to provide an energy- and area-efficient switching element that improves the scalability of crossbars to high radices. The Swizzle-Switch is shown to operate at frequencies up to 1.5GHz for 128-bit, radix-64 crossbars and to support many arbitration policies, such as Least-Recently Granted (LRG) and Round-Robin (RR). Results show that the Swizzle-Switch's LRG arbitration policy reduces the worst-case request access latency by 1.83× and 2.03× on average over round-robin and random arbitration schemes, respectively.
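The LRG policy named above can be illustrated with a small behavioral model. This is a minimal sketch in Python, assuming the usual meaning of least-recently-granted arbitration (among the inputs requesting an output, the one granted longest ago wins and then drops to lowest priority). It is not the circuit-level mechanism of the Swizzle-Switch, and the class name `LRGArbiter` and its `arbitrate` method are hypothetical names chosen for illustration.

```python
class LRGArbiter:
    """Behavioral model of a least-recently-granted arbiter for one output port."""

    def __init__(self, num_inputs):
        # Priority order: position 0 holds the input granted longest ago
        # (highest priority); the last position holds the most recent winner.
        self.order = list(range(num_inputs))

    def arbitrate(self, requests):
        """Grant one of the requesting inputs, or return None if none request."""
        requesting = set(requests)
        for inp in self.order:
            if inp in requesting:
                # The winner becomes the most recently granted input,
                # so it moves to the lowest-priority position.
                self.order.remove(inp)
                self.order.append(inp)
                return inp
        return None


if __name__ == "__main__":
    arb = LRGArbiter(4)
    # Inputs 1 and 3 contend repeatedly; grants alternate between them.
    print([arb.arbitrate([1, 3]) for _ in range(4)])
```

Under this model, an input that has waited longest is never passed over by a more recently served input, which is the intuition behind a lower worst-case access latency than round-robin or random selection.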

This work then shows how a many-core system called the Swizzle-Switch Network can use the Swizzle-Switch as the central building block for a flat crossbar interconnect. The Swizzle-Switch Network is shown to be advantageous over a traditional Network-on-Chip (NoC) for systems of up to 64 cores. The Swizzle-Switch Network improves system performance by 21%, reduces L1 on-chip average miss latency by 2.2×, and decreases the standard deviation of that L1 miss latency by 3.0× relative to a Mesh NoC topology. Additionally, all of these performance benefits are obtained while providing 25% energy savings over the Mesh.

The Swizzle-Switch is also leveraged as a building block for high-radix NoC topologies that can support many-core architectures. The Swizzle-Switch-based Flattened Butterfly topology is demonstrated to provide a 15% speedup, 1.76× smaller L1 on-chip average miss latency, 2.5× reduction in miss latency standard deviation, and 10% energy savings over the Mesh topology.

Finally, the impact that 3D stacking technology has on many-core scalability is evaluated and shown to assist crossbar and bus interconnects in scaling past their traditional limitations. A 3D-optimized Swizzle-Switch Network is able to leverage frequency gains to achieve a 15-28% speedup over a 2D Swizzle-Switch Network when using memory-intensive benchmarks. Additionally, a bus-based 64-core architecture is shown to provide an average speedup of 49× over a baseline uniprocessor system when using 3D technology.

Contributors
  • University of Michigan, Ann Arbor
  • University of Michigan, Ann Arbor
