The ever-increasing demand for performance scaling has made multi-core (2-8 cores) chips prevalent in today's computing systems and foreshadows the shift toward many-core (10s-100s of cores) chips in the near future. Although the potential performance gains from many-core systems remain appealing, the widespread adoption of these systems hinges on their ability to scale performance while simultaneously satisfying Quality-of-Service (QoS) and energy-efficiency constraints.
This work makes the case that the interconnect for these many-core systems has a significant impact on the aforementioned scalability issues. This impact is illustrated by observing that the degree (radix) of the interconnect has a significant effect on system scalability and by demonstrating that high-radix, many-core architectures are feasible, energy-efficient, and high-performance.
The feasibility of high-radix crossbars for many-core systems is first shown through a new circuit-level building block called the Swizzle-Switch. A 32nm Swizzle-Switch utilizes integrated arbitration techniques to provide an energy- and area-efficient switching element that improves the scalability of crossbars to high radices. The Swizzle-Switch is shown to operate at frequencies up to 1.5GHz for 128-bit, radix-64 crossbars and to support many arbitration policies, such as Least-Recently Granted (LRG) and Round-Robin (RR). Results show that the Swizzle-Switch's LRG arbitration policy reduces the worst-case request access latency by 1.83× and 2.03× on average relative to round-robin and random arbitration schemes, respectively.
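As an illustration of the LRG policy named above, the following is a minimal behavioral sketch in Python. It is not the Swizzle-Switch circuit, which integrates arbitration into the crossbar itself; it only models the policy. The class name `LRGArbiter` and its methods are hypothetical names chosen for this example.

```python
class LRGArbiter:
    """Behavioral model of Least-Recently-Granted arbitration (illustrative only)."""

    def __init__(self, num_requesters):
        # Priority order, highest first. The requester granted longest ago
        # (or never granted) sits at the front of the list.
        self.order = list(range(num_requesters))

    def grant(self, requests):
        """Grant one requester from the set `requests`, or return None if it is empty."""
        winner = None
        for requester in self.order:
            if requester in requests:
                winner = requester
                break
        if winner is not None:
            # The winner becomes the most recently granted: lowest priority.
            self.order.remove(winner)
            self.order.append(winner)
        return winner


if __name__ == "__main__":
    arbiter = LRGArbiter(4)
    # Requesters 1 and 3 contend repeatedly; grants alternate between them.
    for cycle in range(4):
        print(cycle, arbiter.grant({1, 3}))
```

Because a granted requester always drops to the lowest priority, in this model a requester that keeps its request asserted can lose to each competing input at most once before it wins, which bounds its waiting time.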
This work then shows how a many-core system called the Swizzle-Switch Network can use the Swizzle-Switch as the central building block for a flat crossbar interconnect. The Swizzle-Switch Network is shown to be advantageous over a traditional Network-on-Chip (NoC) for systems of up to 64 cores. The Swizzle-Switch Network improves system performance by 21%, reduces L1 on-chip average miss latency by 2.2×, and decreases the standard deviation of that L1 miss latency by 3.0× relative to a Mesh NoC topology. Additionally, all of these performance benefits are obtained while providing a 25% energy savings over the Mesh.
The Swizzle-Switch is also leveraged as a building block for high-radix NoC topologies that can support many-core architectures. The Swizzle-Switch-based Flattened Butterfly topology is demonstrated to provide a 15% speedup, 1.76× smaller L1 on-chip average miss latency, 2.5× reduction in miss latency standard deviation, and 10% energy savings over the Mesh topology.
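For intuition about how the two topologies differ structurally, the sketch below compares average router-to-router hop counts in a 2D mesh and a 2D flattened butterfly. Everything about the setup is an assumption made for illustration: an 8×8 grid with one node per router, uniform traffic between distinct routers, and minimal routing. The abstract does not state the evaluated configuration, and hop count is only one contributor to miss latency. The function names are hypothetical.

```python
from itertools import product


def mesh_hops(src, dst):
    # In a 2D mesh, minimal routing takes one hop per unit of Manhattan distance.
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def flattened_butterfly_hops(src, dst):
    # In a 2D flattened butterfly, each router links directly to every router
    # in its row and column, so one hop corrects each differing coordinate.
    return int(src[0] != dst[0]) + int(src[1] != dst[1])


def average_hops(k, hop_fn):
    """Average hop count over all ordered pairs of distinct routers in a k-by-k grid."""
    nodes = list(product(range(k), repeat=2))
    total = sum(hop_fn(s, d) for s in nodes for d in nodes if s != d)
    return total / (len(nodes) * (len(nodes) - 1))


if __name__ == "__main__":
    k = 8  # assumed grid size: 64 routers
    print("mesh:", average_hops(k, mesh_hops))
    print("flattened butterfly:", average_hops(k, flattened_butterfly_hops))
```

Under these assumptions the flattened butterfly needs at most two hops between any pair of routers, at the cost of higher-radix routers, which is where a scalable high-radix switching element becomes relevant.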
Finally, the impact that 3D stacking technology has on many-core scalability is evaluated and shown to assist crossbar and bus interconnects in scaling past their traditional limitations. A 3D-optimized Swizzle-Switch Network is able to leverage frequency gains to achieve a 15-28% speedup over a 2D Swizzle-Switch Network when using memory-intensive benchmarks. Additionally, a bus-based 64-core architecture is shown to provide an average speedup of 49× over a baseline uniprocessor system when using 3D technology.