Fast and Cycle-Accurate Emulation of Large-Scale Networks-on-Chip Using a Single FPGA

Authors:
Thiem Van Chu, Tokyo Institute of Technology, Tokyo, Japan
Shimpei Sato, Tokyo Institute of Technology, Tokyo, Japan
Kenji Kise, Tokyo Institute of Technology, Tokyo, Japan

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 4, Article No.: 27, pp 1–27, https://doi.org/10.1145/3151758

Published: 13 December 2017

Abstract

Modeling and simulation/emulation play a major role in the research and development of novel Networks-on-Chip (NoCs). However, conventional software simulators are so slow that studying NoCs for emerging many-core systems with hundreds to thousands of cores is challenging. State-of-the-art FPGA-based NoC emulators have shown great potential for speeding up NoC simulation, but they cannot emulate large-scale NoCs because of FPGA capacity constraints. Moreover, emulating large-scale NoCs under synthetic workloads on FPGAs typically requires a large amount of memory and thus involves off-chip memory, which makes the overall design much more complicated and may substantially degrade the emulation speed. This article presents methods for fast and cycle-accurate emulation of NoCs with up to thousands of nodes using a single FPGA. We first describe how to emulate a NoC under a synthetic workload using only FPGA on-chip memory (BRAMs). We next present a novel use of time-division multiplexing in which BRAMs are used effectively to emulate a whole network using only a small number of nodes, thereby overcoming the FPGA capacity constraints. We propose methods for emulating both direct and indirect networks, focusing on the commonly used meshes and fat-trees (k-ary n-trees). This differs from prior work, which considers only direct networks. Using the proposed methods, we build a NoC emulator, called FNoC, and demonstrate the emulation of mesh-based and fat-tree-based NoCs with canonical router architectures.
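The time-division multiplexing idea mentioned in the abstract can be illustrated with a toy model. The sketch below is not FNoC's design: the ring topology, the `node_logic` update rule, and all function names are hypothetical choices made for illustration. It only shows the general principle that a few copies of the node logic, stepping through per-node state held in memory (two Python lists standing in for BRAM banks), can reproduce the result of a fully parallel network cycle for cycle.

```python
from typing import Callable, List


def node_logic(own: int, left: int, right: int) -> int:
    """Hypothetical per-node update rule standing in for a router's logic."""
    return (3 * own + left + 2 * right + 1) % 257


def parallel_cycle(state: List[int]) -> List[int]:
    """Reference model: every node of a ring updates in the same cycle."""
    n = len(state)
    return [node_logic(state[i], state[(i - 1) % n], state[(i + 1) % n])
            for i in range(n)]


def tdm_cycle(state: List[int], physical_nodes: int) -> List[int]:
    """One emulated cycle using only `physical_nodes` copies of the logic.

    `state` acts as a memory bank holding every logical node's state.
    `next_state` is a second bank, so every read within one emulated cycle
    sees values from the previous cycle (assumed double buffering).
    """
    n = len(state)
    next_state = [0] * n
    for base in range(0, n, physical_nodes):  # one time slot per group
        for i in range(base, min(base + physical_nodes, n)):
            next_state[i] = node_logic(
                state[i], state[(i - 1) % n], state[(i + 1) % n])
    return next_state


def run(cycle_fn: Callable[[List[int]], List[int]],
        state: List[int], cycles: int) -> List[int]:
    """Advance the network by `cycles` emulated cycles."""
    for _ in range(cycles):
        state = cycle_fn(state)
    return state


initial = list(range(16))  # 16 logical nodes
reference = run(parallel_cycle, initial, 10)
multiplexed = run(lambda s: tdm_cycle(s, physical_nodes=4), initial, 10)
print(reference == multiplexed)  # prints True
```

In this toy model, each emulated cycle takes ceil(N / P) time slots for N logical nodes and P physical copies, so multiplexing trades emulation speed for logic capacity while the amount of state memory still grows with N.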
Our evaluation results show that (1) the size of the largest NoC that can be emulated depends only on the FPGA on-chip memory capacity; (2) a mesh-based NoC with 16,384 nodes (a 128×128 NoC) and a fat-tree-based NoC with 6,144 switch nodes and 4,096 terminal nodes (a 4-ary 6-tree NoC) can be emulated using a single Virtex-7 FPGA; and (3) when emulating these two NoCs, we achieve speedups of 5,047× and 232×, respectively, over BookSim, one of the most widely used software-based NoC simulators, while maintaining the same level of accuracy.
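The node counts quoted above follow from the standard size formulas for these topologies: a k-ary n-tree has k^n terminal nodes and n·k^(n-1) switch nodes, and a two-dimensional mesh has rows × columns nodes. The short check below uses those textbook formulas; the function names are illustrative and do not come from the article.

```python
def mesh_nodes(rows: int, cols: int) -> int:
    """Number of nodes in a rows x cols two-dimensional mesh."""
    return rows * cols


def fat_tree_terminals(k: int, n: int) -> int:
    """Terminal (leaf) nodes of a k-ary n-tree: k^n."""
    return k ** n


def fat_tree_switches(k: int, n: int) -> int:
    """Switch nodes of a k-ary n-tree: n levels of k^(n-1) switches each."""
    return n * k ** (n - 1)


print(mesh_nodes(128, 128))      # 16384
print(fat_tree_switches(4, 6))   # 6144
print(fat_tree_terminals(4, 6))  # 4096
```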

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Interconnection architectures
2. Computing methodologies
  1. Modeling and simulation

Published in
ACM Transactions on Reconfigurable Technology and Systems Volume 10, Issue 4
December 2017
119 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3166118
Editor:
Steve Wilton
Department of Electrical and Computer Engineering / University of British Columbia / Kaiser 4112, 5500-2332 Main Mall / Vancouver, BC V6T 1Z4 Canada
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 December 2017
- Accepted: 1 July 2017
- Revised: 1 April 2017
- Received: 1 October 2016
Author Tags
Emulation
FPGA
many-core
network-on-chip
Qualifiers
- research-article
- Research
- Refereed