Abstract
Modeling and simulation/emulation play a major role in the research and development of novel Networks-on-Chip (NoCs). However, conventional software simulators are so slow that studying NoCs for emerging many-core systems with hundreds to thousands of cores is challenging. State-of-the-art FPGA-based NoC emulators have shown great potential for speeding up NoC simulation, but they cannot emulate large-scale NoCs because of FPGA capacity constraints. Moreover, emulating large-scale NoCs under synthetic workloads on FPGAs typically requires a large amount of memory and thus the use of off-chip memory, which makes the overall design much more complicated and may substantially degrade the emulation speed. This article presents methods for fast and cycle-accurate emulation of NoCs with up to thousands of nodes using a single FPGA. We first describe how to emulate a NoC under a synthetic workload using only FPGA on-chip memory (BRAMs). We next present a novel use of time-division multiplexing in which BRAMs are used effectively to emulate a network using a small number of nodes, thereby overcoming the FPGA capacity constraints. We propose methods for emulating both direct and indirect networks, focusing on the commonly used meshes and fat-trees (k-ary n-trees). This differs from prior work, which considers only direct networks. Using the proposed methods, we build a NoC emulator, called FNoC, and demonstrate the emulation of several mesh-based and fat-tree-based NoCs with canonical router architectures.
Our evaluation results show that (1) the size of the largest NoC that can be emulated depends only on the FPGA on-chip memory capacity; (2) a mesh-based NoC with 16,384 nodes (a 128×128 NoC) and a fat-tree-based NoC with 6,144 switch nodes and 4,096 terminal nodes (a 4-ary 6-tree NoC) can each be emulated using a single Virtex-7 FPGA; and (3) when emulating these two NoCs, we achieve speedups of 5,047× and 232×, respectively, over BookSim, one of the most widely used software-based NoC simulators, while maintaining the same level of accuracy.
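The network sizes quoted above follow from the topology parameters. The short sketch below reproduces them, assuming the standard definition of a k-ary n-tree (n levels of k^(n-1) switches each, plus k^n terminals). The function names `mesh_nodes` and `fat_tree_sizes` are hypothetical helpers introduced only for this illustration; they are not part of FNoC or BookSim.

```python
def mesh_nodes(rows, cols):
    """Number of nodes in a rows x cols 2D mesh."""
    return rows * cols


def fat_tree_sizes(k, n):
    """Return (switch_nodes, terminal_nodes) for a k-ary n-tree.

    Assumes the standard definition: n switch levels, each with
    k**(n - 1) switches, and k**n terminal nodes.
    """
    switches = n * k ** (n - 1)
    terminals = k ** n
    return switches, terminals


if __name__ == "__main__":
    # 128x128 mesh from the abstract
    print("mesh nodes:", mesh_nodes(128, 128))
    # 4-ary 6-tree from the abstract
    switches, terminals = fat_tree_sizes(4, 6)
    print("fat-tree switches:", switches, "terminals:", terminals)
```

For the two configurations in the abstract, this gives 16,384 mesh nodes, and 6,144 switch nodes plus 4,096 terminal nodes for the fat-tree, matching the figures reported.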
Index Terms
- Fast and Cycle-Accurate Emulation of Large-Scale Networks-on-Chip Using a Single FPGA