Abstract
The recent advent of many-accelerator systems-on-chip (SoCs), driven by the need to maximize throughput and power efficiency, has led to an exponential increase in hardware/software co-design complexity. This increase arises because the designer has to explore a vast number of architectural parameter combinations for each accelerator, as well as inter-accelerator configuration combinations, under specific area, throughput, and power constraints, given that each accelerator has different computational requirements. As a result, the size of the design space explodes, and existing design space exploration (DSE) techniques give poor-quality solutions because the design space cannot be adequately covered in a reasonable time. This problem is aggravated by the very long simulation time of many-accelerator virtual platforms (VPs). This article addresses these design issues by (a) presenting a virtual prototyping solution that decreases the exploration time by enabling the evaluation of multiple configurations per VP simulation and (b) proposing a DSE methodology that efficiently explores the design space of many-accelerator systems. Using two fully developed use cases, namely an H.264 decoding server for multiple video streams and a parallelized denoising system for MRI scans, we show that the proposed DSE methodology either leads to Pareto points that dominate those of a typical DSE scenario or finds new solutions that the typical DSE might not find. In addition, the proposed virtual prototyping solution reduces DSE runtime by up to 10× for H.264 and 5× for Rician denoising.
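To illustrate what it means for one Pareto point to dominate another, the following is a minimal sketch in Python. It assumes every objective (for example area, latency, and power) is to be minimized. The function names and the sample design points are hypothetical and are not taken from the article; the article's own DSE methodology is not reproduced here.

```python
from typing import List, Tuple

# A design point is a tuple of objective values, all to be minimized.
# The (area, latency, power) ordering is an assumption for illustration.
Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every objective
    and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical design points: (area, latency, power).
designs = [
    (10.0, 5.0, 2.0),
    (12.0, 4.0, 2.5),
    (11.0, 6.0, 2.0),  # dominated by (10.0, 5.0, 2.0)
    (9.0, 7.0, 3.0),
]
front = pareto_front(designs)
```

In this sketch, a DSE run whose front contains a point that dominates a point from another run has found a strictly better trade-off for that configuration.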
Index Terms
- An Integrated Exploration and Virtual Platform Framework for Many-Accelerator Heterogeneous Systems