ABSTRACT
The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their hardware implementation, designers manually apply loop transformations. For example, loop unrolling is used to expose instruction-level parallelism at the expense of more hardware resources for concurrent operator evaluation. Because unrolling also increases the amount of data a computation requires, too much unrolling can lead to a memory-bound implementation in which resources sit idle. To negotiate inherent hardware space-time trade-offs, designers must engage in an iterative refinement cycle, at each step manually applying transformations and evaluating their impact. This process is not only error-prone and tedious but also prohibitively expensive given the large search spaces and long synthesis times. This paper describes an automated approach to hardware design space exploration, through a collaboration between parallelizing compiler technology and high-level synthesis tools. We present a compiler algorithm that automatically explores the large design spaces resulting from the application of several program transformations commonly used in application-specific hardware designs. Our approach uses synthesis estimation techniques to quantitatively evaluate alternative designs for a loop nest computation. We have implemented this design space exploration algorithm in the context of a compilation and synthesis system called DEFACTO, and present results of this implementation on five multimedia kernels. Our algorithm derives an implementation that closely matches the performance of the fastest design in the design space, and among implementations with comparable performance, selects the smallest design. We search on average only 0.3% of the design space.
This technology thus significantly raises the level of abstraction for hardware design and explores a design space much larger than is feasible for a human designer.
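The space-time trade-off described in the abstract can be illustrated with a small sketch. It is not the DEFACTO algorithm: the cost model in `estimate`, all of its parameters, and the 5% tolerance are hypothetical stand-ins for a real synthesis estimator, and the search here is exhaustive, whereas the paper reports visiting only a small fraction of the space. The sketch only shows the selection criterion: find the fastest estimated design, then pick the smallest design with comparable performance.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Estimate:
    unroll: tuple   # unroll factor per loop, outermost first
    cycles: float   # estimated execution time in cycles
    area: int       # estimated hardware resources, arbitrary units


def estimate(unroll, trip_counts=(64, 64), ops_per_iter=1,
             words_per_iter=2, mem_words_per_cycle=4, area_per_op=10):
    """Toy estimator (an assumption, not a real synthesis estimate).

    Unrolling replicates the loop body, so operators run concurrently,
    but every replica also needs its own data from memory.
    """
    copies = 1
    for factor in unroll:
        copies *= factor
    iterations = 1
    for count in trip_counts:
        iterations *= count

    steps = iterations / copies                      # executions of the unrolled body
    compute_cycles = ops_per_iter                    # replicas evaluate in parallel
    memory_cycles = copies * words_per_iter / mem_words_per_cycle
    cycles = steps * max(compute_cycles, memory_cycles)
    area = copies * ops_per_iter * area_per_op       # one operator set per replica
    return Estimate(tuple(unroll), cycles, area)


def explore(factors=(1, 2, 4, 8), depth=2, tolerance=0.05):
    """Exhaustively estimate every unroll vector and apply the selection rule."""
    designs = [estimate(u) for u in product(factors, repeat=depth)]
    fastest = min(d.cycles for d in designs)
    comparable = [d for d in designs if d.cycles <= fastest * (1 + tolerance)]
    # Among designs with comparable performance, prefer the smallest one.
    return min(comparable, key=lambda d: (d.area, d.unroll))


if __name__ == "__main__":
    print(explore())
```

In this toy model the design becomes memory bound once two copies of the loop body exist, so further unrolling adds area without reducing the cycle count, and the selection rule settles on the smallest of the equally fast designs.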
Index Terms
- A compiler approach to fast hardware design space exploration in FPGA-based systems