Thread warping: a framework for dynamic synthesis of thread accelerators

Authors:
Greg Stitt, University of Florida, Gainesville, FL
Frank Vahid, University of California, Riverside, Riverside, CA

CODES+ISSS '07: Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis, September 2007, Pages 93–98, https://doi.org/10.1145/1289816.1289841

Published: 30 September 2007

ABSTRACT

We present thread warping, a dynamic optimization technique that uses a single processor of a multiprocessor system to dynamically synthesize threads into custom accelerator circuits on FPGAs (field-programmable gate arrays). Building on warp processing, which performs dynamic synthesis for single-processor, single-thread systems, thread warping improves the performance of multiprocessor systems by speeding up individual threads and by allowing more threads to execute concurrently. Furthermore, thread warping maintains the important separation of function from architecture, enabling portability of applications to architectures with different quantities of microprocessors and FPGA, an advantage not shared by static compilation/synthesis approaches. We introduce a framework of architecture, CAD tools, and operating system that together support thread warping. We summarize experiments on an extensive architectural simulation framework we developed, showing application speedups of 4x to 502x, averaging 130x, compared to a multiprocessor system having four ARM11 microprocessors, for eight benchmark applications. Even compared to a 64-processor system, thread warping achieves 11x speedup.
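The abstract names two sources of speedup: each thread runs faster on its accelerator, and more threads can run at once. The following toy model is a rough sketch of how those two effects combine. It is not the authors' simulation framework. The wave-based scheduling, the function names, and every number in the usage lines are assumptions made for illustration only and are not taken from the paper.

```python
import math


def software_makespan(num_threads, thread_time, num_cpus):
    """Time to run identical threads on CPUs only (hypothetical wave model).

    Threads are assumed to run in waves of at most `num_cpus` at a time.
    """
    waves = math.ceil(num_threads / num_cpus)
    return waves * thread_time


def warped_makespan(num_threads, thread_time, num_accels,
                    accel_speedup, synthesis_time=0.0):
    """Time to run the same threads on FPGA accelerators (hypothetical model).

    Assumes every thread is synthesized to an accelerator that is
    `accel_speedup` times faster than software, that `num_accels`
    accelerators fit on the FPGA at once, and that synthesis adds a
    one-time delay of `synthesis_time`.
    """
    waves = math.ceil(num_threads / num_accels)
    return synthesis_time + waves * thread_time / accel_speedup


# Illustrative, made-up inputs: 32 threads of 1.0 s each.
baseline = software_makespan(32, 1.0, num_cpus=4)
warped = warped_makespan(32, 1.0, num_accels=16,
                         accel_speedup=8.0, synthesis_time=0.25)
print(baseline, warped, baseline / warped)  # prints: 8.0 0.5 16.0
```

In this sketch the overall gain (16x) exceeds the per-thread gain (8x) because 16 accelerators run concurrently where only 4 CPUs could, which mirrors the abstract's claim that both effects contribute. A real system would also have to account for FPGA capacity, memory contention, and threads that cannot be synthesized, all of which the model ignores.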

Index Terms

Thread warping: a framework for dynamic synthesis of thread accelerators
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems

Published in
CODES+ISSS '07: Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
September 2007
284 pages
ISBN: 9781595938244
DOI: 10.1145/1289816
General Chairs: Soonhoi Ha (Seoul National University, Korea), Kiyoung Choi (Seoul National University, Korea)
Program Chairs: Nikil Dutt (UC Irvine, USA), Jürgen Teich (University of Erlangen-Nuremberg, Germany)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
FPGA
dynamic synthesis
just-in-time compilation
multi-core
synthesis
thread warping
threads
warp processing
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 280 of 864 submissions, 32%