DOI: 10.1145/1289816.1289841
Article

Thread warping: a framework for dynamic synthesis of thread accelerators

Published: 30 September 2007

ABSTRACT

We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circuits on FPGAs (field-programmable gate arrays). Building on dynamic synthesis for single-processor single-thread systems, known as warp processing, thread warping improves the performance of multiprocessor systems by speeding up individual threads and by allowing more threads to execute concurrently. Furthermore, thread warping maintains the important separation of function from architecture, enabling portability of applications to architectures with different quantities of microprocessors and FPGA, an advantage not shared by static compilation/synthesis approaches. We introduce a framework of architecture, CAD tools, and operating system that together support thread warping. We summarize experiments on an extensive architectural simulation framework we developed, showing application speedups of 4x to 502x, averaging 130x, compared to a multiprocessor system having four ARM11 microprocessors, for eight benchmark applications. Even compared to a 64-processor system, thread warping achieves an 11x speedup.
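
The two sources of improvement named in the abstract (faster individual threads and more threads running at once) can be illustrated with a toy scheduling model. The sketch below is not the paper's framework or simulator: the function `makespan`, the greedy assignment policy, the accelerator count, and the 10x per-thread accelerator speed are all hypothetical values chosen only for illustration.

```python
def makespan(num_threads, work, speeds):
    """Toy model: time to finish `num_threads` identical threads.

    `work` is the time one thread takes on a speed-1.0 resource.
    `speeds` lists the relative speed of each execution resource
    (a processor or a hypothetical thread accelerator).
    Each thread is greedily assigned to the resource that would
    finish it earliest; this policy is an assumption of the sketch.
    """
    free_at = [0.0] * len(speeds)  # time at which each resource becomes idle
    for _ in range(num_threads):
        best = min(range(len(speeds)),
                   key=lambda r: free_at[r] + work / speeds[r])
        free_at[best] += work / speeds[best]
    return max(free_at)


# Hypothetical configuration: 16 threads, 100 time units of work each.
software_only = makespan(16, 100.0, [1.0] * 4)                # four processors
with_accels = makespan(16, 100.0, [1.0] * 4 + [10.0] * 8)     # plus 8 accelerators
print(software_only, with_accels, software_only / with_accels)
```

In this toy configuration the accelerators both shorten each thread and add execution slots beyond the four processors, so the overall gain exceeds the per-thread gain. The numbers say nothing about the speedups reported in the abstract, which come from the authors' own simulation framework.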

      Published in

        CODES+ISSS '07: Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
        September 2007
        284 pages
        ISBN: 9781595938244
        DOI: 10.1145/1289816

        Copyright © 2007 ACM

        Publisher: Association for Computing Machinery, New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 280 of 864 submissions, 32%
