DOI: 10.1145/2485922.2485935
research-article

Triggered instructions: a control paradigm for spatially-programmed architectures

Published: 23 June 2013

ABSTRACT

In this paper, we present triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. They also allow efficient reactivity to inter-PE communication traffic. The approach provides a unified mechanism to avoid over-serialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading, which each require distinct hardware mechanisms in a traditional sequential architecture.

Our analysis shows that a triggered-instruction based spatial accelerator can achieve 8X greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64% respectively over a program-counter style spatial baseline, resulting in a speedup of 2.0X.
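The control idea in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware or ISA; it is a minimal simulation, assuming a single PE that merges two sorted input streams. Each instruction carries a trigger (a guard over predicate bits and input-queue status) and an action, and a scheduler fires whichever instruction is triggered, so there is no program counter and no branch instruction. The names `PE`, `Triggered`, `step`, `merge`, and the `EOS` end-of-stream marker are hypothetical, and the priority-ordered scheduler is an assumption made for simplicity.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional

EOS = float("inf")  # assumed end-of-stream marker (illustrative, not from the paper)


@dataclass
class PE:
    """Hypothetical processing element: two input queues, one output, predicate bits."""
    a: deque
    b: deque
    out: list = field(default_factory=list)
    preds: dict = field(
        default_factory=lambda: {"decided": False, "take_a": False, "done": False}
    )


@dataclass
class Triggered:
    """An instruction is a guard plus an action; it has no position in a sequence."""
    name: str
    trigger: Callable[[PE], bool]
    action: Callable[[PE], None]


def _ready(pe: PE) -> bool:
    # Both inputs hold data and no earlier decision is still pending.
    return bool(pe.a) and bool(pe.b) and not pe.preds["decided"] and not pe.preds["done"]


def _compare(pe: PE) -> None:
    # Record the comparison result in predicates instead of branching on it.
    pe.preds["take_a"] = pe.a[0] <= pe.b[0]
    pe.preds["decided"] = True


def _send(pe: PE, queue: deque) -> None:
    # Forward the chosen head element and clear the pending decision.
    pe.out.append(queue.popleft())
    pe.preds["decided"] = False


def _finish(pe: PE) -> None:
    # Consume both end-of-stream markers and halt further triggering.
    pe.a.popleft()
    pe.b.popleft()
    pe.preds["done"] = True


# List order doubles as scheduler priority (an assumption of this sketch).
PROGRAM = [
    Triggered("finish", lambda pe: _ready(pe) and pe.a[0] == EOS and pe.b[0] == EOS, _finish),
    Triggered("compare", _ready, _compare),
    Triggered("send_a", lambda pe: pe.preds["decided"] and pe.preds["take_a"],
              lambda pe: _send(pe, pe.a)),
    Triggered("send_b", lambda pe: pe.preds["decided"] and not pe.preds["take_a"],
              lambda pe: _send(pe, pe.b)),
]


def step(pe: PE, program=PROGRAM) -> Optional[str]:
    """Fire the highest-priority triggered instruction; return its name, or None if idle."""
    for inst in program:
        if inst.trigger(pe):
            inst.action(pe)
            return inst.name
    return None


def merge(xs, ys):
    """Run the PE until no instruction is triggered; return (output, firing trace)."""
    pe = PE(deque([*xs, EOS]), deque([*ys, EOS]))
    trace = []
    while (fired := step(pe)) is not None:
        trace.append(fired)
    return pe.out, trace
```

Calling `merge([1, 4, 9], [2, 3, 10])` returns the merged list together with the order in which instructions fired. State transitions happen only through predicate updates, and an instruction simply does not fire while an input queue is empty, which is one way to read the abstract's point about reacting to inter-PE communication traffic.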


    • Published in

      ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
      June 2013
      686 pages
      ISBN: 9781450320795
      DOI: 10.1145/2485922
      • ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
        ISCA '13
        June 2013
        666 pages
        ISSN: 0163-5964
        DOI: 10.1145/2508148

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      ISCA '13 Paper Acceptance Rate: 56 of 288 submissions, 19%. Overall Acceptance Rate: 543 of 3,203 submissions, 17%.
