Triggered instructions: a control paradigm for spatially-programmed architectures

Authors:
Angshuman Parashar (Intel Corporation, Hudson, MA)
Michael Pellauer (Intel Corporation, Hudson, MA)
Michael Adler (Intel Corporation, Hudson, MA)
Bushra Ahsan (Intel Corporation, Hudson, MA)
Neal Crago (Intel Corporation, Hudson, MA)
Daniel Lustig (Princeton University, Princeton, NJ)
Vladimir Pavlov (Intel Corporation, Hudson, MA)
Antonia Zhai (Intel Corporation, Hudson, MA and University of Minnesota, Minneapolis, MN)
Mohit Gambhir (Intel Corporation, Hudson, MA)
Aamer Jaleel (Intel Corporation, Hudson, MA)
Randy Allmon (Intel Corporation, Hudson, MA)
Rachid Rayess (Intel Corporation, Hudson, MA)
Stephen Maresh (Intel Corporation, Hudson, MA)
Joel Emer (Intel Corporation, Hudson, MA and CSAIL, MIT, Cambridge, MA)

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013, Pages 142–153
https://doi.org/10.1145/2485922.2485935

Published: 23 June 2013

ABSTRACT

In this paper, we present triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. They also allow efficient reactivity to inter-PE communication traffic. The approach provides a unified mechanism to avoid over-serialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading, which each require distinct hardware mechanisms in a traditional sequential architecture.

Our analysis shows that a triggered-instruction based spatial accelerator can achieve 8X greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64% respectively over a program-counter style spatial baseline, resulting in a speedup of 2.0X.
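
As a rough illustration of the control style the abstract describes, the sketch below models one PE in Python as a set of guarded instructions with no program counter: on each step, an instruction fires only if its trigger (a condition over predicate bits and input-channel availability) holds. This is a toy model under stated assumptions, not the paper's ISA or scheduler. The names `PE`, `Triggered`, `step`, `run`, `MERGE`, and `merge_streams` are hypothetical, the first-match priority order is an assumed scheduling policy, and using infinity as an end-of-stream marker is a convenience for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional

EOS = float("inf")  # assumed end-of-stream marker; not taken from the paper


@dataclass
class PE:
    """Hypothetical processing element: predicate bits and channels, no program counter."""
    preds: Dict[str, bool]
    chans: Dict[str, Deque[float]]


@dataclass
class Triggered:
    """A guard (trigger) paired with an action; the structure is illustrative only."""
    name: str
    trigger: Callable[[PE], bool]
    action: Callable[[PE], None]


def step(pe: PE, program: List[Triggered]) -> Optional[str]:
    """Fire the first instruction whose trigger holds (assumed priority policy)."""
    for ins in program:
        if ins.trigger(pe):
            ins.action(pe)
            return ins.name
    return None  # nothing is ready: the PE simply waits


def run(pe: PE, program: List[Triggered]) -> List[str]:
    """Step until no trigger holds; return the names of the instructions fired."""
    fired = []
    while (name := step(pe, program)) is not None:
        fired.append(name)
    return fired


def both_ready(pe: PE) -> bool:
    # Channel availability is part of the trigger, so the PE reacts to arrivals.
    return bool(pe.chans["a"]) and bool(pe.chans["b"])


def compare(pe: PE) -> None:
    # Record the comparison in predicate bits instead of branching.
    pe.preds["a_first"] = pe.chans["a"][0] <= pe.chans["b"][0]
    pe.preds["decided"] = True


def forward(src: str) -> Callable[[PE], None]:
    def act(pe: PE) -> None:
        pe.chans["out"].append(pe.chans[src].popleft())
        pe.preds["decided"] = False  # state transition via predicate update
    return act


# Merge of two sorted streams: three guarded instructions, no explicit branches.
MERGE = [
    Triggered("cmp", lambda pe: not pe.preds["decided"] and both_ready(pe), compare),
    Triggered("sendA", lambda pe: pe.preds["decided"] and pe.preds["a_first"], forward("a")),
    Triggered("sendB", lambda pe: pe.preds["decided"] and not pe.preds["a_first"], forward("b")),
]


def merge_streams(a: List[float], b: List[float]) -> List[float]:
    """Merge two sorted lists by running the MERGE program on one PE."""
    pe = PE(
        preds={"decided": False, "a_first": False},
        chans={"a": deque([*a, EOS]), "b": deque([*b, EOS]), "out": deque()},
    )
    run(pe, MERGE)
    return [x for x in pe.chans["out"] if x != EOS]
```

For instance, `merge_streams([1, 4], [2, 3])` returns `[1, 2, 3, 4]`. The state machine lives entirely in the predicate bits, so the "branch" after a comparison is just a different instruction becoming eligible on the next step.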

Index Terms

Triggered instructions: a control paradigm for spatially-programmed architectures
1. Computer systems organization
  1. Architectures
    1. Other architectures

Published in
ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN: 9781450320795
DOI: 10.1145/2485922
General Chair: Avi Mendelson (Technion)

ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN: 0163-5964
DOI: 10.1145/2508148

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
reconfigurable accelerators
spatial programming
Qualifiers
- research-article
Acceptance Rates
- ISCA '13 paper acceptance rate: 56 of 288 submissions, 19%
- Overall acceptance rate: 543 of 3,203 submissions, 17%
Article Metrics
- Total Citations: 103
- Total Downloads: 1,987
- Downloads (Last 12 months): 166
- Downloads (Last 6 weeks): 17