Exploiting instruction level parallelism in the presence of conditional branches
Publisher:
  • University of Illinois at Urbana-Champaign
  • Champaign, IL
  • United States
Order Number: UMI Order No. GAX97-17305
Abstract

Wide-issue superscalar and VLIW processors utilize instruction-level parallelism (ILP) to achieve high performance. However, if insufficient ILP is found, the performance potential of these processors suffers dramatically. Branch instructions, which are one of the major limitations to exploiting ILP, enforce strict ordering conditions in programs to ensure correct execution. Therefore, it is difficult to achieve the desired overlap of instruction execution with branches in the instruction stream. Effectively exploiting ILP in the presence of branches requires efficient handling of branches and the dependences they impose.

This dissertation investigates two techniques for exposing and enhancing ILP in the presence of branches: speculative execution and predicated execution. Speculative execution enables an ILP compiler to remove dependences between instructions and prior branches. In this manner, the execution of instructions and predicted future instructions may be overlapped. Compiler-controlled speculative execution is employed using an efficient structure called the superblock. The formation and optimization of superblocks increase ILP along important execution paths by systematically removing constraints due to unimportant paths. In conjunction with superblock optimizations, speculative execution is utilized to remove control dependences in the superblock and to aggressively reorder instructions across branches, achieving a high degree of execution overlap.
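
As an illustrative sketch only (not code from the dissertation), the effect of moving an instruction above a branch can be modeled in Python. The function names and the toy computation are hypothetical. The point is that the hoisted multiply no longer waits for the branch outcome, and its result is simply ignored on the path that does not need it.

```python
def original(x, y, cond):
    # The multiply is control dependent on the branch: it cannot start
    # until the branch outcome is known.
    if cond:
        return x + 1          # unimportant (rarely taken) path
    t = x * y                 # important path
    return t + 3


def speculated(x, y, cond):
    # The multiply is hoisted above the branch (speculative execution).
    # This is safe here because t is not used on the other path and
    # an integer multiply cannot raise an exception.
    t = x * y
    if cond:
        return x + 1          # t was computed but is discarded
    return t + 3
```

Both versions return the same value for every input; the speculated version merely does extra work when `cond` is true, which is the trade-off a compiler accepts when that path is unimportant.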

For many applications, speculative execution alone is not sufficient to achieve high performance. The fundamental limitation is that speculation only removes dependences between branches and other instructions. The branches themselves remain in the code, which causes difficult problems. This motivates the second technique investigated in this dissertation, predicated execution, which is an architectural capability that enables the conditional execution of instructions based on the value of a Boolean source operand. Predicated execution allows a compiler to eliminate branch instructions using this conditional execution support. Additionally, predicated execution provides an efficient interface for the compiler to overlap the execution of multiple paths of control. Predicated execution is exploited in the compiler via a generalized form of a superblock, called the hyperblock. Hyperblocks provide the framework for the compiler to selectively eliminate branches using predicated execution as well as apply speculative execution to exploit ILP.
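
The conditional execution described above can be sketched with a small Python model. This is a hypothetical illustration, not the dissertation's instruction set or compiler: the tuple layout, the register names, and the `run_predicated` helper are all assumptions. Each instruction carries an optional Boolean guard register, and an instruction whose guard is false has no effect, so a two-way branch becomes straight-line code.

```python
def run_predicated(program, regs):
    """Execute a straight-line list of predicated instructions.

    Each instruction is (guard, dest, fn, srcs). guard is the name of a
    Boolean register, or None for an unguarded instruction. An
    instruction whose guard is False is nullified: it has no effect.
    """
    regs = dict(regs)  # work on a copy so the caller's dict is unchanged
    for guard, dest, fn, srcs in program:
        # This test models the hardware nullifying a guarded instruction;
        # the modeled program itself contains no branch.
        if guard is None or regs[guard]:
            regs[dest] = fn(*(regs[s] for s in srcs))
    return regs


# Branching source:  if a > b: r = a - b  else: r = b - a
# After branch elimination, only guarded instructions remain.
ABS_DIFF = [
    (None, "p",  lambda a, b: a > b, ("a", "b")),  # define predicate
    (None, "np", lambda p: not p,    ("p",)),      # complementary predicate
    ("p",  "r",  lambda a, b: a - b, ("a", "b")),  # former "then" path
    ("np", "r",  lambda a, b: b - a, ("a", "b")),  # former "else" path
]
```

Calling `run_predicated(ABS_DIFF, {"a": 7, "b": 3})` leaves the absolute difference in register `r`. Instructions from both paths sit in one sequence, which is what lets a scheduler overlap them.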

Cited By

  1. Yin S, Zhou P, Liu L and Wei S Acceleration of Nested Conditionals on CGRAs via Trigger Scheme Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, (597-604)
  2. Hamzeh M, Shrivastava A and Vrudhula S Branch-Aware Loop Mapping on CGRAs Proceedings of the 51st Annual Design Automation Conference, (1-6)
  3. Arbelo C, Kanstein A, López S, López J, Berekovic M, Sarmiento R and Mignolet J Mapping control-intensive video kernels onto a coarse-grain reconfigurable architecture Proceedings of the conference on Design, automation and test in Europe, (177-182)
  4. Maher B, Smith A, Burger D and McKinley K Merging Head and Tail Duplication for Convergent Hyperblock Formation Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, (65-76)
  5. Smith A, Gibson J, Maher B, Nethercote N, Yoder B, Burger D, McKinley K and Burrill J Compiling for EDGE Architectures Proceedings of the International Symposium on Code Generation and Optimization, (185-195)
  6. Palkovic M, Corporaal H and Catthoor F Global memory optimisation for embedded systems allowed by code duplication Proceedings of the 2005 workshop on Software and compilers for embedded systems, (72-79)
  7. Shin J, Hall M and Chame J Superword-Level Parallelism in the Presence of Control Flow Proceedings of the international symposium on Code generation and optimization, (165-175)
  8. Lin J, Hsu W, Yew P, Ju R and Ngai T A Compiler Framework for Recovery Code Generation in General Speculative Optimizations Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, (17-28)
  9. Song L and Kavi K (2004). What can we gain by unfolding loops?, ACM SIGPLAN Notices, 39:2, (26-33), Online publication date: 1-Feb-2004.
  10. Stephenson M, Amarasinghe S, Martin M and O'Reilly U Meta optimization Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, (77-90)
  11. Stephenson M, Amarasinghe S, Martin M and O'Reilly U (2003). Meta optimization, ACM SIGPLAN Notices, 38:5, (77-90), Online publication date: 9-May-2003.
  12. Stephenson M, O'Reilly U, Martin M and Amarasinghe S Genetic programming applied to compiler heuristic optimization Proceedings of the 6th European conference on Genetic programming, (238-253)
  13. Snider G Performance-constrained pipelining of software loops onto reconfigurable hardware Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays, (177-186)
  14. Zhou H, Jennings M and Conte T Tree traversal scheduling Proceedings of the 14th international conference on Languages and compilers for parallel computing, (223-238)
  15. Snider G, Shackleford B and Carter R Attacking the semantic gap between application programming languages and configurable hardware Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays, (115-124)
  16. Eichenberger A, Meleis W and Maradani S An integrated approach to accelerate data and predicate computations in hyperblocks Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, (101-111)
  17. Park S, Shim S and Moon S Evaluation of scheduling techniques on a SPARC-based VLIW testbed Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, (104-113)
  18. Chekuri C, Johnson R, Motwani R, Natarajan B, Rau B and Schlansker M Profile-driven instruction level parallel scheduling with application to super blocks Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, (58-67)
Contributors
  • University of Michigan, Ann Arbor
