ABSTRACT
This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor (DMP). The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them without requiring ISA support for predicate registers and predicated instructions. To achieve this without incurring large hardware cost and complexity, the compiler provides control-flow information by hints and the processor dynamically predicates instructions only on frequently executed program paths. The key insight behind DMP is that most control-flow graphs look and behave like simple hammock (if-else) structures when only frequently executed paths in the graphs are considered. Therefore, DMP can dynamically predicate a much larger set of branches than simple hammock branches. This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor (DMP). The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them without requiring ISA support for predicate registers and predicated instructions. To achieve this without incurring large hardware cost and complexity, the compiler provides control-flow information by hints and the processor dynamically predicates instructions only on frequently executed program paths. The key insight behind DMP is that most control-flow graphs look and behave like simple hammock (if-else) structures when only frequently executed paths in the graphs are considered. Therefore, DMP can dynamically predicate a much larger set of branches than simple hammock branches.
- {1} P. S. Ahuja, K. Skadron, M. Martonosi, and D. W. Clark. Multipath execution: opportunities and limits. In ICS-12, 1998. Google ScholarDigital Library
- {2} J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In POPL-10, 1983. Google ScholarDigital Library
- {3} D. I. August, D. A. Connors, J. C. Gyllenhaal, and W. W. Hwu. Architectural support for compiler-synthesized dynamic branch prediction strategies: Rationale and initial results. In HPCA-3, 1997. Google ScholarDigital Library
- {4} D. I. August, W. W. Hwu, and S. A. Mahlke. A framework for balancing control flow and predication. In MICRO-30, 1997. Google ScholarDigital Library
- {5} D. Brooks, V. Tiwari, and M. Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In ISCA-27, 2000. Google ScholarDigital Library
- {6} P.-Y. Chang, E. Hao, Y. N. Patt, and P. P. Chang. Using predicated execution to improve the performance of a dynamically scheduled machine with speculative execution. In PACT, 1995. Google ScholarDigital Library
- {7} S. Chaudhry, P. Caprioli, S. Yip, and M. Tremblay. High-performance throughput computing. IEEE Micro, 25(3):32-45, May 2005. Google ScholarDigital Library
- {8} C.-Y. Cher and T. N. Vijaykumar. Skipper: a microarchitecture for exploiting control-flow independence. In MICRO-34, 2001. Google ScholarDigital Library
- {9} Y. Choi, A. Knies, L. Gerke, and T.-F. Ngai. The impact of if-conversion and branch prediction on program execution on the Intel Itanium processor. In MICRO-34, 2001. Google ScholarDigital Library
- {10} Y. Chou, B. Fahs, and S. Abraham. Microarchitecture optimizations for exploiting memory-level parallelism. In ISCA-31, 2004. Google ScholarDigital Library
- {11} Y. Chou, J. Fung, and J. P. Shen. Reducing branch misprediction penalties via dynamic control independence detection. In ICS-13, 1999. Google ScholarDigital Library
- {12} J. D. Collins, D. M. Tullsen, and H. Wang. Control flow optimization via dynamic reconvergence prediction. In MICRO-37, 2004. Google ScholarDigital Library
- {13} A. Cristal, O. J. Santana, F. Cazorla, M. Galluzzi, T. Ramirez, M. Pericas, and M. Valero. Kilo-instruction processors: Overcoming the memory wall. IEEE Micro, 25(3):48-57, May 2005. Google ScholarDigital Library
- {14} R. Cytron, J. Ferrante, B. K. Rosen, M. N.Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems , 13(4):451-490, Oct. 1991. Google ScholarDigital Library
- {15} M. Farrens, T. Heil, J. E. Smith, and G. Tyson. Restricted dual path execution. Technical Report CSE-97-18, University of California at Davis, Nov. 1997.Google Scholar
- {16} A. Gandhi, H. Akkary, and S. T. Srinivasan. Reducing branch misprediction penalty via selective recovery. In HPCA-10, 2004. Google ScholarDigital Library
- {17} D. Grunwald, A. Klauser, S. Manne, and A. Pleszkun. Confidence estimation for speculation control. In ISCA-25, 1998. Google ScholarDigital Library
- {18} T. Heil and J. E. Smith. Selective dual path execution. Technical report, University of Wisconsin-Madison, Nov. 1996.Google Scholar
- {19} E. Jacobsen, E. Rotenberg, and J. E. Smith. Assigning confidence to conditional branch predictions. In MICRO-29, 1996. Google ScholarDigital Library
- {20} D. A. Jiménez and C. Lin. Dynamic branch prediction with perceptrons. In HPCA-7, 2001. Google ScholarDigital Library
- {21} H. Kim, J. A. Joao, O. Mutlu, and Y. N. Patt. Diverge-merge processor (DMP): Dynamic predicated execution of complex control-flow graphs based on frequently executed paths. Technical Report TR-HPS-2006-008, The University of Texas at Austin, Sept. 2006.Google ScholarDigital Library
- {22} H. Kim, O. Mutlu, J. Stark, and Y. N. Patt. Wish branches: Combining conditional branching and predication for adaptive predicated execution. In MICRO-38, 2005. Google ScholarDigital Library
- {23} A. Klauser, T. Austin, D. Grunwald, and B. Calder. Dynamic hammock predication for non-predicated instruction set architectures. In PACT, 1998. Google ScholarDigital Library
- {24} A. Klauser and D. Grunwald. Instruction fetch mechanisms for multipath execution processors. In MICRO-32, 1999. Google ScholarDigital Library
- {25} A. Klauser, A. Paithankar, and D. Grunwald. Selective eager execution on the polypath architecture. In ISCA-25, 1998. Google ScholarDigital Library
- {26} A. KleinOsowski and D. J. Lilja. MinneSPEC: A new SPEC benchmark workload for simulation-based computer architecture research. Computer Architecture Letters, 1, June 2002. Google ScholarDigital Library
- {27} M. S. Lam and R. P. Wilson. Limits of control flow on parallelism. In ISCA-19, 1992. Google ScholarDigital Library
- {28} S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W. W. Hwu. Characterizing the impact of predicated execution on branch prediction. In MICRO-27, 1994. Google ScholarDigital Library
- {29} S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann. Effective compiler support for predicated execution using the hyperblock. In MICRO-25, 1992. Google ScholarDigital Library
- {30} O. Mutlu, J. Stark, C. Wilkerson, and Y. N. Patt. Runahead execution: An alternative to very large instruction windows for out-of-order processors. In HPCA-9, 2003. Google ScholarDigital Library
- {31} ORC. Open research compiler for Itanium processor family. http://ipforc.sourceforge.net/.Google Scholar
- {32} J. C. H. Park and M. Schlansker. On predicated execution. Technical Report HPL-91-58, Hewlett-Packard Labs, Palo Alto CA, May 1991.Google Scholar
- {33} D. N. Pnevmatikatos and G. S. Sohi. Guarded execution and dynamic branch prediction in dynamic ILP processors. In ISCA-21, 1994. Google ScholarDigital Library
- {34} E. M. Riseman and C. C. Foster. The inhibition of potential parallelism by conditional jumps. IEEE Transactions on Computers, C-21(12):1405- 1411, 1972.Google ScholarDigital Library
- {35} E. Rotenberg, Q. Jacobson, and J. E. Smith. A study of control independence in superscalar processors. In HPCA-5, 1999. Google ScholarDigital Library
- {36} E. Rotenberg and J. Smith. Control independence in trace processors. In MICRO-32, 1999. Google ScholarDigital Library
- {37} A. Seznec. Analysis of the O-GEometric History Length branch predictor. In ISCA-32, 2005. Google ScholarDigital Library
- {38} J. W. Sias, S. Ueng, G. A. Kent, I. M. Steiner, E. M. Nystrom, and W. W. Hwu. Field-testing IMPACT EPIC research results in Itanium 2. In ISCA- 31, 2004. Google ScholarDigital Library
- {39} B. Simon, B. Calder, and J. Ferrante. Incorporating predicate information into branch predictors. In HPCA-9, 2003. Google ScholarDigital Library
- {40} K. Skadron, P. S. Ahuja, M. Martonosi, and D. W. Clark. Branch prediction, instruction-window size, and cache size: Performance trade-offs and simulation techniques. ACM Transactions on Computer Systems, 48(11):1260-1281, Nov. 1999. Google ScholarDigital Library
- {41} E. Sprangle and D. Carmean. Increasing processor performance by implementing deeper pipelines. In ISCA-29, 2002. Google ScholarDigital Library
- {42} S. T. Srinivasan, R. Rajwar, H. Akkary, A. Gandhi, and M. Upton. Continual flow pipelines. In ASPLOS-XI, 2004. Google ScholarDigital Library
- {43} J. M. Tendler, J. S. Dodson, J. S. Fields, H. Le, and B. Sinharoy. POWER4 system microarchitecture. IBM Technical White Paper, Oct. 2001.Google Scholar
- {44} G. S. Tyson. The effects of predication on branch prediction. In MICRO- 27, 1994. Google ScholarDigital Library
- {45} P. H. Wang, H. Wang, R. M. Kling, K. Ramakrishnan, and J. P. Shen. Register renaming and scheduling for dynamic execution of predicated code. In HPCA-7, 2001. Google ScholarDigital Library
- {46} N. J. Warter, S. A. Mahlke, W. W. Hwu, and B. R. Rau. Reverse if-conversion. In PLDI, 1993. Google ScholarDigital Library
Index Terms
- Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths
Recommendations
Diverge-Merge Processor: Generalized and Energy-Efficient Dynamic Predication
The branch misprediction penalty is a major performance limiter and a major cause of wasted energy in high-performance processors. The diverge-merge processor reduces this penalty by dynamically predicating a wide range of hard-to-predict branches at ...
Profile-assisted Compiler Support for Dynamic Predication in Diverge-Merge Processors
CGO '07: Proceedings of the International Symposium on Code Generation and OptimizationDynamic predication has been proposed to reduce the branch misprediction penalty due to hard-to-predict branch instructions. A recently proposed dynamic predication architecture, the diverge-merge processor (DMP), provides large performance improvements ...
Processor coupling: integrating compile time and runtime scheduling for parallelism
Special Issue: Proceedings of the 19th annual international symposium on Computer architecture (ISCA '92)The technology to implement a single-chip node composed of 4 high-performance floating-point ALUs will be available by 1995. This paper presents processor coupling, a mechanism for controlling multiple ALUs to exploit both instruction-level and inter-...
Comments