skip to main content
10.1109/MICRO.2006.20acmconferencesArticle/Chapter ViewAbstractPublication PagesmicroConference Proceedingsconference-collections
Article

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths

Published:09 December 2006Publication History

ABSTRACT

This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor (DMP). The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them without requiring ISA support for predicate registers and predicated instructions. To achieve this without incurring large hardware cost and complexity, the compiler provides control-flow information by hints and the processor dynamically predicates instructions only on frequently executed program paths. The key insight behind DMP is that most control-flow graphs look and behave like simple hammock (if-else) structures when only frequently executed paths in the graphs are considered. Therefore, DMP can dynamically predicate a much larger set of branches than simple hammock branches. This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor (DMP). The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them without requiring ISA support for predicate registers and predicated instructions. To achieve this without incurring large hardware cost and complexity, the compiler provides control-flow information by hints and the processor dynamically predicates instructions only on frequently executed program paths. The key insight behind DMP is that most control-flow graphs look and behave like simple hammock (if-else) structures when only frequently executed paths in the graphs are considered. Therefore, DMP can dynamically predicate a much larger set of branches than simple hammock branches.

References

  1. {1} P. S. Ahuja, K. Skadron, M. Martonosi, and D. W. Clark. Multipath execution: opportunities and limits. In ICS-12, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. {2} J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In POPL-10, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. {3} D. I. August, D. A. Connors, J. C. Gyllenhaal, and W. W. Hwu. Architectural support for compiler-synthesized dynamic branch prediction strategies: Rationale and initial results. In HPCA-3, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. {4} D. I. August, W. W. Hwu, and S. A. Mahlke. A framework for balancing control flow and predication. In MICRO-30, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. {5} D. Brooks, V. Tiwari, and M. Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In ISCA-27, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. {6} P.-Y. Chang, E. Hao, Y. N. Patt, and P. P. Chang. Using predicated execution to improve the performance of a dynamically scheduled machine with speculative execution. In PACT, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. {7} S. Chaudhry, P. Caprioli, S. Yip, and M. Tremblay. High-performance throughput computing. IEEE Micro, 25(3):32-45, May 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. {8} C.-Y. Cher and T. N. Vijaykumar. Skipper: a microarchitecture for exploiting control-flow independence. In MICRO-34, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. {9} Y. Choi, A. Knies, L. Gerke, and T.-F. Ngai. The impact of if-conversion and branch prediction on program execution on the Intel Itanium processor. In MICRO-34, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. {10} Y. Chou, B. Fahs, and S. Abraham. Microarchitecture optimizations for exploiting memory-level parallelism. In ISCA-31, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. {11} Y. Chou, J. Fung, and J. P. Shen. Reducing branch misprediction penalties via dynamic control independence detection. In ICS-13, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. {12} J. D. Collins, D. M. Tullsen, and H. Wang. Control flow optimization via dynamic reconvergence prediction. In MICRO-37, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. {13} A. Cristal, O. J. Santana, F. Cazorla, M. Galluzzi, T. Ramirez, M. Pericas, and M. Valero. Kilo-instruction processors: Overcoming the memory wall. IEEE Micro, 25(3):48-57, May 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. {14} R. Cytron, J. Ferrante, B. K. Rosen, M. N.Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems , 13(4):451-490, Oct. 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. {15} M. Farrens, T. Heil, J. E. Smith, and G. Tyson. Restricted dual path execution. Technical Report CSE-97-18, University of California at Davis, Nov. 1997.Google ScholarGoogle Scholar
  16. {16} A. Gandhi, H. Akkary, and S. T. Srinivasan. Reducing branch misprediction penalty via selective recovery. In HPCA-10, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. {17} D. Grunwald, A. Klauser, S. Manne, and A. Pleszkun. Confidence estimation for speculation control. In ISCA-25, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. {18} T. Heil and J. E. Smith. Selective dual path execution. Technical report, University of Wisconsin-Madison, Nov. 1996.Google ScholarGoogle Scholar
  19. {19} E. Jacobsen, E. Rotenberg, and J. E. Smith. Assigning confidence to conditional branch predictions. In MICRO-29, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. {20} D. A. Jiménez and C. Lin. Dynamic branch prediction with perceptrons. In HPCA-7, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. {21} H. Kim, J. A. Joao, O. Mutlu, and Y. N. Patt. Diverge-merge processor (DMP): Dynamic predicated execution of complex control-flow graphs based on frequently executed paths. Technical Report TR-HPS-2006-008, The University of Texas at Austin, Sept. 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. {22} H. Kim, O. Mutlu, J. Stark, and Y. N. Patt. Wish branches: Combining conditional branching and predication for adaptive predicated execution. In MICRO-38, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. {23} A. Klauser, T. Austin, D. Grunwald, and B. Calder. Dynamic hammock predication for non-predicated instruction set architectures. In PACT, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. {24} A. Klauser and D. Grunwald. Instruction fetch mechanisms for multipath execution processors. In MICRO-32, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. {25} A. Klauser, A. Paithankar, and D. Grunwald. Selective eager execution on the polypath architecture. In ISCA-25, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. {26} A. KleinOsowski and D. J. Lilja. MinneSPEC: A new SPEC benchmark workload for simulation-based computer architecture research. Computer Architecture Letters, 1, June 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. {27} M. S. Lam and R. P. Wilson. Limits of control flow on parallelism. In ISCA-19, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. {28} S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W. W. Hwu. Characterizing the impact of predicated execution on branch prediction. In MICRO-27, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. {29} S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann. Effective compiler support for predicated execution using the hyperblock. In MICRO-25, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. {30} O. Mutlu, J. Stark, C. Wilkerson, and Y. N. Patt. Runahead execution: An alternative to very large instruction windows for out-of-order processors. In HPCA-9, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. {31} ORC. Open research compiler for Itanium processor family. http://ipforc.sourceforge.net/.Google ScholarGoogle Scholar
  32. {32} J. C. H. Park and M. Schlansker. On predicated execution. Technical Report HPL-91-58, Hewlett-Packard Labs, Palo Alto CA, May 1991.Google ScholarGoogle Scholar
  33. {33} D. N. Pnevmatikatos and G. S. Sohi. Guarded execution and dynamic branch prediction in dynamic ILP processors. In ISCA-21, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. {34} E. M. Riseman and C. C. Foster. The inhibition of potential parallelism by conditional jumps. IEEE Transactions on Computers, C-21(12):1405- 1411, 1972.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. {35} E. Rotenberg, Q. Jacobson, and J. E. Smith. A study of control independence in superscalar processors. In HPCA-5, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. {36} E. Rotenberg and J. Smith. Control independence in trace processors. In MICRO-32, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. {37} A. Seznec. Analysis of the O-GEometric History Length branch predictor. In ISCA-32, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. {38} J. W. Sias, S. Ueng, G. A. Kent, I. M. Steiner, E. M. Nystrom, and W. W. Hwu. Field-testing IMPACT EPIC research results in Itanium 2. In ISCA- 31, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. {39} B. Simon, B. Calder, and J. Ferrante. Incorporating predicate information into branch predictors. In HPCA-9, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. {40} K. Skadron, P. S. Ahuja, M. Martonosi, and D. W. Clark. Branch prediction, instruction-window size, and cache size: Performance trade-offs and simulation techniques. ACM Transactions on Computer Systems, 48(11):1260-1281, Nov. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. {41} E. Sprangle and D. Carmean. Increasing processor performance by implementing deeper pipelines. In ISCA-29, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. {42} S. T. Srinivasan, R. Rajwar, H. Akkary, A. Gandhi, and M. Upton. Continual flow pipelines. In ASPLOS-XI, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. {43} J. M. Tendler, J. S. Dodson, J. S. Fields, H. Le, and B. Sinharoy. POWER4 system microarchitecture. IBM Technical White Paper, Oct. 2001.Google ScholarGoogle Scholar
  44. {44} G. S. Tyson. The effects of predication on branch prediction. In MICRO- 27, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. {45} P. H. Wang, H. Wang, R. M. Kling, K. Ramakrishnan, and J. P. Shen. Register renaming and scheduling for dynamic execution of predicated code. In HPCA-7, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. {46} N. J. Warter, S. A. Mahlke, W. W. Hwu, and B. R. Rau. Reverse if-conversion. In PLDI, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
              December 2006
              493 pages
              ISBN:0769527329

              Publisher

              IEEE Computer Society

              United States

              Publication History

              • Published: 9 December 2006

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              MICRO 39 Paper Acceptance Rate42of174submissions,24%Overall Acceptance Rate484of2,242submissions,22%

              Upcoming Conference

              MICRO '24

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader