A parallel vector processor (PVP) supercomputer relies on parallel processing and pipelines to deliver its enormous computational rate. Loop parallelization and vectorization together enable the performance of application programs to approach the theoretical peak rate of a PVP. This dissertation presents a series of powerful compiling techniques that broaden the classes of loops that can be vectorized and parallelized.
A serial loop can be executed in parallel by different processors if it has no loop-carried dependencies (LCDs). We propose analysis and transformation techniques to detect and/or remove non-inherent LCDs, including: geometric dependence testing methods to perform private array analysis; a code generation method to enable DOALL transformation of loops involving conditionally defined privatized variables; and a symbolic analysis technique that, in the presence of control constructs, allows a compiler to remove false dependencies from irrelevant flow paths.
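As a rough illustration of what a non-inherent LCD looks like, the sketch below uses a scratch array that every iteration writes fully before reading. Sharing that array across iterations creates storage-related dependencies that disappear once each iteration gets its own private copy. The function names and the doubling computation are hypothetical and chosen only for illustration; this is not the dissertation's analysis or code generator.

```python
def serial_shared_temp(a):
    """Serial loop: every iteration reuses the same scratch array t."""
    n, m = len(a), len(a[0])
    t = [0] * m                       # shared scratch storage (source of false LCDs)
    out = [0] * n
    for i in range(n):
        for j in range(m):
            t[j] = a[i][j] * 2        # t is fully written in iteration i ...
        out[i] = sum(t)               # ... before it is read in the same iteration
    return out


def privatized_iteration(row):
    """One loop iteration with its own private copy of the scratch array."""
    t = [x * 2 for x in row]          # private t: no storage shared between iterations
    return sum(t)


def doall_privatized(a, order):
    """Run the privatized iterations in an arbitrary order, as a DOALL loop may."""
    out = [0] * len(a)
    for i in order:                   # any permutation of the iteration space is legal
        out[i] = privatized_iteration(a[i])
    return out


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(serial_shared_temp(data))
    print(doall_privatized(data, reversed(range(len(data)))))
```

Because no value flows from one iteration to the next through `t`, executing the privatized iterations in reverse order gives the same result as the serial loop. That order independence is the property a DOALL transformation relies on.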
Loop vectorization has been found to be one of the most significant forms of parallelism. However, it has traditionally been limited to innermost loops and outer loops which are made innermost by the application of loop distribution and interchange. We propose a framework for direct vectorization of outer loops (OLV); i.e., vectorization of an outer loop without interchange or distribution. The framework includes: general vector execution modeling, legality of OLV, vector loop selection and an OLV vector code generator.
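One way to picture direct outer-loop vectorization is a loop nest whose inner loop carries a recurrence, so it cannot be vectorized, while the outer iterations are independent. In the sketch below, the outer loop steps by a strip of `vl` iterations and each inner-loop step performs one vector-style operation across that strip; the loops are neither interchanged nor distributed. The vector length `vl`, the function names, and the running-sum computation are assumptions made for illustration, with Python lists standing in for vector registers. It is not the Cray compiler's generated code.

```python
def scalar_nest(a):
    """Reference loop nest: the inner loop has a recurrence on j."""
    n, m = len(a), len(a[0])
    b = [row[:] for row in a]
    for i in range(n):                # outer loop: iterations are independent
        for j in range(1, m):         # inner loop: b[i][j] depends on b[i][j-1]
            b[i][j] = b[i][j - 1] + a[i][j]
    return b


def outer_vectorized(a, vl=4):
    """Same nest with the outer loop as the vector dimension (hypothetical vl)."""
    n, m = len(a), len(a[0])
    b = [row[:] for row in a]
    for i0 in range(0, n, vl):                    # outer loop, one vector strip per step
        lanes = range(i0, min(i0 + vl, n))        # the last strip may be shorter
        for j in range(1, m):                     # inner loop order is unchanged
            vec = [b[i][j - 1] + a[i][j] for i in lanes]   # one "vector add" across lanes
            for i, v in zip(lanes, vec):          # "vector store" back to column j
                b[i][j] = v
    return b


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 1, 1], [2, 0, 2]]
    print(scalar_nest(data) == outer_vectorized(data))
```

Each lane follows its own recurrence along `j`, so the inner loop stays sequential while the work across `i` proceeds in lockstep. Note that the lane accesses stride across rows, which is one reason selecting the vector loop involves a cost trade-off.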
These techniques have been implemented in the Cray Fortran-90 compiling system. Implementation details are presented. The efficacy of these techniques is demonstrated by significant performance improvements in application programs.