Parallelizing compilers: implementation and effectiveness
Publisher:
  • Stanford University
  • 408 Panama Mall, Suite 217
  • Stanford
  • CA
  • United States
Order Number: UMI Order No. GAX93-26540
Abstract

Effective automatic parallelization of programs requires solving two problems. First, a compiler must discover what parallelism is available in the program, and second, it must compile the available parallelism to execute efficiently on a multiprocessor. This thesis examines the first problem for scientific and engineering applications written in FORTRAN. Since most of the parallelism in such programs occurs within loops, the techniques studied are aimed at exposing loop-level parallelism.

The roles of specific optimizations and analyses for exposing parallelism are explored in the absence of machine constraints. In particular, the roles of transformations that help parallelize loops that are not trivially parallel (that is, that contain some initial loop-carried dependences) are explored. These include privatization and scalar expansion (which eliminate dependences involving scalars) and loop distribution (which breaks apart loops that contain parallel and sequential regions).

The data-manipulation algorithms used in vectorizing compilers are analyzed to demonstrate that those techniques require some rethinking in the context of parallelization. Highly flexible algorithms for implementing scalar expansion and privatization are proposed, and experimental results show that they outperform previous algorithms.

In addition to applying parallelization transformations, the compiler system performs a variety of analyses and transformations, ranging from interprocedural analysis to traditional scalar optimizations. By enabling different combinations of these techniques, their interactions and individual effectiveness are explored.

The parallelism that these techniques uncover in a set of real programs is compared with the parallelism actually available in each application, and the reasons why the transformations fail to uncover all of it are examined. This investigation leads to suggestions for new transformations, and enhancements of existing ones, to expose more parallelism.

