Graph reduction algorithms are designed for the parallel evaluation of functional programs. The algorithms include a garbage collection method that is on-the-fly (real-time), parallel, distributed, and incremental. They automate tasks such as mapping processes to processors for global load balancing and communication locality, and arranging synchronization and communication between processes. A functional language compiler is also designed that automatically decomposes programs into parallel processes, extracting the parallelism from the programs. Because of this automation, parallel programs can be written in high-level functional languages that hide machine-dependent details. This makes parallel functional programming architecture-independent and much easier than parallel programming in conventional languages.

A parallel computer is designed on paper, at a rough level, for the graph reduction algorithms. This design performs graph reduction directly in tiny cells connected as a binary tree, avoiding the bottleneck between processor and memory. A simulator is implemented to simulate the design and to verify the effectiveness of the graph reduction algorithms. Simulations measure the number of reduction steps used, the maximum number of messages passing through any cell, and the number of cells needed. For simple examples such as matrix addition, exponential speedups are obtained. Load balancing is implied by the exponential speedups; communication locality is implied by the linear growth of the maximum number of messages passing through any cell together with the exponential speedups. Further experimentation on more complex examples will be done in the future on parallel computers.
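For readers unfamiliar with the technique the abstract builds on, the following is a minimal sequential sketch of graph (as opposed to tree) reduction: a shared subexpression is represented by a mutable node that is reduced once and then overwritten with its result, so every other reference to it reuses the reduced value. The `Expr` type, `eval`, and `shared` here are illustrative names, not the paper's system, and the sketch omits the parallel, distributed, and garbage-collection aspects entirely.

```haskell
import Data.IORef

-- A tiny expression language. A Thunk is a graph node holding either
-- an unevaluated expression (Left) or its memoized result (Right).
data Expr = Num Int
          | Add Expr Expr
          | Thunk (IORef (Either Expr Int))

-- Reduce an expression to a number, updating shared nodes in place.
eval :: Expr -> IO Int
eval (Num n)   = return n
eval (Add a b) = (+) <$> eval a <*> eval b
eval (Thunk r) = do
  st <- readIORef r
  case st of
    Right n -> return n              -- already reduced: reuse the result
    Left e  -> do
      n <- eval e
      writeIORef r (Right n)         -- overwrite the node with its value
      return n

-- Wrap a subexpression in a sharable node.
shared :: Expr -> IO Expr
shared e = Thunk <$> newIORef (Left e)

main :: IO ()
main = do
  s <- shared (Add (Num 2) (Num 3))  -- reduced once, referenced twice
  r <- eval (Add s s)
  print r                            -- 10
```

In a parallel setting, independent subgraphs (here, the two operands of an `Add`) can be reduced by different processors, which is the opportunity the abstract's algorithms exploit automatically.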
Index Terms
- A parallel graph reduction system