ABSTRACT
Unified Parallel C (UPC) is a parallel language that uses a Single Program Multiple Data (SPMD) model of parallelism within a global address space. The global address space simplifies programming, especially for applications with irregular data structures that lead to fine-grained sharing between threads. Recent results have shown that the performance of UPC using a commercial compiler is comparable to that of MPI [7]. In this paper we describe a portable open source compiler for UPC. Our goal is to achieve similar performance while enabling easy porting of the compiler and runtime, and to provide a framework that allows for extensive optimizations. We identify some of the challenges in compiling UPC and use a combination of micro-benchmarks and application kernels to show that our compiler has low overhead for basic operations on shared data and is competitive with, and sometimes faster than, the commercial HP compiler. We also investigate several communication optimizations, and show significant benefits by hand-optimizing the generated code.
REFERENCES
- D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, D. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. The NAS Parallel Benchmarks. The International Journal of Supercomputer Applications, 5(3):63--73, Fall 1991.
- C. Bell, D. Bonachea, Y. Cote, J. Duell, P. Hargrove, P. Husbands, C. Iancu, M. Welcome, and K. Yelick. An evaluation of current high-performance networks. In the 17th International Parallel and Distributed Processing Symposium (IPDPS), 2003.
- The Berkeley UPC Compiler, 2002. http://upc.lbl.gov.
- D. Bonachea. GASNet specification. Technical Report CSD-02-1207, University of California, Berkeley, Oct. 2002.
- S. Chakrabarti, M. Gupta, and J. Choi. Global communication analysis and optimization. In SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 68--78, 1996.
- Compaq UPC version 2.0 for Tru64 UNIX. http://www.tru64unix.compaq.com/upc.
- T. El-Ghazawi and F. Cantonnet. UPC performance and potential: A NPB experimental study. In Supercomputing 2002 (SC2002), Nov. 2002.
- T. El-Ghazawi, W. Carlson, and J. Draper. UPC Language Specifications, version 1.1, 2003. http://www.gwu.edu/upc/documentation.html.
- T. El-Ghazawi and S. Chauvin. UPC benchmarking issues. In 30th IEEE International Conference on Parallel Processing (ICPP01), 2001.
- P. Hilfinger et al. Titanium language reference manual. Technical Report CSD-01-1163, University of California, Berkeley, Nov. 2001.
- A. Krishnamurthy and K. Yelick. Analyses and optimizations for shared address space programs. Journal of Parallel and Distributed Computing, 1996.
- J. Lee and D. Padua. Hiding relaxed memory consistency with compilers. In Proceedings of the IEEE International Conference on Parallel Architectures and Compilation Techniques, 2001.
- Lemieux. http://www.psc.edu/machines/tcs/lemieux.html.
- B. Liblit and A. Aiken. Type systems for distributed data structures. In the 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), January 2000.
- C. Luk and T. Mowry. Compiler-based prefetching for recursive data structures. In Architectural Support for Programming Languages and Operating Systems, pages 222--233, 1996.
- Matrix market. http://gams.nist.gov/MatrixMarket/.
- MPICH - A Portable Implementation of MPI. http://www-unix.mcs.anl.gov/mpi/mpich.
- R. Numrich and J. Reid. Co-Array Fortran for parallel programming. Technical Report RAL-TR-1998-060, Rutherford Appleton Laboratory, 1998.
- OpenMP: Simple, Portable, Scalable SMP Programming. http://www.openmp.org.
- The Message Passing Interface (MPI) standard. http://www-unix.mcs.anl.gov/mpi/.
- R. S. Tuminaro, M. Heroux, S. A. Hutchinson, and J. N. Shadid. Official Aztec user's guide version 2.1. Technical Report SAND99-8801J, Sandia National Laboratories, 1999.
- Y. Zhu and L. Hendren. Communication optimizations for parallel C programs. Journal of Parallel and Distributed Computing, 58(2):301--312, 1999.
Index Terms
- A performance analysis of the Berkeley UPC compiler