Article

A performance analysis of the Berkeley UPC compiler

Published: 23 June 2003

ABSTRACT

Unified Parallel C (UPC) is a parallel language that uses a Single Program Multiple Data (SPMD) model of parallelism within a global address space. The global address space simplifies programming, especially for applications with irregular data structures that lead to fine-grained sharing between threads. Recent results have shown that the performance of UPC using a commercial compiler is comparable to that of MPI [7]. In this paper we describe a portable open source compiler for UPC. Our goal is to achieve similar performance while enabling easy porting of the compiler and runtime, and to provide a framework that allows for extensive optimizations. We identify some of the challenges in compiling UPC and use a combination of micro-benchmarks and application kernels to show that our compiler has low overhead for basic operations on shared data and is competitive with, and sometimes faster than, the commercial HP compiler. We also investigate several communication optimizations, and show significant benefits from hand-optimizing the generated code.

References

  1. D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, D. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. The NAS Parallel Benchmarks. The International Journal of Supercomputer Applications, 5(3):63--73, Fall 1991.
  2. C. Bell, D. Bonachea, Y. Cote, J. Duell, P. Hargrove, P. Husbands, C. Iancu, M. Welcome, and K. Yelick. An evaluation of current high-performance networks. In Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS), 2003.
  3. The Berkeley UPC Compiler, 2002. http://upc.lbl.gov.
  4. D. Bonachea. GASNet specification. Technical Report CSD-02-1207, University of California, Berkeley, Oct. 2002.
  5. S. Chakrabarti, M. Gupta, and J. Choi. Global communication analysis and optimization. In SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 68--78, 1996.
  6. Compaq UPC version 2.0 for Tru64 UNIX. http://www.tru64unix.compaq.com/upc.
  7. T. El-Ghazawi and F. Cantonnet. UPC performance and potential: A NPB experimental study. In Supercomputing 2002 (SC2002), Nov. 2002.
  8. T. El-Ghazawi, W. Carlson, and J. Draper. UPC Language Specifications, version 1.1, 2003. http://www.gwu.edu/upc/documentation.html.
  9. T. El-Ghazawi and S. Chauvin. UPC benchmarking issues. In 30th IEEE International Conference on Parallel Processing (ICPP01), 2001.
  10. P. Hilfinger et al. Titanium language reference manual. Technical Report CSD-01-1163, University of California, Berkeley, Nov. 2001.
  11. A. Krishnamurthy and K. Yelick. Analyses and optimizations for shared address space programs. Journal of Parallel and Distributed Computing, 1996.
  12. J. Lee and D. Padua. Hiding relaxed memory consistency with compilers. In Proceedings of the IEEE International Conference on Parallel Architectures and Compilation Techniques, 2001.
  13. Lemieux. http://www.psc.edu/machines/tcs/lemieux.html.
  14. B. Liblit and A. Aiken. Type systems for distributed data structures. In Proceedings of the 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), January 2000.
  15. C. Luk and T. Mowry. Compiler-based prefetching for recursive data structures. In Architectural Support for Programming Languages and Operating Systems, pages 222--233, 1996.
  16. Matrix Market. http://gams.nist.gov/MatrixMarket/.
  17. MPICH: A Portable Implementation of MPI. http://www-unix.mcs.anl.gov/mpi/mpich.
  18. R. Numrich and J. Reid. Co-Array Fortran for parallel programming. Technical Report RAL-TR-1998-060, Rutherford Appleton Laboratory, 1998.
  19. OpenMP: Simple, Portable, Scalable SMP Programming. http://www.openmp.org.
  20. The Message Passing Interface (MPI) standard. http://www-unix.mcs.anl.gov/mpi/.
  21. R. S. Tuminaro, M. Heroux, S. A. Hutchinson, and J. N. Shadid. Official Aztec user's guide, version 2.1. Technical Report SAND99-8801J, Sandia National Laboratories, 1999.
  22. Y. Zhu and L. Hendren. Communication optimizations for parallel C programs. Journal of Parallel and Distributed Computing, 58(2):301--312, 1999.


    Reviews

    Reviewer: Olivier Louis Marie Lecarme

    The contents of this paper are interesting, and somewhat convincing. The authors describe a compiler for the parallel language Unified Parallel C (UPC), which uses the single program multiple data (SPMD) model. They describe some aspects of the language and the model, as well as of the Berkeley UPC compiler. The rest of the paper is devoted to a comparison with the Hewlett-Packard (HP) UPC compiler, which tries to demonstrate that the Berkeley compiler outperforms the HP compiler in several areas. The benchmarks used for this comparison are briefly described, and more than a dozen comparison charts present the results.

    This multi-author paper deserves a better presentation. As published, it contains many oversized lines, one figure is out of place, and the many comparison charts are difficult to interpret because of the different meanings of their vertical scales: if a scale measures Mops/second, for example, higher is better, but if it measures time in seconds, lower is better. The same problem occurs several times.

    It is somewhat disappointing to learn that this compiler is actually a preprocessor, since it in fact generates C code and relies on the performance and reliability of an existing compiler. Of course, this is an easy way to attain portability. Similarly, the use of C itself as the extended language means that the most important tool used to represent data is the ubiquitous pointer, which may not be the best solution, especially with the SPMD model.

    The runtime system is also of fundamental importance. In the scheme presented in this paper, it is built in two layers: the Berkeley UPC runtime system is network- and compiler-independent, while the GASNet communication system is language-independent.

    The Berkeley UPC compiler is not yet complete. The authors describe several optimizations they intend to implement, which will relieve the programmer of the burden of some tuning improvements.
Online Computing Reviews Service

    Published in

      ICS '03: Proceedings of the 17th Annual International Conference on Supercomputing
      June 2003, 380 pages
      ISBN: 1581137338
      DOI: 10.1145/782814
      Copyright © 2003 ACM


      Publisher

      Association for Computing Machinery, New York, NY, United States

      Acceptance Rates

      ICS '03 paper acceptance rate: 36 of 171 submissions (21%). Overall acceptance rate: 584 of 2,055 submissions (28%).
