ABSTRACT
It is now well established that the device scaling predicted by Moore's Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during the last two decades. As a result, future systems are rapidly moving from uniprocessor to multiprocessor configurations, so as to use parallelism instead of frequency scaling as the foundation for increased compute capacity. The dominant emerging multiprocessor structure for the future is a Non-Uniform Cluster Computing (NUCC) system with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in horizontally scalable cluster configurations such as blade servers. Unlike previous generations of hardware evolution, this shift will have a major impact on existing software. Current OO language facilities for concurrent and distributed programming are inadequate for addressing the needs of NUCC systems because they do not support the notions of non-uniform data access within a node, or of tight coupling of distributed nodes. We have designed a modern object-oriented programming language, X10, for high performance, high productivity programming of NUCC systems. A member of the partitioned global address space family of languages, X10 highlights the explicit reification of locality in the form of places; lightweight activities embodied in async, future, foreach, and ateach constructs; a construct for termination detection (finish); the use of lock-free synchronization (atomic blocks); and the manipulation of cluster-wide global data structures. We present an overview of the X10 programming model and language, experience with our reference implementation, and results from some initial productivity comparisons between the X10 and Java™ languages.
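The abstract lists X10's activity constructs without illustrating them. X10 code is not shown here. Instead, the sketch below gives a rough Java analogy, built on `java.util.concurrent`, for how `async`, `finish`, `atomic` and `future` relate to one another. The helper names (`X10Sketch`, `Finish`, `async`, `atomic`, `future`, `sumWithFinish`, `countLeaves`) are hypothetical. They are neither X10 syntax nor JDK APIs. The sketch makes two simplifications. First, `atomic` is modeled with a single global lock, whereas the abstract describes X10's atomic blocks as lock-free synchronization. Second, places, `foreach` and `ateach` are not modeled at all.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.function.Supplier;

public final class X10Sketch {
    // Hypothetical stand-in for X10's atomic blocks: one global lock makes all
    // atomic sections mutually exclusive (a lock, unlike a lock-free scheme).
    private static final Object ATOMIC_LOCK = new Object();

    static void atomic(Runnable body) {
        synchronized (ATOMIC_LOCK) {
            body.run();
        }
    }

    // Hypothetical finish scope. The Phaser counts the enclosing activity plus
    // every activity spawned through async(), including nested spawns.
    static final class Finish {
        private final Phaser phaser = new Phaser(1); // party 1 = enclosing activity
        private final ExecutorService pool;

        Finish(ExecutorService pool) { this.pool = pool; }

        // async: run body as a new activity governed by this finish.
        void async(Runnable body) {
            phaser.register(); // registered before the spawning activity can finish
            pool.execute(() -> {
                try {
                    body.run();
                } finally {
                    phaser.arriveAndDeregister();
                }
            });
        }

        // End of the finish block: wait until every governed activity terminates.
        void await() { phaser.arriveAndAwaitAdvance(); }
    }

    // Hypothetical future/force pair: start body asynchronously; join() forces it.
    static <T> CompletableFuture<T> future(Supplier<T> body, ExecutorService pool) {
        return CompletableFuture.supplyAsync(body, pool);
    }

    // Sum 1..n in `chunks` parallel activities; partial sums are merged atomically.
    static long sumWithFinish(int n, int chunks, ExecutorService pool) {
        long[] acc = {0L};
        Finish f = new Finish(pool);
        int chunk = (n + chunks - 1) / chunks;
        for (int c = 0; c < chunks; c++) {
            final int lo = c * chunk + 1;
            final int hi = Math.min(n, (c + 1) * chunk);
            f.async(() -> {
                long local = 0;
                for (int i = lo; i <= hi; i++) local += i;
                final long partial = local;
                atomic(() -> acc[0] += partial);
            });
        }
        f.await();
        synchronized (ATOMIC_LOCK) { return acc[0]; }
    }

    // Binary fan-out to the given depth; one finish waits for all descendants.
    static int countLeaves(int depth, ExecutorService pool) {
        int[] leaves = {0};
        Finish f = new Finish(pool);
        f.async(() -> spawn(f, depth, leaves));
        f.await();
        synchronized (ATOMIC_LOCK) { return leaves[0]; }
    }

    private static void spawn(Finish f, int depth, int[] leaves) {
        if (depth == 0) {
            atomic(() -> leaves[0]++);
            return;
        }
        f.async(() -> spawn(f, depth - 1, leaves));
        f.async(() -> spawn(f, depth - 1, leaves));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(sumWithFinish(1000, 8, pool));         // 500500
            System.out.println(countLeaves(5, pool));                 // 32
            System.out.println(future(() -> 12L * 12L, pool).join()); // 144
        } finally {
            pool.shutdown();
        }
    }
}
```

The `Phaser` is what makes this `finish` transitive. Each child activity registers before its parent can deregister, so the phase cannot advance while any descendant is still running. As a result, `countLeaves(5, pool)` returns only after all 32 leaf activities have updated the counter.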
X10: an object-oriented approach to non-uniform cluster computing. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '05).