Abstract
X10 is a high-performance, high-productivity programming language aimed at large-scale distributed and shared-memory parallel applications. It is based on the Asynchronous Partitioned Global Address Space (APGAS) programming model, supporting the same fine-grained concurrency mechanisms within and across shared-memory nodes.
We demonstrate that X10 delivers solid performance at petascale by running eight application kernels, under weak scaling, on an IBM Power 775 supercomputer utilizing up to 55,680 Power7 cores (for 1.7 Pflop/s of theoretical peak performance). For the four HPC Challenge Class 2 benchmarks, X10 achieves 41% to 87% of the system’s potential at scale (as measured by IBM’s HPCC Class 1 optimized runs). We also implement K-Means, Smith-Waterman, Betweenness Centrality, and Unbalanced Tree Search (UTS) for geometric trees. Our UTS implementation is the first to scale to petaflop systems.
We describe the advances in distributed termination detection, distributed load balancing, and use of high-performance interconnects that enable X10 to scale out to tens of thousands of cores. We discuss how this work is driving the evolution of the X10 language, core class libraries, and runtime systems.
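The theoretical peak quoted above can be sanity-checked with a short calculation. The sketch below is illustrative only: the abstract does not state the clock rate or the per-core floating-point width, so the 3.84 GHz clock and 8 double-precision flops per cycle used here are assumptions about the Power7 cores, and the function name `peak_pflops` is hypothetical.

```python
# Back-of-the-envelope check of the theoretical peak quoted in the abstract.
# ASSUMPTIONS (not stated in the abstract): each Power7 core runs at 3.84 GHz
# and can retire 8 double-precision flops per cycle.
CORES = 55_680           # core count taken from the abstract
CLOCK_GHZ = 3.84         # assumed clock rate
FLOPS_PER_CYCLE = 8      # assumed per-core floating-point width


def peak_pflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Return the theoretical peak in Pflop/s for the given configuration."""
    gflops = cores * clock_ghz * flops_per_cycle  # Gflop/s across all cores
    return gflops / 1e6                           # 1 Pflop/s = 1e6 Gflop/s


peak = peak_pflops(CORES, CLOCK_GHZ, FLOPS_PER_CYCLE)
print(f"Theoretical peak: {peak:.2f} Pflop/s")
```

Under these assumed parameters the result rounds to the 1.7 Pflop/s figure given in the abstract.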