X10 and APGAS at Petascale

Published: 15 March 2016

Abstract

X10 is a high-performance, high-productivity programming language aimed at large-scale distributed and shared-memory parallel applications. It is based on the Asynchronous Partitioned Global Address Space (APGAS) programming model, supporting the same fine-grained concurrency mechanisms within and across shared-memory nodes.
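
X10 expresses this fine-grained concurrency chiefly through `async`, which spawns a task, and `finish`, which waits for every task spawned within its scope, including tasks spawned by those tasks. The abstract does not show these constructs. The following is therefore only a minimal single-process Python analogue built on our own assumptions. `Finish`, `async_`, and `count_nodes` are hypothetical names, not X10 or library API. A thread pool stands in for X10's worker threads. Places, and with them distributed termination detection, are not modeled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Finish:
    """Hypothetical single-process analogue of X10's `finish` block: it waits
    until every task spawned through it, including tasks spawned by those
    tasks, has terminated."""

    def __init__(self, pool):
        self._pool = pool
        self._pending = 0                  # tasks spawned but not yet finished
        self._cond = threading.Condition()
        self._errors = []                  # exceptions raised by tasks

    def async_(self, fn, *args):
        """Hypothetical analogue of X10's `async`: run fn(self, *args) as a task."""
        # Count the task before submitting it, so `_pending` cannot reach
        # zero while a running parent is still spawning children.
        with self._cond:
            self._pending += 1
        self._pool.submit(self._run, fn, args)

    def _run(self, fn, args):
        try:
            fn(self, *args)
        except Exception as exc:           # collected, re-raised at the finish
            with self._cond:
                self._errors.append(exc)
        finally:
            with self._cond:
                self._pending -= 1
                if self._pending == 0:
                    self._cond.notify_all()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Block until all transitively spawned tasks have terminated.
        with self._cond:
            while self._pending:
                self._cond.wait()
        if self._errors and exc_type is None:
            raise self._errors[0]
        return False


def count_nodes(depth, workers=4):
    """Count the nodes of a complete binary tree by spawning one task per node."""
    total = 0
    lock = threading.Lock()

    def visit(f, d):
        nonlocal total
        with lock:
            total += 1
        if d > 0:
            f.async_(visit, d - 1)
            f.async_(visit, d - 1)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        with Finish(pool) as f:
            f.async_(visit, depth)
    return total
```

`count_nodes(10)` returns 2047, the node count of a complete binary tree of depth 10 (2^11 - 1). The invariant to notice is that a child is counted before its parent is marked finished, so the scope can never observe a spurious zero. The work summarized here is about making that kind of termination detection scale across tens of thousands of cores. This sketch makes no attempt at that.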

We demonstrate that X10 delivers solid performance at petascale by running eight application kernels (weak scaling) on an IBM Power 775 supercomputer using up to 55,680 Power7 cores (1.7 Pflop/s of theoretical peak performance). For the four HPC Challenge Class 2 benchmarks, X10 achieves 41% to 87% of the system's potential at scale, as measured by IBM's optimized HPCC Class 1 runs. We also implement K-Means, Smith-Waterman, Betweenness Centrality, and Unbalanced Tree Search (UTS) for geometric trees. Our UTS implementation is the first to scale to petaflop systems.

We describe the advances in distributed termination detection, distributed load balancing, and use of high-performance interconnects that enable X10 to scale out to tens of thousands of cores. We discuss how this work is driving the evolution of the X10 language, core class libraries, and runtime systems.

        Published in: ACM Transactions on Parallel Computing, Volume 2, Issue 4 (Special Issue on PPoPP 2014), March 2016, 202 pages
        ISSN: 2329-4949; EISSN: 2329-4957; DOI: 10.1145/2888415

        Copyright © 2016 ACM

        Publisher: Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 15 March 2016
        • Accepted: 1 February 2016
        • Revised: 1 January 2016
        • Received: 1 December 2014

        Qualifiers

        • research-article
        • Research
        • Refereed
