ABSTRACT
For the past few years, OpenACC has been the primary directive-based API for programming accelerator devices such as GPUs. OpenMP 4.0 is now a competitor in this space, with support from multiple vendors. In this paper, we describe an algorithm to convert (a subset of) OpenACC to OpenMP 4; we implemented this algorithm in a prototype tool and evaluated it by translating the EPCC Level 1 OpenACC benchmarks. We discuss some of the challenges in the conversion process and propose which parts of the process should be automated, which should be done manually by the programmer, and what future research and development is necessary in this area.
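To make the kind of conversion discussed above concrete, the following is a minimal hand-written sketch, not the paper's algorithm or the output of its prototype tool. It assumes the commonly cited correspondences between the two APIs: an OpenACC `parallel loop` maps to an OpenMP `target teams distribute parallel for`, `copyin` maps to `map(to: ...)`, and `copy` maps to `map(tofrom: ...)`. The function names `saxpy_acc` and `saxpy_omp` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* OpenACC version (illustrative): y = a*x + y, offloaded with a
 * combined parallel loop construct and explicit data clauses. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hand-translated OpenMP 4 version (an assumed mapping, not tool output):
 *   acc parallel loop  ->  omp target teams distribute parallel for
 *   copyin(x[0:n])     ->  map(to: x[0:n])
 *   copy(y[0:n])       ->  map(tofrom: y[0:n])                        */
void saxpy_omp(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

A compiler without OpenACC or OpenMP support ignores the pragmas and runs both loops serially on the host, so the two functions can be checked against each other before any accelerator is involved. A simple loop like this one translates almost mechanically; whether the same holds for a given OpenACC construct depends on how closely the two specifications align for it.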
- From OpenACC to OpenMP 4: Toward Automatic Translation