DOI: 10.1145/2145816.2145865
poster

Portable parallel performance from sequential, productive, embedded domain-specific languages

Published: 25 February 2012

ABSTRACT

Domain-expert productivity programmers desire scalable application performance, but usually must rely on efficiency programmers who are experts in explicit parallel programming to achieve it. Since such programmers are rare, to maximize reuse of their work we propose encapsulating their strategies in mini-compilers for domain-specific embedded languages (DSELs) glued together by a common high-level host language familiar to productivity programmers. The nontrivial applications that use these DSELs achieve up to 98% of peak attainable performance and perform comparably to or better than existing hand-coded implementations. Our approach is unique in that each mini-compiler not only performs conventional compiler transformations and optimizations, but also includes imperative procedural code that captures an efficiency expert's strategy for mapping a narrow domain onto a specific type of hardware. The result is source- and performance-portability for productivity programmers and parallel performance that rivals that of hand-coded efficiency-language implementations of the same applications. We describe a framework that supports our methodology and five implemented DSELs supporting common computation kernels.

Our results demonstrate that for several interesting classes of problems, efficiency-level parallel performance can be achieved by packaging efficiency programmers' expertise in a reusable framework that is easy to use for both productivity programmers and efficiency programmers.
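To make the idea of a mini-compiler embedded in a host language concrete, here is a minimal Python sketch. It is an illustration only and not the framework described in this abstract: the function name `compile_elementwise`, the one-variable expression language, and the `workers` parameter are all hypothetical. The sketch checks that an expression stays inside a narrow DSEL subset, then applies a fixed mapping strategy (splitting the input into chunks for a worker pool). Python threads stand in for a real parallel backend here and are not expected to speed up CPU-bound work.

```python
import ast
from concurrent.futures import ThreadPoolExecutor

# Hypothetical DSEL subset: +, -, * and unary minus over constants and `x`.
_ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Constant,
            ast.Load, ast.Add, ast.Sub, ast.Mult, ast.USub)


def compile_elementwise(expr_src, workers=1):
    """Hypothetical mini-compiler: validate a one-variable expression and
    return a function that applies it to every element of a sequence."""
    tree = ast.parse(expr_src, mode="eval")
    for node in ast.walk(tree):
        # Reject anything outside the narrow domain (calls, attributes, ...).
        if not isinstance(node, _ALLOWED):
            raise ValueError(f"outside the DSEL subset: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id != "x":
            raise ValueError(f"unknown variable: {node.id}")
    code = compile(tree, "<dsel>", "eval")

    def kernel(chunk):
        # Evaluate the validated expression once per element.
        return [eval(code, {"__builtins__": {}}, {"x": x}) for x in chunk]

    def run(data):
        data = list(data)
        if workers <= 1 or len(data) < workers:
            return kernel(data)
        # Mapping strategy (an assumption of this sketch): contiguous chunks,
        # one per worker, results concatenated in input order.
        size = -(-len(data) // workers)  # ceiling division
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = pool.map(kernel, chunks)
        return [y for part in parts for y in part]

    return run


scale = compile_elementwise("2*x + 1", workers=2)
result = scale([1, 2, 3, 4])
```

The productivity programmer writes only the expression. The chunking decision lives inside the mini-compiler, which is where an efficiency expert's hardware-specific strategy would go in a fuller system.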


        • Published in

          ACM Conferences
          PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
          February 2012
          352 pages
          ISBN:9781450311601
          DOI:10.1145/2145816
          • ACM SIGPLAN Notices, Volume 47, Issue 8
            PPOPP '12
            August 2012
            334 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/2370036

          Copyright © 2012 Authors

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 230 of 1,014 submissions, 23%
