research-article

Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages

Published: 01 April 2014

Abstract

Developing high-performance software is a difficult task that requires the use of low-level, architecture-specific programming models (e.g., OpenMP for CMPs, CUDA for GPUs, MPI for clusters). It is typically not possible to write a single application that can run efficiently in different environments, leading to multiple versions and increased complexity. Domain-Specific Languages (DSLs) are a promising avenue to enable programmers to use high-level abstractions and still achieve good performance on a variety of hardware. This is possible because DSLs have higher-level semantics and restrictions than general-purpose languages, so DSL compilers can perform higher-level optimization and translation. However, the cost of developing performance-oriented DSLs is a substantial roadblock to their development and adoption. In this article, we present an overview of the Delite compiler framework and the DSLs that have been developed with it. Delite simplifies the process of DSL development by providing common components, like parallel patterns, optimizations, and code generators, that can be reused in DSL implementations. Delite DSLs are embedded in Scala, a general-purpose programming language, but use metaprogramming to construct an Intermediate Representation (IR) of user programs and compile to multiple languages (including C++, CUDA, and OpenCL). DSL programs are automatically parallelized and different parts of the application can run simultaneously on CPUs and GPUs. We present Delite DSLs for machine learning, data querying, graph analysis, and scientific computing and show that they all achieve performance competitive with or exceeding that of C++ code.
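
The general idea of an embedded DSL that builds an IR of the user program, optimizes it, and then generates code can be illustrated with a small sketch. It is written in Python rather than Scala and is not Delite's API: the names `Exp`, `Const`, `Sym`, `BinOp`, `lift`, `simplify`, and `emit` are hypothetical, and the single "optimization" and the C-like string output are stand-ins for the much richer parallel patterns and code generators the abstract describes.

```python
from dataclasses import dataclass


class Exp:
    """Base class for IR nodes; operators build new nodes instead of computing."""

    def __add__(self, other):
        return BinOp("+", self, lift(other))

    def __mul__(self, other):
        return BinOp("*", self, lift(other))


@dataclass(frozen=True)
class Const(Exp):
    value: float


@dataclass(frozen=True)
class Sym(Exp):
    name: str


@dataclass(frozen=True)
class BinOp(Exp):
    op: str
    lhs: Exp
    rhs: Exp


def lift(x):
    """Wrap plain Python numbers as IR constants (hypothetical helper)."""
    return x if isinstance(x, Exp) else Const(float(x))


def simplify(e):
    """A tiny optimization pass: fold constants and drop multiplications by 1."""
    if not isinstance(e, BinOp):
        return e
    lhs, rhs = simplify(e.lhs), simplify(e.rhs)
    if isinstance(lhs, Const) and isinstance(rhs, Const):
        folded = lhs.value + rhs.value if e.op == "+" else lhs.value * rhs.value
        return Const(folded)
    if e.op == "*" and rhs == Const(1.0):
        return lhs
    if e.op == "*" and lhs == Const(1.0):
        return rhs
    return BinOp(e.op, lhs, rhs)


def emit(e):
    """Generate a C-like expression string from the IR."""
    if isinstance(e, Const):
        return repr(e.value)
    if isinstance(e, Sym):
        return e.name
    return f"({emit(e.lhs)} {e.op} {emit(e.rhs)})"


def program(x, y):
    # Reads like ordinary arithmetic, but x and y are symbols, so this
    # records an expression tree rather than computing a number.
    return x * (lift(2) + 3) + y * 1


ir = simplify(program(Sym("x"), Sym("y")))
print(emit(ir))  # prints ((x * 5.0) + y)
```

Because the user program only ever produces IR nodes, the same tree could in principle be handed to several back ends; here a single string emitter plays that role.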

Published in

ACM Transactions on Embedded Computing Systems, Volume 13, Issue 4s
Special Issue on Real-Time and Embedded Technology and Applications, Domain-Specific Multicore Computing, Cross-Layer Dependable Embedded Systems, and Application of Concurrency to System Design (ACSD'13)
July 2014
571 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/2601432

Copyright © 2014 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 April 2014
• Accepted: 1 September 2013
• Revised: 1 July 2013
• Received: 1 February 2013

Qualifiers

• research-article
• Research
• Refereed