Abstract
Developing high-performance software is a difficult task that requires the use of low-level, architecture-specific programming models (e.g., OpenMP for CMPs, CUDA for GPUs, MPI for clusters). It is typically not possible to write a single application that runs efficiently in different environments, which leads to multiple versions and increased complexity. Domain-Specific Languages (DSLs) are a promising way to let programmers use high-level abstractions and still achieve good performance on a variety of hardware. This is possible because DSLs have higher-level semantics and more restrictions than general-purpose languages, so DSL compilers can perform higher-level optimization and translation. However, the cost of developing performance-oriented DSLs is a substantial roadblock to their development and adoption. In this article, we present an overview of the Delite compiler framework and the DSLs that have been developed with it. Delite simplifies DSL development by providing common components, such as parallel patterns, optimizations, and code generators, that can be reused across DSL implementations. Delite DSLs are embedded in Scala, a general-purpose programming language, but use metaprogramming to construct an Intermediate Representation (IR) of user programs and compile to multiple languages (including C++, CUDA, and OpenCL). DSL programs are automatically parallelized, and different parts of an application can run simultaneously on CPUs and GPUs. We present Delite DSLs for machine learning, data querying, graph analysis, and scientific computing and show that they all achieve performance competitive with or exceeding that of C++ code.
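To make the idea of an embedded DSL that builds an IR and compiles it to another language more concrete, the following is a minimal sketch in Scala. It is not Delite's API: every name in it (`MiniDsl`, `Exp`, `simplify`, `emitCpp`, `emitCppFunction`) is hypothetical, the rewrite pass is a toy stand-in for the reusable optimizations the abstract mentions, and the generator only emits a single C++ expression. The point is only to show the shape of the approach: DSL operations construct IR nodes instead of computing values, a pass rewrites the IR, and a code generator turns the result into target-language text.

```scala
// Toy deep embedding; all names are hypothetical and not part of Delite.
object MiniDsl {
  // IR nodes. DSL operators build these instead of evaluating anything.
  sealed trait Exp {
    def +(that: Exp): Exp = Plus(this, that)
    def *(that: Exp): Exp = Times(this, that)
  }
  final case class Const(value: Double) extends Exp
  final case class Sym(name: String) extends Exp
  final case class Plus(lhs: Exp, rhs: Exp) extends Exp
  final case class Times(lhs: Exp, rhs: Exp) extends Exp

  // A small rewrite pass: constant folding plus removal of "+ 0" and "* 1".
  def simplify(e: Exp): Exp = e match {
    case Plus(a, b) =>
      (simplify(a), simplify(b)) match {
        case (Const(x), Const(y)) => Const(x + y)
        case (Const(0.0), r)      => r
        case (l, Const(0.0))      => l
        case (l, r)               => Plus(l, r)
      }
    case Times(a, b) =>
      (simplify(a), simplify(b)) match {
        case (Const(x), Const(y)) => Const(x * y)
        case (Const(1.0), r)      => r
        case (l, Const(1.0))      => l
        case (l, r)               => Times(l, r)
      }
    case other => other
  }

  // Code generator: turn an IR tree into a C++ expression string.
  def emitCpp(e: Exp): String = e match {
    case Const(v)    => v.toString
    case Sym(n)      => n
    case Plus(a, b)  => s"(${emitCpp(a)} + ${emitCpp(b)})"
    case Times(a, b) => s"(${emitCpp(a)} * ${emitCpp(b)})"
  }

  // Wrap the optimized expression in a one-parameter C++ function.
  def emitCppFunction(name: String, param: String, body: Exp): String =
    s"double $name(double $param) { return ${emitCpp(simplify(body))}; }"
}

object MiniDslDemo {
  import MiniDsl._

  def main(args: Array[String]): Unit = {
    val x = Sym("x")
    // Looks like ordinary Scala arithmetic, but builds an IR tree.
    val program = x * Const(1.0) + Const(2.0) * Const(3.0)
    println(emitCppFunction("f", "x", program))
    // prints: double f(double x) { return (x + 6.0); }
  }
}
```

Because the user program is captured as data, the same IR could in principle be handed to a different generator (for example one that emits CUDA) without changing the program itself, which is the property the abstract relies on.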
Index Terms
- Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages