Halide: decoupling algorithms from schedules for high-performance image processing

Authors:
Jonathan Ragan-Kelley

UC Berkeley

UC Berkeley
View Profile

,
Andrew Adams

Google

Google
View Profile

,
Dillon Sharlet

Google

Google
View Profile

,
Connelly Barnes

University of Virginia

University of Virginia
View Profile

,
Sylvain Paris

Adobe

Adobe
View Profile

,
Marc Levoy

Stanford University and Google

Stanford University and Google
View Profile

,
Saman Amarasinghe

MIT CSAIL

MIT CSAIL
View Profile

,
Frédo Durand

MIT CSAIL

MIT CSAIL
View Profile

Authors Info & Claims

Communications of the ACM Volume 61 Issue 1January 2018pp 106–115https://doi.org/10.1145/3150211

Published:27 December 2017Publication History

Communications of the ACM

Abstract

Writing high-performance code on modern machines requires not just locally optimizing inner loops, but globally reorganizing computations to exploit parallelism and locality---doing things such as tiling and blocking whole pipelines to fit in cache. This is especially true for image processing pipelines, where individual stages do much too little work to amortize the cost of loading and storing results to and from off-chip memory. As a result, the performance difference between a naive implementation of a pipeline and one globally optimized for parallelism and locality is often an order of magnitude. However, using existing programming tools, writing high-performance image processing code requires sacrificing simplicity, portability, and modularity. We argue that this is because traditional programming models conflate the computations defining the algorithm with decisions about intermediate storage and the order of computation, which we call the schedule.

We propose a new programming language for image processing pipelines, called Halide, that separates the algorithm from its schedule. Programmers can change the schedule to express many possible organizations of a single algorithm. The Halide compiler then synthesizes a globally combined loop nest for an entire algorithm, given a schedule. Halide models a space of schedules which is expressive enough to describe organizations that match or outperform state-of-the-art hand-written implementations of many computational photography and computer vision algorithms. Its model is simple enough to do so often in only a few lines of code, and small changes generate efficient implementations for x86, ARM, Graphics Processors (GPUs), and specialized image processors, all from a single algorithm.

Halide has been public and open source for over four years, during which it has been used by hundreds of programmers to deploy code to tens of thousands of servers and hundreds of millions of phones, processing billions of images every day.

References

Adams, A., Talvala, E., Park, S.H., Jacobs, D.E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., Lensch, H.P.A., Matusik, W., Pulli, K., Horowitz, M., Levoy, M. The Frankencamera: An experimental platform for computational photography. ACM Trans. Graph. 29, 4 (2010), 29:1--29:12. Google ScholarDigital Library
Aubry, M., Paris, S., Hasinoff, S.W., Kautz, J., Durand, F. Fast local Laplacian filters: Theory and applications. ACM Trans. Graph. 33, 5 (2014), 167. Google ScholarDigital Library
Bacon, D.F., Graham, S.L., Sharp, O.J. Compiler transformations for high-performance computing. ACM Comput Surv. 26, 4 (Dec. 1994). Google ScholarDigital Library
Blythe, D. The Direct3D 10 system. ACM Trans. Graph. 25, (2006), 724--734. Google ScholarDigital Library
Buck, I. GPU computing: Programming a massively parallel processor. In Proceedings of the International Symposium on Code Generation and Optimization (Tessellations Publishing, Phoenix, Arizona, 2007). Google ScholarDigital Library
Chamberlain, B., Callahan, D., Zima, H. Parallel programmability and the Chapel language. Int J High Perform Comput Appl. 21, (2007), 291--312. Google ScholarDigital Library
Chen, J., Paris, S., Durand, F. Real-time edge-aware image processing with the bilateral grid. ACM Trans. Graph. 26, 3 (2007), 103:1--103:9. Google ScholarDigital Library
Elliott, C. Functional image synthesis. In Proceedings of Bridges 2001, Mathematical Connections in Art, Music, and Science (IEEE Computer Society, Washington, DC, USA, 2001).Google Scholar
Fatahalian, K., Horn, D.R., Knight, T.J., Leem, L., Houston, M., Park, J.Y., Erez, M., Ren, M., Aiken, A., Dally, W.J., Hanrahan, P. Sequoia: Programming the memory hierarchy. In ACM/IEEE conference on Supercomputing (ACM, New York, NY, 2006). Google ScholarDigital Library
Feautrier, P. Dataflow analysis of array and scalar references. Int J Parallel Program. 20, 1 (1991), 23--53.Google ScholarCross Ref
Frigo, M., Johnson, S.G. The design and implementation of FFTW3. Proc IEEE 93, 2 (2005).Google ScholarCross Ref
Gordon, M.I., Thies, W., Karczmarek, M., Lin, J., Meli, A.S., Leger, C., Lamb, A.A., Wong, J., Hoffman, H., Maze, D.Z., Amarasinghe, S. A stream compiler for communication-exposed architectures. In International Conference on Architectural Support for Programming Languages and Operating Systems (ACM, New York, NY, 2002). Google ScholarDigital Library
Govindaraju, N., Lloyd, B., Dotsenko, Y., Smith, B., Manferdelli, J. High performance discrete Fourier transforms on graphics processors. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. IEEE (Washington, DC, January 2008). Google ScholarDigital Library
Halide source repository. http://github.com/halide/Halide.Google Scholar
Hasinoff, S.W., Sharlet, D., Geiss, R., Adams, A., Barron, J.T., Kainz, F., Chen, J., Levoy, M. Burst photography for high dynamic range and low-light imaging on mobile cameras. ACM Trans. Graph. 35, 6 (2016). Google ScholarDigital Library
Holzmann, G. Beyond Photography: The Digital Darkroom. Prentice Hall, Englewood Cliffs, NJ, 1988. Google ScholarDigital Library
Mullapudi, R.T., Adams, A., Sharlet, D., Ragan-Kelley, J., Fatahalian, K. Automatically scheduling halide image processing pipelines. ACM Trans. Graph. 35, 4 (2016). Google ScholarDigital Library
Mullapudi, R.T., Vasista, V., Bondhugula, U. PolyMage: Automatic optimization for image processing pipelines. In ACM SIGPLAN Notices (ACM, New York, NY, 2015), volume 50, 429--443. Google ScholarDigital Library
The OpenCL specification, version 1.2. http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf, 2011.Google Scholar
Püschel, M., Moura, J.M.F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R.W., Rizzolo, N. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation" 93, 2 (2005), 232--275.Google ScholarCross Ref
Ragan-Kelley, J. Decoupling algorithms from the organization of computation for high performance image processing. PhD thesis, Massachusetts Institute of Technology (2014).Google Scholar
Ragan-Kelley, J., Adams, A., Paris, S., Levoy, M., Amarasinghe, S., Durand, F. Decoupling algorithms from schedules for easy optimization of image processing pipelines. ACM Trans. Graph. 31, 4 (2012). Google ScholarDigital Library
Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., Amarasinghe, S. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34^th ACM SIGPLAN Conference on Programming Language Design and Implementation (ACM, New York, NY, 2013). Google ScholarDigital Library
Rudy, G., Khan, M.M., Hall, M., Chen, C., Chame, J. A programming language interface to describe transformations and code generation. In Proceedings of the 23^rd International Conference on Languages and Compilers for Parallel Computing LCPC'10, (Springer-Verlag, Berlin, Heidelberg, 2011), 136--150. Google ScholarDigital Library
Suriana, P., Adams, A., Kamil, S. Parallel associative reductions in halide. In Proceedings of the 2017 International Symposium on Code Generation and Optimization (ACM, New York, NY, 2017). Google ScholarDigital Library

Index Terms

Halide: decoupling algorithms from schedules for high-performance image processing

Recommendations

Automatically scheduling halide image processing pipelines

The Halide image processing language has proven to be an effective system for authoring high-performance image processing code. Halide programmers need only provide a high-level strategy for mapping an image processing pipeline to a parallel machine (a ...
Read More
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
PLDI '13

Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. ...
Read More
Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines
PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation

Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Communications of the ACM Volume 61, Issue 1
January 2018
110 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/3176926
Editor:
Andrew A. Chien
Association for Computing Machinery, New York, NY
Issue’s Table of Contents
Copyright © 2017 Owner/Author
This work is licensed under a Creative Commons Attribution-NoDerivs International 4.0 License.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 December 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 72
  Total Citations
  View Citations
- 35,632
  Total Downloads
- Downloads (Last 12 months)6,556
- Downloads (Last 6 weeks)307
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format