research-article

Open Access

ApproxHPVM: a portable compiler IR for accuracy-aware optimizations

Authors:
Hashim Sharif (University of Illinois at Urbana-Champaign, USA)
Prakalp Srivastava (University of Illinois at Urbana-Champaign, USA)
Muhammad Huzaifa (University of Illinois at Urbana-Champaign, USA)
Maria Kotsifakou (University of Illinois at Urbana-Champaign, USA)
Keyur Joshi (University of Illinois at Urbana-Champaign, USA)
Yasmin Sarita (Cornell University, USA)
Nathan Zhao (University of Illinois at Urbana-Champaign, USA)
Vikram S. Adve (University of Illinois at Urbana-Champaign, USA)
Sasa Misailovic (University of Illinois at Urbana-Champaign, USA)
Sarita Adve (University of Illinois at Urbana-Champaign, USA)

Proceedings of the ACM on Programming Languages, Volume 3, Issue OOPSLA, Article No. 186, pp. 1–30. https://doi.org/10.1145/3360612

Published: 10 October 2019

Abstract

We propose ApproxHPVM, a compiler IR and system designed to enable accuracy-aware performance and energy tuning on heterogeneous systems with multiple compute units and approximation methods. ApproxHPVM automatically translates end-to-end application-level quality metrics into accuracy requirements for individual operations. This translation is performed by a hardware-agnostic accuracy-tuning phase, which provides greater portability across heterogeneous hardware platforms and enables future capabilities such as accuracy-aware dynamic scheduling and design space exploration.
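One way to picture the translation from an application-level quality bound to per-operation accuracy requirements is a search over per-operation error knobs. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the operation names, the sensitivity weights, the discrete error levels, and the additive `quality_loss` model are all hypothetical stand-ins for measuring the real application on calibration inputs.

```python
from typing import Dict, List

# Hypothetical per-operation sensitivities: how much end-to-end quality
# degrades per unit of error injected into that operation (assumed values).
SENSITIVITY: Dict[str, float] = {"conv1": 0.8, "conv2": 0.3, "fc": 0.1}

# Hypothetical discrete error levels an operation may tolerate (0 = exact).
LEVELS: List[int] = [0, 1, 2, 3]


def quality_loss(knobs: Dict[str, int]) -> float:
    """Toy stand-in for measuring end-to-end quality loss on real inputs."""
    return sum(SENSITIVITY[op] * level for op, level in knobs.items())


def tune(max_loss: float) -> Dict[str, int]:
    """Greedily raise each operation's error level while the bound holds."""
    knobs = {op: 0 for op in SENSITIVITY}
    improved = True
    while improved:
        improved = False
        # Visit the least sensitive operations first.
        for op in sorted(SENSITIVITY, key=SENSITIVITY.get):
            if knobs[op] == LEVELS[-1]:
                continue  # already at the most relaxed level
            trial = {**knobs, op: knobs[op] + 1}
            if quality_loss(trial) <= max_loss:
                knobs = trial
                improved = True
    return knobs
```

Nothing in this search refers to a particular device: its output is a set of per-operation tolerances, which is what allows the result to be reused across platforms.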

ApproxHPVM incorporates three main components: (a) a compiler IR with hardware-agnostic approximation metrics, (b) a hardware-agnostic accuracy-tuning phase to identify error-tolerant computations, and (c) an accuracy-aware hardware scheduler that maps error-tolerant computations to approximate hardware components. As ApproxHPVM does not incorporate any hardware-specific knowledge as part of the IR, it can serve as a portable virtual ISA that can be shipped to all kinds of hardware platforms.
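The division of labour among components (a) to (c) can be illustrated with a small sketch. It is a hypothetical illustration, not ApproxHPVM's actual IR or scheduler: the `Op` and `HardwareChoice` types, the unit names, and all error and energy numbers are invented for the example. The point is only that the tolerance attached to each operation is hardware-agnostic, while the mapping decision is made later against a specific platform's offerings.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Op:
    name: str
    error_tolerance: float  # hardware-agnostic bound produced by tuning


@dataclass(frozen=True)
class HardwareChoice:
    unit: str
    error: float   # error this unit introduces (assumed known per platform)
    energy: float  # relative energy cost (assumed numbers)


def schedule(ops: List[Op], choices: List[HardwareChoice]) -> Dict[str, str]:
    """Map each op to the lowest-energy unit whose error fits its tolerance."""
    mapping = {}
    for op in ops:
        feasible = [c for c in choices if c.error <= op.error_tolerance]
        if not feasible:
            raise ValueError(f"no hardware choice satisfies {op.name}")
        mapping[op.name] = min(feasible, key=lambda c: c.energy).unit
    return mapping


# Hypothetical platform: an exact unit plus two cheaper approximate ones.
PLATFORM = [
    HardwareChoice("gpu_fp32", error=0.0, energy=10.0),
    HardwareChoice("gpu_fp16", error=0.01, energy=6.0),
    HardwareChoice("approx_accelerator", error=0.05, energy=1.5),
]

# Hypothetical program annotated only with hardware-agnostic tolerances.
PROGRAM = [
    Op("conv1", error_tolerance=0.0),
    Op("conv2", error_tolerance=0.02),
    Op("fc", error_tolerance=0.1),
]
```

Calling `schedule(PROGRAM, PLATFORM)` keeps `conv1` on the exact unit and moves the more tolerant operations to cheaper units. The same `PROGRAM` could be scheduled against a different list of choices without retuning, which is the portability argument in miniature.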

We evaluate our framework on nine benchmarks from the deep learning domain and five image processing benchmarks. Our results show that our framework can offload chunks of approximable computations to special-purpose accelerators that provide significant gains in performance and energy, while staying within user-specified application-level quality metrics with high probability. Across the 14 benchmarks, we observe 1-9x performance speedups and 1.1-11.3x energy reductions for very small reductions in accuracy.

Supplemental Material

a186-sharif.webm (webm, 124.5 MB)

Index Terms

ApproxHPVM: a portable compiler IR for accuracy-aware optimizations
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in

Proceedings of the ACM on Programming Languages Volume 3, Issue OOPSLA
October 2019
2077 pages
EISSN: 2475-1421
DOI: 10.1145/3366395

Copyright © 2019 Owner/Author
This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
Approximate Computing
Compiler
Deep Neural Networks
Heterogeneous Systems
Virtual ISA