ABSTRACT
Conventionally, an approximate accelerator replaces every invocation of a frequently executed region of code without considering the resulting quality degradation. However, there is a vast decision space in which each invocation can either be delegated to the accelerator---improving performance and efficiency---or run on the precise core---maintaining quality. In this paper we introduce Mithra, a co-designed hardware-software solution that navigates these tradeoffs to deliver high performance and efficiency while limiting the final quality loss. Mithra seeks to identify whether each individual accelerator invocation will lead to an undesirable quality loss and, if so, directs the processor to run the original precise code.
This identification is cast as a binary classification task that requires a cohesive co-design of hardware and software. The hardware component performs the classification at runtime and exposes a knob to the software mechanism to control quality tradeoffs. The software tunes this knob by solving a statistical optimization problem that maximizes the benefits of approximation while providing statistical guarantees that the final quality level will be met with high confidence. The software uses this knob to tune and train the hardware classifiers. We devise two distinct hardware classifiers, one table-based and one neural network based. To understand the efficacy of these mechanisms, we compare them with an ideal but infeasible design, the oracle. Results show that, with 95% confidence, the table-based design can restrict the final output quality loss to 5% for 90% of unseen input sets while providing a 2.5× speedup and a 2.6× gain in energy efficiency. The neural design delivers similar speedup but improves efficiency by a further 13%. Compared to the table-based design, the oracle improves speedup by 26% and efficiency by 36%. These results show that Mithra performs within a close range of the oracle and can effectively navigate the quality tradeoffs in approximate acceleration.
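The guarantee stated above---95% confidence that at least 90% of unseen input sets meet the quality target---is the kind of bound a Clopper-Pearson binomial confidence interval provides: from a sample of held-out input sets, it gives a one-sided lower bound on the true fraction of inputs that will satisfy the target. The sketch below is illustrative only (the function names are not Mithra's API, and the paper's actual optimization may differ); it assumes each input set independently either meets or misses the quality target.

```python
from math import comb

def binom_sf(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def clopper_pearson_lower(k, n, confidence=0.95):
    """One-sided lower Clopper-Pearson bound on the binomial success rate.

    Finds the smallest p with P[X >= k | p] >= 1 - confidence by bisection,
    i.e. with this confidence, the true success rate is at least the bound.
    """
    if k == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; binom_sf is increasing in p
        mid = (lo + hi) / 2
        if binom_sf(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo

# Example: 47 of 50 held-out input sets meet the quality target.
# The bound tells us what fraction of *unseen* inputs we can promise
# will meet it with 95% confidence.
print(clopper_pearson_lower(47, 50, confidence=0.95))
```

In this framing, the software knob would be tightened until the lower bound on the success rate clears the desired threshold (e.g. 0.90), at which point the guarantee holds for unseen inputs at the chosen confidence level.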