ABSTRACT
Conventionally, an approximate accelerator replaces every invocation of a frequently executed region of code without considering the resulting quality degradation. However, there is a vast decision space in which each invocation can either be delegated to the accelerator---improving performance and efficiency---or run on the precise core---maintaining quality. In this paper we introduce Mithra, a co-designed hardware-software solution that navigates these tradeoffs to deliver high performance and efficiency while limiting the final quality loss. Mithra seeks to identify whether each individual accelerator invocation will lead to an undesirable quality loss and, if so, directs the processor to run the original precise code.
This identification is cast as a binary classification task that requires a cohesive co-design of hardware and software. The hardware component performs the classification at runtime and exposes a knob to the software mechanism to control quality tradeoffs. The software tunes this knob by solving a statistical optimization problem that maximizes the benefits of approximation while providing statistical guarantees that the final quality level will be met with high confidence. The software uses this knob to tune and train the hardware classifiers. We devise two distinct hardware classifiers, one table-based and one neural network based. To understand the efficacy of these mechanisms, we compare them with an ideal but infeasible design, the oracle. Results show that, with 95% confidence, the table-based design can restrict the final output quality loss to 5% for 90% of unseen input sets while providing a 2.5× speedup and a 2.6× gain in energy efficiency. The neural design delivers similar speedup but improves efficiency by a further 13%. Compared to the table-based design, the oracle improves speedup by 26% and efficiency by 36%. These results show that Mithra performs within a close range of the oracle and can effectively navigate the quality tradeoffs in approximate acceleration.
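The guarantee stated above---95% confidence that at least 90% of unseen input sets meet the quality target---is the kind of bound a Clopper-Pearson binomial confidence interval provides: from a sample of held-out input sets, it gives a one-sided lower bound on the true fraction of inputs that will satisfy the target. The sketch below is illustrative only (the function names are not Mithra's API, and the paper's actual optimization may differ); it assumes each input set independently either meets or misses the quality target.

```python
from math import comb

def binom_sf(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def clopper_pearson_lower(k, n, confidence=0.95):
    """One-sided lower Clopper-Pearson bound on the binomial success rate.

    Finds the smallest p with P[X >= k | p] >= 1 - confidence by bisection,
    i.e. with this confidence, the true success rate is at least the bound.
    """
    if k == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; binom_sf is increasing in p
        mid = (lo + hi) / 2
        if binom_sf(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo

# Example: 47 of 50 held-out input sets meet the quality target.
# The bound tells us what fraction of *unseen* inputs we can promise
# will meet it with 95% confidence.
print(clopper_pearson_lower(47, 50, confidence=0.95))
```

In this framing, the software knob would be tightened until the lower bound on the success rate clears the desired threshold (e.g. 0.90), at which point the guarantee holds for unseen inputs at the chosen confidence level.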