Research article, ISCA 2016
DOI: 10.1109/ISCA.2016.16

Towards statistical guarantees in controlling quality tradeoffs for approximate acceleration

Published: 18 June 2016

ABSTRACT

Conventionally, an approximate accelerator replaces every invocation of a frequently executed region of code without considering the final quality degradation. However, there is a vast decision space in which each invocation can either be delegated to the accelerator---improving performance and efficiency---or run on the precise core---maintaining quality. In this paper we introduce Mithra, a co-designed hardware-software solution that navigates these tradeoffs to deliver high performance and efficiency while lowering the final quality loss. Mithra seeks to identify whether each individual accelerator invocation will lead to an undesirable quality loss and, if so, directs the processor to run the original precise code.
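To make the per-invocation decision concrete, the dispatch can be pictured as a guarded call: a classifier inspects the inputs of the approximable region and either hands the call to the accelerator or falls back to the precise code. The sketch below is illustrative only; the function and parameter names (`invoke_region`, `classifier`, `threshold`) are hypothetical and do not reflect the paper's actual hardware interface.

```python
# Illustrative sketch of a per-invocation accelerate-vs-precise decision.
# Names and structure are hypothetical, not Mithra's actual interface.

def invoke_region(inputs, classifier, accelerator, precise_fn, threshold):
    """Run one invocation of the frequently executed (approximable) region."""
    predicted_error = classifier.predict(inputs)   # runtime binary classification
    if predicted_error <= threshold:               # threshold = software-tuned knob
        return accelerator.run(inputs)             # approximate path: fast, efficient
    return precise_fn(inputs)                      # precise path: preserves quality
```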

This identification is cast as a binary classification task that requires a cohesive co-design of hardware and software. The hardware component performs the classification at runtime and exposes a knob to the software mechanism to control quality tradeoffs. The software tunes this knob by solving a statistical optimization problem that maximizes the benefits of approximation while providing statistical guarantees that the final quality level will be met with high confidence. The software uses this knob to tune and train the hardware classifiers. We devise two distinct hardware classifiers, one table-based and one neural-network-based. To understand the efficacy of these mechanisms, we compare them with an ideal but infeasible design, the oracle. Results show that, with 95% confidence, the table-based design can restrict the final output quality loss to 5% for 90% of unseen input sets while providing 2.5× speedup and 2.6× energy efficiency. The neural design shows similar speedup but improves efficiency by 13%. Compared to the table-based design, the oracle improves speedup by 26% and efficiency by 36%. These results show that Mithra performs within a close range of the oracle and can effectively navigate the quality tradeoffs in approximate acceleration.
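The guarantee quoted above ("with 95% confidence ... 5% quality loss for 90% of unseen input sets") can be read as a lower confidence bound on a binomial success rate: each training input set either meets the quality bound at a given knob setting or it does not. Below is a minimal sketch of such a check, assuming an exact (Clopper-Pearson-style) one-sided bound; the function names and the exact procedure are illustrative assumptions, not the paper's implementation.

```python
# Sketch: decide whether observed training results support the claim that,
# with confidence 1 - alpha, at least `target_fraction` of unseen input sets
# will meet the quality threshold at this knob setting.
from scipy.stats import beta

def binomial_lower_bound(successes, trials, alpha=0.05):
    """One-sided Clopper-Pearson lower confidence bound on a success rate."""
    if successes == 0:
        return 0.0
    return beta.ppf(alpha, successes, trials - successes + 1)

def knob_is_acceptable(successes, trials, target_fraction=0.90, alpha=0.05):
    """True if this knob setting meets the quality target with high confidence.

    `successes` is the number of training input sets whose final quality loss
    stayed within the bound (e.g., 5%) at this knob setting."""
    return binomial_lower_bound(successes, trials, alpha) >= target_fraction

# Example: 97 of 100 training input sets met the quality bound.
print(knob_is_acceptable(successes=97, trials=100))  # True when the bound >= 0.90
```

Under this reading, the software's job is to pick the most aggressive knob setting (maximizing accelerator usage) for which this check still passes.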

