ABSTRACT
As the use of deep neural networks continues to grow, so does the fraction of compute cycles devoted to their execution. This has led the CAD and architecture communities to devote considerable attention to building DNN hardware. Despite these efforts, the fault tolerance of DNNs has generally been overlooked. This paper is the first to conduct a large-scale, empirical study of DNN resilience. Motivated by the inherent algorithmic resilience of DNNs, we are interested in understanding the relationship between fault rate and model accuracy. To do so, we present Ares: a lightweight, DNN-specific fault injection framework validated to within 12% of real hardware. We find that DNN fault tolerance varies by orders of magnitude with respect to model, layer type, and structure.
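The abstract does not describe how Ares injects faults. The sketch below only illustrates the general idea of measuring accuracy as a function of fault rate. It assumes a simple fault model in which each bit of every float32 weight flips independently with probability `fault_rate`, and it uses a toy linear classifier in place of a real DNN. All function names (`inject_bit_flips`, `predict`, `accuracy`, `sweep`) are hypothetical and are not Ares's API.

```python
import random
import struct


def inject_bit_flips(weights, fault_rate, rng):
    """Hypothetical fault model: each of the 32 bits of every float32 weight
    flips independently with probability fault_rate. Returns (faulty, n_flips)."""
    faulty, n_flips = [], 0
    for w in weights:
        mask = 0
        for bit in range(32):  # bit 31 is the sign, bits 30-23 the exponent
            if rng.random() < fault_rate:
                mask |= 1 << bit
                n_flips += 1
        if mask:
            # Reinterpret the float32 encoding as an integer, XOR the flipped bits, convert back.
            (bits,) = struct.unpack("<I", struct.pack("<f", w))
            (w,) = struct.unpack("<f", struct.pack("<I", bits ^ mask))
        faulty.append(w)
    return faulty, n_flips


def predict(weights, x):
    """Toy linear classifier standing in for a DNN: class 1 if w.x > 0."""
    return int(sum(wi * xi for wi, xi in zip(weights, x)) > 0)


def accuracy(weights, data):
    """Fraction of (input, label) pairs classified correctly."""
    return sum(predict(weights, x) == y for x, y in data) / len(data)


def sweep(weights, data, fault_rates, trials=20, seed=0):
    """Mean accuracy at each fault rate, averaged over independent fault trials."""
    rng = random.Random(seed)
    results = {}
    for rate in fault_rates:
        accs = [accuracy(inject_bit_flips(weights, rate, rng)[0], data) for _ in range(trials)]
        results[rate] = sum(accs) / trials
    return results


if __name__ == "__main__":
    data_rng = random.Random(42)
    weights = [data_rng.uniform(-1, 1) for _ in range(16)]
    inputs = [[data_rng.uniform(-1, 1) for _ in range(16)] for _ in range(200)]
    data = [(x, predict(weights, x)) for x in inputs]  # labels from the fault-free model
    for rate, acc in sweep(weights, data, [0.0, 1e-4, 1e-3, 1e-2, 1e-1]).items():
        print(f"fault rate {rate:g}: mean accuracy {acc:.3f}")
```

Because labels come from the fault-free model, accuracy at a fault rate of 0 is 1.0 by construction. Any drop at higher rates is caused only by the injected faults. Flips in the exponent bits tend to be the most damaging, because a single flip there can change a weight's magnitude by many orders.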
Ares: A framework for quantifying the resilience of deep neural networks
2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)