ABSTRACT
Inherently error-resilient applications in areas such as signal processing, machine learning, and data analytics provide opportunities for relaxing reliability requirements, thereby reducing the overhead incurred by conventional error-correction schemes. In this paper, we exploit the tolerable imprecision of such applications by designing an energy-efficient fault-mitigation scheme for unreliable data memories that meets a target yield. The proposed approach uses a bit-shuffling mechanism to isolate faults into bit locations of lower significance. This skews the bit-error distribution towards the low-order bits, substantially limiting the output error magnitude. By controlling the granularity of the shuffling, the proposed technique enables trading off output quality against power, area, and timing overhead. Compared to error-correction codes, it can reduce the overhead by as much as 83% in read power, 77% in read access time, and 89% in area when applied to various data mining applications in a 28 nm process technology.
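The abstract does not spell out how the shuffling is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: a 16-bit word, a single stuck-at fault whose position is already known (for example from a memory test), and a rotation used as the shuffle. The names `WIDTH`, `rotl`, `rotr`, `inject_fault`, and `read_back`, and the `segment` parameter that models the shuffling granularity, are all hypothetical and not taken from the paper.

```python
WIDTH = 16                    # assumed word width in bits
MASK = (1 << WIDTH) - 1


def rotl(value, amount):
    """Rotate a WIDTH-bit value left by `amount` bit positions."""
    amount %= WIDTH
    return ((value << amount) | (value >> (WIDTH - amount))) & MASK


def rotr(value, amount):
    """Rotate a WIDTH-bit value right by `amount` bit positions."""
    return rotl(value, WIDTH - (amount % WIDTH))


def inject_fault(word, pos, stuck):
    """Model a stuck-at fault: physical cell `pos` always reads `stuck`."""
    return (word | (1 << pos)) if stuck else (word & ~(1 << pos) & MASK)


def read_back(data, fault_pos, stuck, segment):
    """Write `data` to a faulty word and read it back.

    `segment` is the assumed shuffling granularity in bits. The word is
    rotated in steps of `segment` so that the faulty cell ends up holding
    one of the `segment` least significant data bits. segment == WIDTH
    means no shuffling; segment == 1 maps the fault onto bit 0.
    """
    shift = (fault_pos // segment) * segment   # per-word shift index
    stored = rotl(data, shift)                 # shuffle on write
    stored = inject_fault(stored, fault_pos, stuck)
    return rotr(stored, shift)                 # undo the shuffle on read


if __name__ == "__main__":
    data = 0x2000
    for segment in (WIDTH, 4, 1):
        error = read_back(data, fault_pos=15, stuck=1, segment=segment) - data
        print(f"segment={segment:2d}  error={error}")
```

With the fault in the most significant cell, the error in this toy model is 32768 without shuffling, 8 with 4-bit segments, and 1 with 1-bit segments. A coarser segment needs fewer distinct shift values per word (`WIDTH // segment` of them), which is one plausible reading of how granularity trades output quality against overhead.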