ABSTRACT
Variation in performance and power across manufactured parts and operating conditions is a well-known issue in advanced CMOS processes. This paper proposes a resilient HW/SW architecture for shared-L1 processor clusters that combats both static and dynamic variations. We first introduce the notion of procedure-level vulnerability (PLV), which exposes fast dynamic voltage variation and its effects to the software stack for use in runtime compensation. To assess PLV, we quantify the effect of the full range of operating conditions on the dynamic voltage variation of a post-layout processor in TSMC 45nm technology. Our analysis shows that the inter-corner variation in the maximum voltage droop of procedures ranges from 18mV to 63mV. To exploit this variation, we propose a low-cost procedure hopping technique within the processor clusters that uses compile-time characterized metadata related to PLV. Our results show that procedure hopping avoids critical voltage droops during the execution of all procedures while incurring less than 1% latency penalty.
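The hopping decision described above can be illustrated with a minimal sketch. It is not the paper's implementation: the metadata layout, the per-core droop values, the 50mV threshold, and the names PLV_METADATA, CRITICAL_DROOP_MV and choose_core are all assumptions made for illustration. The sketch assumes that compile-time characterization yields a predicted maximum voltage droop for each procedure on each core, and that a procedure hops only when the droop predicted on its current core reaches a critical level.

```python
# Hypothetical compile-time metadata: predicted maximum voltage droop (mV)
# of each procedure on each core of a cluster. Values are illustrative only.
PLV_METADATA = {
    "fft": {0: 63.0, 1: 41.0, 2: 18.0},
    "crc": {0: 20.0, 1: 25.0, 2: 30.0},
}

# Assumed threshold above which a droop is treated as critical.
CRITICAL_DROOP_MV = 50.0


def choose_core(procedure, current_core,
                metadata=PLV_METADATA, critical_mv=CRITICAL_DROOP_MV):
    """Return the core on which `procedure` should run.

    Stay on the current core when its predicted droop is below the
    critical threshold; otherwise hop to the core with the smallest
    predicted droop for this procedure.
    """
    droops = metadata[procedure]
    if droops[current_core] < critical_mv:
        return current_core  # no hop needed
    return min(droops, key=droops.get)  # hop to the least vulnerable core


if __name__ == "__main__":
    for proc, core in [("fft", 0), ("fft", 1), ("crc", 0)]:
        print(proc, "on core", core, "-> run on core", choose_core(proc, core))
```

In this sketch the lookup is a single table access per call, which is in the spirit of the low latency penalty the abstract reports, though the actual overhead depends on the hardware mechanism used to migrate a procedure between cores.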
Index Terms
- Procedure hopping: a low overhead solution to mitigate variability in shared-L1 processor clusters