Abstract
Spin-Torque Transfer RAM (STT-RAM) is a promising candidate for SRAM replacement because of its excellent features, such as fast read access, high density, low leakage power, and CMOS technology compatibility. However, wide adoption of STT-RAM as cache memory is impeded by its long write latency and high write power. Recent work proposed improving write performance by relaxing the retention time of STT-RAM cells. The resultant volatile STT-RAM needs to be periodically refreshed to prevent data loss. When volatile STT-RAM is applied as the last-level cache (LLC) in chip multiprocessor (CMP) systems, frequent refresh operations could dissipate significant extra energy. In addition, refresh operations could conflict severely with normal read/write operations and degrade overall system performance. Therefore, minimizing the performance impact of refresh operations is crucial for the adoption of volatile STT-RAM.
In this article, we propose Cache-Coherence-Enabled Adaptive Refresh (CCear) to minimize the number of refresh operations for volatile STT-RAM adopted as the LLC in CMP systems. Specifically, CCear interacts with the cache coherence protocol and the cache management policy to reduce the refresh operations performed on volatile STT-RAM caches. Full-system simulation results show that CCear performs close to an ideal refresh policy with low overhead. Compared with state-of-the-art refresh policies, CCear simultaneously improves system performance and reduces energy consumption. Moreover, the performance of CCear could be further enhanced by using small filter caches to hold the private STT-RAM blocks that are not refreshed.
Index Terms
- Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh