research-article

Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh

Published: 20 December 2013

Abstract

Spin-Torque Transfer RAM (STT-RAM) is a promising candidate to replace SRAM because of its fast read access, high density, low leakage power, and compatibility with CMOS technology. However, its long write latency and high write power impede its wide adoption as cache memory. Recent work proposed improving write performance by relaxing the retention time of STT-RAM cells. The resulting volatile STT-RAM must be refreshed periodically to prevent data loss. When volatile STT-RAM is used as the last-level cache (LLC) in chip multiprocessor (CMP) systems, frequent refresh operations could dissipate significant extra energy. In addition, refresh operations could severely conflict with normal read/write operations and degrade overall system performance. Therefore, minimizing the performance impact of refresh operations is crucial for the adoption of volatile STT-RAM.
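
To make the refresh burden concrete, the following sketch estimates the refresh rate of a naive periodic policy in which every block is refreshed once per retention period. The cache size, block size, retention times, and per-refresh energy are hypothetical placeholders chosen for illustration, not values from the article.

```python
def periodic_refresh_rate(num_blocks: int, retention_s: float) -> float:
    """Refreshes per second when every block is refreshed once per retention period.

    This is the minimum rate that keeps every block alive under a naive periodic
    policy (an assumption for illustration); a real controller must refresh slightly
    before retention expires, so its actual rate is somewhat higher.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    if retention_s <= 0:
        raise ValueError("retention_s must be positive")
    return num_blocks / retention_s


def refresh_power_w(num_blocks: int, retention_s: float, energy_per_refresh_j: float) -> float:
    """Average refresh power in watts: refreshes per second times joules per refresh."""
    return periodic_refresh_rate(num_blocks, retention_s) * energy_per_refresh_j


if __name__ == "__main__":
    # Hypothetical LLC: 4 MiB with 64-byte blocks, i.e. 65,536 blocks.
    blocks = (4 * 1024 * 1024) // 64
    energy_per_refresh = 1e-9  # hypothetical 1 nJ per block refresh
    # Shorter retention allows faster writes but forces more frequent refreshes.
    for retention in (10e-3, 1e-3, 100e-6):
        rate = periodic_refresh_rate(blocks, retention)
        power = refresh_power_w(blocks, retention, energy_per_refresh)
        print(f"retention={retention * 1e6:8.1f} us  "
              f"refreshes/s={rate:12.3e}  power={power * 1e3:8.3f} mW")
```

Because the refresh rate scales inversely with retention time, each step of relaxing retention to speed up writes multiplies the refresh traffic that competes with normal accesses.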

In this article, we propose Cache-Coherence-Enabled Adaptive Refresh (CCear) to minimize the number of refresh operations on volatile STT-RAM used as the LLC in CMP systems. CCear achieves this by interacting with the cache coherence protocol and the cache management policy. Full-system simulation results show that CCear performs close to an ideal refresh policy with low overhead. Compared with state-of-the-art refresh policies, CCear simultaneously improves system performance and reduces energy consumption. Moreover, CCear's performance could be further enhanced by using small filter caches to accommodate the private STT-RAM blocks that are not refreshed.
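
One way coherence information could reduce refreshes is sketched below, assuming a MESI-style protocol and a hypothetical per-block reuse hint. The abstract does not spell out CCear's actual rules, so this policy, the `Action` names, and the `likely_reused` flag are illustrative assumptions, not the authors' design. The invariant it relies on is that a clean block (E or S) matches main memory and can be dropped instead of refreshed, whereas a dirty block (M) must be refreshed or written back before its retention time expires.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """MESI-style coherence states (an assumed protocol for this sketch)."""
    MODIFIED = "M"   # dirty: the only up-to-date copy
    EXCLUSIVE = "E"  # clean: memory holds the same data
    SHARED = "S"     # clean: memory holds the same data
    INVALID = "I"    # no valid data


class Action(Enum):
    NONE = "none"
    REFRESH = "refresh"
    INVALIDATE = "invalidate"                # drop a clean block instead of refreshing it
    WRITE_BACK_AND_INVALIDATE = "writeback"  # save dirty data, then drop the block


@dataclass
class Block:
    state: State
    likely_reused: bool  # hypothetical reuse hint, e.g. from a dead-block predictor


def on_retention_deadline(block: Block) -> Action:
    """Decide what to do when a block is about to exceed its retention time."""
    if block.state is State.INVALID:
        return Action.NONE  # nothing to preserve
    if block.likely_reused:
        return Action.REFRESH  # keep blocks expected to be accessed again
    if block.state is State.MODIFIED:
        return Action.WRITE_BACK_AND_INVALIDATE  # never silently lose dirty data
    return Action.INVALIDATE  # a clean copy can be refetched from memory


def count_refreshes(blocks: list[Block]) -> int:
    """Number of blocks this hypothetical policy would actually refresh."""
    return sum(on_retention_deadline(b) is Action.REFRESH for b in blocks)


if __name__ == "__main__":
    cache = [
        Block(State.MODIFIED, likely_reused=True),
        Block(State.MODIFIED, likely_reused=False),
        Block(State.SHARED, likely_reused=False),
        Block(State.EXCLUSIVE, likely_reused=True),
        Block(State.INVALID, likely_reused=False),
    ]
    for b in cache:
        print(b.state.value, b.likely_reused, on_retention_deadline(b).value)
    print("refreshes:", count_refreshes(cache), "of", len(cache))
```

Compared with refreshing every valid block, a policy of this kind trades some extra misses (a dropped block must be refetched on its next access) for fewer refresh operations; deciding where to strike that balance is the job of an adaptive scheme.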

      • Published in

        ACM Transactions on Design Automation of Electronic Systems, Volume 19, Issue 1
        December 2013
        210 pages
        ISSN: 1084-4309
        EISSN: 1557-7309
        DOI: 10.1145/2558148

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 December 2013
        • Accepted: 1 September 2013
        • Revised: 1 June 2013
        • Received: 1 December 2012


        Qualifiers

        • Research
        • Refereed
