research-article

Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh

Authors:
Jianhua Li (Hefei University of Technology, P.R. China)
Liang Shi (Chongqing University, P.R. China)
Qingan Li (Wuhan University, P.R. China)
Chun Jason Xue (City University of Hong Kong, Kowloon, Hong Kong)
Yiran Chen (University of Pittsburgh, Pittsburgh, PA)
Yinlong Xu (University of Science and Technology of China, P.R. China)
Wei Wang (Hefei University of Technology, P.R. China)

ACM Transactions on Design Automation of Electronic Systems, Volume 19, Issue 1, Article No. 5, pp. 1–23
https://doi.org/10.1145/2534393

Published: 20 December 2013


Abstract

Spin-Torque Transfer RAM (STT-RAM) is a promising candidate for SRAM replacement because of its fast read access, high density, low leakage power, and compatibility with CMOS technology. However, wide adoption of STT-RAM as cache memory is impeded by its long write latency and high write power. Recent work proposed improving write performance by relaxing the retention time of STT-RAM cells. The resulting volatile STT-RAM must be refreshed periodically to prevent data loss. When volatile STT-RAM is used as the last-level cache (LLC) in chip multiprocessor (CMP) systems, frequent refresh operations can dissipate significant extra energy. In addition, refresh operations can conflict with normal read/write operations and degrade overall system performance. Therefore, minimizing the performance impact of refresh operations is crucial for the adoption of volatile STT-RAM.
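To get a feel for why periodic refresh matters at LLC scale, the following back-of-the-envelope sketch counts refresh operations per second under a naive policy that refreshes every block once per retention period. The cache size, block size, and retention time are hypothetical illustration values, not figures from the article, and the function name is invented for this example.

```python
def naive_refreshes_per_second(cache_bytes: int, block_bytes: int, retention_s: float) -> float:
    """Refresh operations per second if every block is refreshed once per retention period."""
    if cache_bytes <= 0 or block_bytes <= 0 or retention_s <= 0:
        raise ValueError("all parameters must be positive")
    num_blocks = cache_bytes // block_bytes
    return num_blocks / retention_s


if __name__ == "__main__":
    # Hypothetical configuration: 8 MiB LLC, 64-byte blocks, 1 ms retention time.
    rate = naive_refreshes_per_second(8 * 1024 * 1024, 64, 1e-3)
    print(f"{rate:.3e} refresh operations per second")
```

The count grows linearly with cache capacity and inversely with retention time, which is why a policy that skips even a fraction of these refreshes can matter for both energy and contention.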

In this article, we propose Cache-Coherence-Enabled Adaptive Refresh (CCear) to minimize the number of refresh operations for volatile STT-RAM used as the LLC in CMP systems. Specifically, CCear interacts with the cache coherence protocol and the cache management policy to reduce the number of refresh operations on volatile STT-RAM caches. Full-system simulation results show that CCear performs close to an ideal refresh policy with low overhead. Compared with state-of-the-art refresh policies, CCear simultaneously improves system performance and reduces energy consumption. Moreover, the performance of CCear can be further enhanced by using small filter caches to accommodate the private STT-RAM blocks that are not refreshed.
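The abstract does not spell out the mechanism, but it does mention private STT-RAM blocks that are not refreshed. The toy model below is a sketch under that single assumption: a refresh pass consults per-block sharing information (as a coherence directory might hold it) and skips blocks that are invalid or private to one core. The `Sharing` states, the `LlcBlock` type, and `blocks_to_refresh` are hypothetical names for illustration, not the article's design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Sharing(Enum):
    """Assumed per-block sharing status, as a coherence directory might track it."""
    INVALID = "invalid"
    PRIVATE = "private"  # assumption: held by exactly one core's private cache
    SHARED = "shared"


@dataclass
class LlcBlock:
    tag: int
    sharing: Sharing


def blocks_to_refresh(blocks: Iterable[LlcBlock]) -> List[LlcBlock]:
    """Select blocks for refresh under the assumed policy: skip invalid and private blocks."""
    return [b for b in blocks if b.sharing is Sharing.SHARED]


if __name__ == "__main__":
    llc = [
        LlcBlock(tag=0x1A, sharing=Sharing.PRIVATE),
        LlcBlock(tag=0x2B, sharing=Sharing.SHARED),
        LlcBlock(tag=0x3C, sharing=Sharing.INVALID),
    ]
    selected = blocks_to_refresh(llc)
    print(f"refreshing {len(selected)} of {len(llc)} blocks")
```

In this model a skipped private block eventually loses its LLC copy, so some other structure has to hold the data when it is needed again; the abstract's filter caches appear to play that role, though the details are in the full article.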


Index Terms

1. Computer systems organization
  1. Architectures
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory


Published in

ACM Transactions on Design Automation of Electronic Systems Volume 19, Issue 1
December 2013
210 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/2558148

Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States

Publication History
- Published: 20 December 2013
- Accepted: 1 September 2013
- Revised: 1 June 2013
- Received: 1 December 2012
Author Tags
Spin-torque transfer RAM
cache coherence
embedded DRAM
energy efficiency
nonvolatile memory
refresh
Qualifiers
- research-article
- Research
- Refereed