Partitioning and Data Mapping in Reconfigurable Cache and Scratchpad Memory–Based Architectures

Authors:
Prasenjit Chakraborty, Indian Institute of Technology Delhi
Preeti Ranjan Panda, Indian Institute of Technology Delhi, India
Sandeep Sen, Indian Institute of Technology Delhi, India

ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 1, Article No. 12, pp. 1–25. https://doi.org/10.1145/2934680

Published: 2 September 2016

Abstract

Scratchpad memory (SPM) is considered a useful component in the memory hierarchy, used alone or alongside caches, for meeting power and energy constraints now that performance is no longer the sole criterion for processor design. Although the efficiency of SPM is well known, its use has been restricted by difficulties in programmability. Real applications usually have regions that are amenable to exploitation by either SPM or cache and hence can benefit if the two are used in conjunction. Dynamically adjusting the local memory resources to suit application demand can significantly improve the efficiency of the overall system. In this article, we propose a compiler technique that maps application data objects to the SPM-cache hybrid and also partitions the local memory between the SPM and the cache according to the dynamic requirements of the application. First, we introduce a novel graph-based structure to tackle data allocation in an application. Second, we use this structure to present a data allocation heuristic that maps program objects for a fixed-size SPM-cache hybrid system and targets whole-program optimization. Finally, we extend this formulation to adapt the SPM and cache sizes, as well as the data allocation, to the requirements of different application regions. We study the applicability of the technique on various workloads targeted at both SPM-only and hardware-reconfigurable memory systems, observing an average energy-delay improvement of 18% over state-of-the-art techniques.
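The abstract does not spell out the allocation heuristic, so the following Python sketch is only a toy illustration of the underlying trade-off, not the authors' graph-based algorithm. It assumes a classic greedy knapsack rule (place the data objects with the most accesses per byte in the SPM) and sweeps candidate SPM/cache splits of a fixed local memory, keeping the split with the lowest estimated cost. The names `DataObject`, `greedy_spm_allocation`, `toy_energy`, and `best_partition`, the per-access energy constants, and the footprint-based hit-rate model are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """A program data object with a profiled footprint and access count (hypothetical fields)."""
    name: str
    size: int      # footprint in bytes
    accesses: int  # profiled number of loads/stores


def greedy_spm_allocation(objects, spm_size):
    """Fill the SPM with the objects that have the most accesses per byte."""
    chosen, used = [], 0
    for obj in sorted(objects, key=lambda o: o.accesses / o.size, reverse=True):
        if used + obj.size <= spm_size:
            chosen.append(obj)
            used += obj.size
    return chosen


def toy_energy(objects, spm_objs, cache_size, e_spm=1.0, e_hit=2.0, e_miss=20.0):
    """Toy cost model (an assumption, not from the article): each SPM access costs
    e_spm; cache-resident objects hit with probability equal to the fraction of
    their combined footprint that fits in the cache."""
    in_spm = {o.name for o in spm_objs}
    cached = [o for o in objects if o.name not in in_spm]
    footprint = sum(o.size for o in cached)
    hit_rate = 1.0 if footprint == 0 else min(1.0, cache_size / footprint)
    spm_cost = e_spm * sum(o.accesses for o in spm_objs)
    per_access = hit_rate * e_hit + (1.0 - hit_rate) * e_miss
    return spm_cost + per_access * sum(o.accesses for o in cached)


def best_partition(objects, total_size, step):
    """Try every SPM/cache split in `step`-byte units and keep the cheapest.
    Returns (spm_size, cost, spm_objects)."""
    best = None
    for spm_size in range(0, total_size + 1, step):
        spm_objs = greedy_spm_allocation(objects, spm_size)
        cost = toy_energy(objects, spm_objs, total_size - spm_size)
        if best is None or cost < best[1]:
            best = (spm_size, cost, spm_objs)
    return best


if __name__ == "__main__":
    objs = [
        DataObject("coeffs", 1024, 50_000),
        DataObject("frame", 8192, 40_000),
        DataObject("lut", 2048, 1_000),
    ]
    spm_size, cost, placed = best_partition(objs, total_size=8192, step=2048)
    print(spm_size, [o.name for o in placed], round(cost))
```

With the sample objects, the sweep settles on a 2048-byte SPM holding only `coeffs`, leaving the rest of the 8 KB local memory as cache. Re-running such a sweep separately for each program region would loosely mimic the per-region reconfiguration the abstract describes, although the article's own formulation is graph-based and is not reproduced here.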

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Reconfigurable computing
  2. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded software

Published in
ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 1
January 2017
463 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/2948199
Editor: Naehyuck Chang, Korea Advanced Institute of Science and Technology, Korea
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 2 September 2016
- Accepted: 1 May 2016
- Revised: 1 March 2016
- Received: 1 December 2015

Author Tags
Reconfigurable cache
memory management strategies
scratchpad memory optimization
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 7
- Total Downloads: 301
- Downloads (last 12 months): 19
- Downloads (last 6 weeks): 3
