Abstract
Scratchpad memory (SPM) is considered a useful component of the memory hierarchy, on its own or alongside caches, for meeting power and energy constraints now that performance is no longer the sole criterion for processor design. Although the efficiency of SPM is well known, its use has been restricted by the difficulty of programming it. Real applications usually contain some regions that are better served by SPM and others that are better served by a cache, so they can benefit when the two are used together. Dynamically adjusting the local memory resources to suit application demand can significantly improve the efficiency of the overall system. In this article, we propose a compiler technique that maps application data objects to a hybrid SPM-cache and also partitions the local memory between SPM and cache according to the dynamic requirements of the application. First, we introduce a novel graph-based structure to tackle data allocation in an application. Second, we use this structure to present a data allocation heuristic that maps program objects for a fixed-size SPM-cache hybrid system and targets whole-program optimization. Finally, we extend this formulation to adapt the SPM and cache sizes, as well as the data allocation, to the requirements of different application regions. We study the applicability of the technique on various workloads targeted at both SPM-only and hardware-reconfigurable memory systems, observing an average energy-delay improvement of 18% over state-of-the-art techniques.
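The abstract describes the approach only at a high level, and it does not spell out the graph-based heuristic. The Python sketch below therefore illustrates the general idea with a plain greedy knapsack instead of the authors' method. For a fixed SPM size, it places the objects with the highest extra gain per unit of size into the SPM and leaves the rest to the cache. It then repeats this for each allowed SPM/cache split of local memory, independently per program region, and keeps the best split. Everything here is a hypothetical assumption: the `DataObject` fields `spm_gain` and `cache_gain` (profile-derived savings estimates), the crude `simple_cache_model`, the size units, and the set of allowed SPM sizes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DataObject:
    """A program data object (array, struct, ...) with hypothetical profile data."""
    name: str
    size: int          # footprint, in abstract units (e.g. KB)
    spm_gain: float    # hypothetical: estimated saving if mapped to the SPM
    cache_gain: float  # hypothetical: estimated saving if served by the cache


CacheModel = Callable[[Sequence[DataObject], int], float]


def greedy_spm_allocation(objs: Sequence[DataObject], spm_size: int) -> List[DataObject]:
    """Greedy knapsack: place the objects with the highest extra gain per unit
    of size into the SPM until it is full."""
    # Only objects that gain more from the SPM than from the cache are candidates.
    candidates = [o for o in objs if o.size > 0 and o.spm_gain > o.cache_gain]
    candidates.sort(key=lambda o: (o.spm_gain - o.cache_gain) / o.size, reverse=True)
    chosen: List[DataObject] = []
    free = spm_size
    for o in candidates:
        if o.size <= free:
            chosen.append(o)
            free -= o.size
    return chosen


def simple_cache_model(rest: Sequence[DataObject], cache_size: int) -> float:
    """Hypothetical cache model: the full cache_gain of the remaining objects is
    realized only if their combined footprint fits, and is scaled down otherwise."""
    footprint = sum(o.size for o in rest)
    if footprint == 0:
        return 0.0
    return min(1.0, cache_size / footprint) * sum(o.cache_gain for o in rest)


def best_partition(
    objs: Sequence[DataObject],
    local_size: int,
    spm_sizes: Sequence[int],
    cache_model: CacheModel = simple_cache_model,
) -> Tuple[float, int, List[str]]:
    """Try each allowed SPM size, with the remainder of local memory acting as cache.
    Returns (estimated gain, SPM size, names of objects mapped to the SPM)."""
    best: Optional[Tuple[float, int, List[str]]] = None
    for s in spm_sizes:
        if not 0 <= s <= local_size:
            raise ValueError(f"SPM size {s} outside [0, {local_size}]")
        in_spm = greedy_spm_allocation(objs, s)
        rest = [o for o in objs if o not in in_spm]
        gain = sum(o.spm_gain for o in in_spm) + cache_model(rest, local_size - s)
        if best is None or gain > best[0]:
            best = (gain, s, [o.name for o in in_spm])
    if best is None:
        raise ValueError("spm_sizes must not be empty")
    return best


if __name__ == "__main__":
    # Two hypothetical program regions with very different access behaviour.
    regions = {
        "regular_loop": [DataObject("A", 4, 8.0, 2.0), DataObject("B", 4, 6.0, 3.0),
                         DataObject("C", 8, 2.0, 4.0)],
        "pointer_chasing": [DataObject("D", 8, 1.0, 5.0), DataObject("E", 8, 2.0, 4.0)],
    }
    for region, objs in regions.items():
        gain, spm, placed = best_partition(objs, local_size=16, spm_sizes=[0, 4, 8, 12])
        print(region, spm, placed, gain)
```

Running the example prints `regular_loop 8 ['A', 'B'] 18.0` and `pointer_chasing 0 [] 9.0`. The regular region is given an 8-unit SPM holding `A` and `B`, while the pointer-chasing region is given the whole local memory as cache. This illustrates why the best split can differ from region to region. The sketch ignores the cost of moving data and reconfiguring the hardware between regions, which a real partitioning scheme would have to weigh against the per-region gains.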
Index Terms
- Partitioning and Data Mapping in Reconfigurable Cache and Scratchpad Memory--Based Architectures