research-article

Partitioning and Data Mapping in Reconfigurable Cache and Scratchpad Memory–Based Architectures

Published: 02 September 2016

Abstract

Scratchpad memory (SPM) is considered a useful component in the memory hierarchy, either alone or alongside caches, for meeting power and energy constraints now that performance is no longer the sole criterion for processor design. Although the efficiency of SPM is well known, its use has been limited by difficulties in programmability. Real applications usually have regions that are amenable to exploitation by either SPM or cache and hence can benefit if the two are used in conjunction. Dynamically adjusting the local memory resources to suit application demand can significantly improve the efficiency of the overall system. In this article, we propose a compiler technique that maps application data objects to the SPM-cache and also partitions the local memory between the SPM and cache according to the dynamic requirements of the application. First, we introduce a novel graph-based structure to tackle data allocation in an application. Second, we use this structure to present a data allocation heuristic that maps program objects for a fixed-size SPM-cache hybrid system and targets whole-program optimization. Finally, we extend this formulation to adapt the SPM and cache sizes, as well as the data allocation, to the requirements of different application regions. We study the applicability of the technique on various workloads targeted at both SPM-only and hardware-reconfigurable memory systems, observing an average energy-delay improvement of 18% over state-of-the-art techniques.
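The abstract describes the allocation and partitioning problem only at a high level. The Python sketch below illustrates its general shape under loudly stated assumptions. It is not the paper's graph-based structure or heuristic. Instead, it uses a generic greedy, knapsack-style density ranking and a toy cost model. The cost constants, the `DataObject` type, and the functions `allocate_fixed`, `estimated_cost`, and `best_partition` are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-access costs in arbitrary units (NOT values from the paper).
SPM_ACCESS_COST = 1.0
CACHE_HIT_COST = 1.5
CACHE_MISS_PENALTY = 20.0


@dataclass(frozen=True)
class DataObject:
    """A program data object with a profiled footprint (hypothetical type)."""
    name: str
    size: int      # bytes; assumed > 0
    accesses: int  # profiled access count within one program region


def allocate_fixed(objects, spm_size):
    """Greedy allocation for a fixed SPM size: rank objects by accesses per
    byte and place them in the SPM while they fit; the rest go to the cache."""
    ranked = sorted(objects, key=lambda o: o.accesses / o.size, reverse=True)
    in_spm, in_cache, used = [], [], 0
    for obj in ranked:
        if used + obj.size <= spm_size:
            in_spm.append(obj)
            used += obj.size
        else:
            in_cache.append(obj)
    return in_spm, in_cache


def estimated_cost(in_spm, in_cache, cache_size):
    """Toy cost model (an assumption): SPM accesses never miss, while
    cache-mapped objects miss in proportion to how far their combined
    footprint exceeds the cache capacity."""
    spm_cost = sum(o.accesses for o in in_spm) * SPM_ACCESS_COST
    footprint = sum(o.size for o in in_cache)
    miss_rate = 0.0
    if footprint > cache_size:
        miss_rate = 1.0 - cache_size / footprint
    per_access = CACHE_HIT_COST + miss_rate * CACHE_MISS_PENALTY
    return spm_cost + sum(o.accesses for o in in_cache) * per_access


def best_partition(objects, local_mem_size, granularity):
    """Try each SPM/cache split of the local memory in steps of
    `granularity` bytes; return (spm_size, in_spm, in_cache, cost)
    for the cheapest split (ties keep the smaller SPM)."""
    best = None
    for spm_size in range(0, local_mem_size + 1, granularity):
        in_spm, in_cache = allocate_fixed(objects, spm_size)
        cost = estimated_cost(in_spm, in_cache, local_mem_size - spm_size)
        if best is None or cost < best[3]:
            best = (spm_size, in_spm, in_cache, cost)
    return best


if __name__ == "__main__":
    # Two hypothetical program regions, each re-partitioned independently.
    regions = {
        "streaming_kernel": [DataObject("a", 8192, 80000),
                             DataObject("b", 8192, 80000)],
        "pointer_chasing": [DataObject("tree", 65536, 50000),
                            DataObject("tmp", 1024, 9000)],
    }
    for region, objs in regions.items():
        spm_size, in_spm, _, cost = best_partition(objs, 32768, 8192)
        print(region, spm_size, [o.name for o in in_spm], round(cost, 1))
```

Under this toy model, the streaming region settles on a 16 KB SPM that holds both arrays. The pointer-chasing region, whose footprint exceeds the whole local memory, keeps everything in the cache. Choosing the split separately for each region mirrors the abstract's idea of adapting the SPM and cache sizes to each application region. A realistic model would also charge for reconfiguration and for moving data into and out of the SPM, which this sketch ignores.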

      • Published in

        ACM Transactions on Design Automation of Electronic Systems  Volume 22, Issue 1
        January 2017
        463 pages
        ISSN:1084-4309
        EISSN:1557-7309
        DOI:10.1145/2948199
        • Editor: Naehyuck Chang

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 2 September 2016
        • Accepted: 1 May 2016
        • Revised: 1 March 2016
        • Received: 1 December 2015
        Published in TODAES Volume 22, Issue 1

        Qualifiers

        • research-article
        • Research
        • Refereed
