COMA-F: a non-hierarchical cache only memory architecture
Publisher: Stanford University, 408 Panama Mall, Suite 217, Stanford, CA, United States
Order Number: UMI Order No. GAX95-25851
Abstract

Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures and Cache Only Memory Architectures (COMA) are two recently emerged variations of large-scale shared-memory architectures. Both have distributed main memory and use directory-based cache coherence. Unlike in CC-NUMA, data in COMA can automatically migrate and replicate in memory in cache-line-sized chunks. The performance difference between these architectures is primarily determined by two factors: the relative magnitude of capacity misses versus coherence misses, and the granularity of data partitions in an application. COMA's performance advantage occurs mainly in applications where data accesses by different processors are finely interleaved in memory space and where capacity misses dominate over coherence misses. Because COMA uses a hierarchical directory structure to maintain cache coherence, applications where coherence misses dominate perform better on CC-NUMA. COMA-F combines the advantages of both CC-NUMA and COMA: it retains the cache organization of main memory found in COMA but uses a non-hierarchical directory structure to minimize the latency penalty of remote memory accesses.

Both COMA and COMA-F architectures have an inherent memory overhead when compared to CC-NUMA. This overhead consists of the physical memory required to support the cache organization of memory, as well as reserved memory that must be left unallocated by the operating system to facilitate data reshuffling and data replication. Data reshuffling occurs when space needs to be allocated to store a remote memory line in the local memory. Simulation data show that the frequency of reshuffling is sensitive to the allocation policy and associativity of the memory but is relatively unaffected by the block size chosen.
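
The dependence of reshuffling on associativity can be illustrated with a toy model. The sketch below is not taken from the thesis: the class `AttractionMemory`, the helper `count_reshuffles`, the LRU replacement choice, and the rule that every displaced line counts as one reshuffle are all assumptions made for illustration. It treats one node's memory as a set-associative cache and counts how often storing a new line forces an existing line out of its set.

```python
from collections import OrderedDict


class AttractionMemory:
    """Toy set-associative memory cache for one node (hypothetical model)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One LRU-ordered dict per set; keys are line addresses.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.reshuffles = 0

    def access(self, line_addr):
        """Bring line_addr into local memory; return the displaced line, if any."""
        s = self.sets[line_addr % self.num_sets]
        if line_addr in s:
            s.move_to_end(line_addr)  # hit: refresh LRU position
            return None
        victim = None
        if len(s) >= self.ways:
            # Set is full: displace the least recently used line.
            # Assumption: every displacement counts as one reshuffle.
            victim, _ = s.popitem(last=False)
            self.reshuffles += 1
        s[line_addr] = None
        return victim


def count_reshuffles(trace, num_sets, ways):
    """Replay a trace of line addresses and return the reshuffle count."""
    mem = AttractionMemory(num_sets, ways)
    for addr in trace:
        mem.access(addr)
    return mem.reshuffles


if __name__ == "__main__":
    trace = [0, 4, 8, 0, 4, 8]  # three lines that collide in low-order bits
    for num_sets, ways in [(4, 1), (2, 2), (1, 4)]:  # same total capacity
        print(num_sets, ways, count_reshuffles(trace, num_sets, ways))
```

With total capacity held at four lines, the conflicting trace causes fewer displacements as associativity grows. A real design could drop a replicated copy instead of relocating it, so this model overstates reshuffling for shared data.
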
Simulation data also show that data replication in the memory caches is important for good performance, but most gains can be achieved through replication in the processor caches. By relaxing the subset property for shared data between the processor caches and memory caches, data replication in the processor caches can be supported without the corresponding memory overhead of supporting data replication in the memory caches.
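
The effect of relaxing the subset property can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the mechanism described in the thesis: the `Node` class, its `enforce_subset` flag, and the method names are hypothetical, and capacity limits and coherence actions are omitted. It only shows that when shared lines may live in the processor cache alone, replication no longer consumes memory-cache space.

```python
class Node:
    """Toy node with a processor cache and a memory cache (hypothetical model)."""

    def __init__(self, enforce_subset):
        self.enforce_subset = enforce_subset
        self.proc_cache = set()
        self.mem_cache = set()

    def read_remote_shared(self, line):
        """Replicate a shared remote line locally for reading."""
        self.proc_cache.add(line)
        if self.enforce_subset:
            # Strict subset property: the line must also occupy the memory cache.
            self.mem_cache.add(line)

    def subset_holds(self):
        """True if every processor-cache line is also in the memory cache."""
        return self.proc_cache <= self.mem_cache


if __name__ == "__main__":
    strict, relaxed = Node(enforce_subset=True), Node(enforce_subset=False)
    for line in (1, 2, 3):
        strict.read_remote_shared(line)
        relaxed.read_remote_shared(line)
    print(len(strict.mem_cache), len(relaxed.mem_cache))
```

In this model the strict node spends three memory-cache lines on replicas while the relaxed node spends none, at the cost of the subset relation no longer holding for shared data.
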

Contributors
  • Stanford University
