Cache-Coherent Non-Uniform Memory Access (CC-NUMA) architectures and Cache-Only Memory Architectures (COMA) are two interesting variations of large-scale shared-memory architectures that have recently emerged. Both have distributed main memory and use directory-based cache coherence. Unlike in CC-NUMA, data in COMA can automatically migrate and replicate in memory in cache-line-sized chunks.

The performance difference between these architectures is determined primarily by two factors: the relative magnitude of capacity misses versus coherence misses, and the granularity of data partitions in an application. COMA's performance advantage arises mainly in applications where data accesses by different processors are finely interleaved in memory space and where capacity misses dominate coherence misses. Because COMA uses a hierarchical directory structure to maintain cache coherence, applications in which coherence misses dominate perform better on CC-NUMA. COMA-F combines the advantages of both CC-NUMA and COMA by retaining COMA's cache organization of main memory while using a non-hierarchical directory structure that minimizes the latency penalty of remote memory accesses.

Both COMA and COMA-F have an inherent memory overhead compared with CC-NUMA. This overhead consists of the physical memory required to support the cache organization of memory, plus reserved memory that the operating system must leave unallocated to facilitate data reshuffling and data replication. Data reshuffling occurs when space must be allocated in local memory to store a remote memory line. Simulation data show that the frequency of reshuffling is sensitive to the allocation policy and the associativity of the memory, but is relatively unaffected by the block size chosen.
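The sensitivity of reshuffling to allocation policy and associativity can be illustrated with a toy model. Everything below is a hypothetical sketch, not the simulator behind the data above: the `AttractionMemory` class, the rule that every seventh line is a master copy, and all sizes are invented for illustration. The idea it demonstrates is that a victim-selection policy which displaces droppable replicas before master copies can avoid reshuffles when the associativity is high enough to keep each set's masters resident, whereas a direct-mapped memory offers no choice of victim.

```python
import random

class AttractionMemory:
    """Toy set-associative COMA attraction memory for one node.

    Illustrative assumptions (not from the thesis): a line whose only
    system-wide copy lives on this node is a "master" and must be
    relocated (reshuffled) when displaced; any other line is a replica
    that can simply be dropped. Here every 7th line is a master.
    """

    def __init__(self, frames, ways, prefer_replica_victims):
        self.ways = ways
        self.num_sets = frames // ways
        self.sets = [[] for _ in range(self.num_sets)]
        self.prefer_replica = prefer_replica_victims
        self.reshuffles = 0

    @staticmethod
    def is_master(line):
        return line % 7 == 0  # assumption: ~1/7 of lines are masters

    def access(self, line):
        s = self.sets[line % self.num_sets]
        if line in s:
            return  # local hit, nothing to allocate
        if len(s) == self.ways:
            if self.prefer_replica:
                # Allocation policy: displace a droppable replica if any.
                victim = next((i for i, l in enumerate(s)
                               if not self.is_master(l)), 0)
            else:
                victim = 0  # oldest entry, regardless of kind
            if self.is_master(s.pop(victim)):
                self.reshuffles += 1  # last copy displaced: reshuffle
        s.append(line)


def reshuffle_rate(ways, prefer_replica, accesses=20000,
                   lines=4096, frames=1024):
    random.seed(2)  # identical workload for every configuration
    am = AttractionMemory(frames, ways, prefer_replica)
    for _ in range(accesses):
        am.access(random.randrange(lines))
    return am.reshuffles / accesses


for ways in (1, 2, 4, 8):
    print(f"{ways}-way: oldest-first {reshuffle_rate(ways, False):.3f}, "
          f"replica-first {reshuffle_rate(ways, True):.3f}")
```

With one way per set the two policies behave identically; as associativity grows, the replica-first policy finds a droppable victim in almost every set and the reshuffle rate collapses, while the policy-blind rate stays near the master fraction of the miss stream.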
Simulation data also show that data replication in the memory caches is important for good performance, but most of the gains can be achieved through replication in the processor caches alone. By relaxing the subset property for shared data between the processor caches and the memory caches, data replication in the processor caches can be supported without the memory overhead of also replicating the data in the memory caches.
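The effect of relaxing the subset (inclusion) property can be sketched in the same toy style. The `Node` class, the cache sizes, and the access pattern below are illustrative assumptions, not the thesis model: under strict inclusion, every remote shared line replicated in the processor cache must also occupy a block of local attraction memory; in the relaxed mode it is replicated in the processor cache alone.

```python
from collections import OrderedDict

class Node:
    """Toy node with a processor cache and an attraction-memory cache.

    Hypothetical sizes and policy: `strict_inclusion=True` models the
    usual subset property (processor-cache contents are a subset of the
    local attraction memory); False models relaxing that property for
    shared data.
    """

    def __init__(self, pcache_lines, amem_lines, strict_inclusion):
        self.pcache = OrderedDict()   # line -> True, in LRU order
        self.pcache_lines = pcache_lines
        self.amem = OrderedDict()
        self.amem_lines = amem_lines
        self.strict = strict_inclusion
        self.amem_replica_slots = 0   # attraction-memory blocks spent on
                                      # replicas of remote shared lines

    @staticmethod
    def _fill(cache, capacity, line):
        cache[line] = True
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict least recently used

    def read_remote_shared(self, line):
        if line in self.pcache:
            self.pcache.move_to_end(line)
            return "pcache hit"
        self._fill(self.pcache, self.pcache_lines, line)
        if self.strict and line not in self.amem:
            # Subset property: the replica must also consume a block
            # of local attraction memory.
            self.amem_replica_slots += 1
            self._fill(self.amem, self.amem_lines, line)
        return "miss (filled)"


for strict in (True, False):
    n = Node(pcache_lines=64, amem_lines=256, strict_inclusion=strict)
    for line in range(64):      # touch 64 remote shared lines
        n.read_remote_shared(line)
    print(f"strict={strict}: attraction-memory blocks used for replicas:"
          f" {n.amem_replica_slots}")
```

In both modes the re-read of a shared line hits in the processor cache, so the replication benefit is preserved; only the strict mode pays for it again in attraction-memory space, which is the overhead the relaxation removes.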
COMA-F: a non-hierarchical cache only memory architecture