Multiprocessor cache memory performance: characterization and optimization
Publisher:
  • Stanford University
  • 408 Panama Mall, Suite 217
  • Stanford
  • CA
  • United States
Order Number: UMI Order No. GAX93-02325
Abstract

Good cache memory performance is critical to achieving high CPU utilization in shared-memory multiprocessors. Reliably characterizing the performance of multiprocessor caches is hard, however, for it often requires experimental measurements on real machines across several workload domains. In this dissertation, we characterize some of the major sources of cache performance degradation, namely data sharing, operating system activity, and poor reuse of cache state in multiprogrammed workloads. We use data from a hardware performance monitor in a high-performance 4-CPU multiprocessor running scientific, engineering, software-development, and database workloads.

While some of the misses on shared data result from the intrinsic inter-CPU communication required by the application, the rest, false sharing misses, are a consequence of the way data sharing interacts with multi-word cache blocks. We separate false sharing misses from the remaining, true sharing misses. We find that, while applications suffer from false sharing, their miss rate is also affected by the poor spatial locality of true sharing. To reduce the miss rate, we then evaluate optimizations of the layout of shared data in cache blocks.

We discover three major sources of operating system misses: instruction fetches, block operations (copy and clear), and process migration. Instruction misses are more common than suspected. They are often caused by operating system self-interference in the cache. Hence, we propose optimizing the layout of the operating system code and consider increasing the cache associativity. The effect of misses in block operations can be partially eliminated by using special support for these operations. Finally, process migration misses are a consequence of the poor reuse of cache state in multiprogrammed workloads.

In multiprogrammed workloads, the cache state built up by a process may be lost when the process is preempted, either because intervening processes destroy the state or because the process migrates to another CPU. We evaluate affinity scheduling, a technique that increases the reuse of cache state by encouraging processes to run on the CPUs whose caches keep useful state. We show that affinity scheduling attains most of the increase in cache state reuse possible in the workloads. Overall, affinity scheduling produces moderate speedups at nearly no cost.

Contributors
  • University of Illinois Urbana-Champaign
