We advocate multilevel caching for the design of high-performance cache systems. We suggest that a multilevel inclusion property be imposed on multilevel cache hierarchies to simplify I/O and cache coherence. We give necessary and sufficient conditions for imposing the inclusion property on fully associative and set-associative caches that allow different block sizes at different levels of the hierarchy. Our simulation results show that imposing the inclusion property greatly reduces the cache coherence disturbance to first-level caches.
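To make the inclusion property concrete, the following is a minimal sketch, not the construction analyzed in this work. It assumes two fully associative LRU levels with equal block sizes, and all names (`LRUCache`, `InclusiveHierarchy`, `back_invalidations`) are hypothetical. Inclusion is maintained by invalidating a block in the first level whenever the second level evicts it.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache holding block addresses only (assumed model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used block comes first

    def __contains__(self, block):
        return block in self.blocks

    def touch(self, block):
        """Mark the block most recently used; return True on a hit."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def insert(self, block):
        """Insert a block; return the evicted victim, or None if none was evicted."""
        victim = None
        if len(self.blocks) >= self.capacity:
            victim, _ = self.blocks.popitem(last=False)
        self.blocks[block] = None
        return victim

    def invalidate(self, block):
        self.blocks.pop(block, None)


class InclusiveHierarchy:
    """Two-level hierarchy in which every L1 block is also kept in L2."""

    def __init__(self, l1_blocks, l2_blocks):
        self.l1 = LRUCache(l1_blocks)
        self.l2 = LRUCache(l2_blocks)
        self.back_invalidations = 0

    def access(self, block):
        if self.l1.touch(block):
            return "L1 hit"  # L2 does not observe first-level hits
        if self.l2.touch(block):
            result = "L2 hit"
        else:
            result = "miss"
            victim = self.l2.insert(block)
            # Enforce inclusion: a block leaving L2 must also leave L1.
            if victim is not None and victim in self.l1:
                self.l1.invalidate(victim)
                self.back_invalidations += 1
        self.l1.insert(block)  # an L1 victim is dropped; its copy stays in L2
        return result

    def inclusion_holds(self):
        return all(block in self.l2 for block in self.l1.blocks)
```

Because the second level never sees first-level hits in this sketch, its LRU order can drift from actual use, so it may evict a block that is still live in the first level. The back-invalidation step is what restores inclusion in that case.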
We examine three multiprocessor structures with a two-level cache hierarchy and discuss the feasibility of imposing the inclusion property in each. We explore one structure in detail, namely a shared-bus organization with a two-level virtual-real cache hierarchy. We show how the second-level cache can be easily extended to solve the synonym problem that results from using a virtually addressed cache at the first level. We also propose solutions to the context-switching overhead and cache coherence problems of a two-level virtual-real cache hierarchy. Our simulation results show that this organization has a performance advantage over a hierarchy of physically addressed caches in a multiprocessor environment.
Finally, we propose improvements to current trace-driven cache simulations to make them faster and more economical. We attack the large time and space demands of cache simulation in two ways. First, we reduce the program traces in such a way that exact performance figures can still be obtained from the reduced traces. Second, we devise an algorithm that produces results for a variety of metrics (hit ratio, write-back counts, bus traffic) for a large number of set-associative write-back caches in a single simulation run. The trace reduction and efficient simulation techniques are extended to parallel multiprocessor cache simulations. Our simulation results show that our approach substantially reduces the disk space needed to store the program traces and can dramatically speed up cache simulations while still producing exact results.
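As background for the idea of evaluating many cache configurations in one run, the following sketch shows the classic LRU stack-distance technique for fully associative caches. It is a simpler, well-known relative of the approach described here, not the algorithm proposed in this work, and it does not model set associativity, write-backs, or bus traffic. The function names and the example trace are assumptions made for illustration.

```python
from collections import Counter


def stack_distances(trace):
    """One pass over a block-address trace.

    Returns (hist, cold), where hist[d] counts references found at LRU stack
    depth d (1 = most recently used) and cold counts first-time references.
    """
    stack = []  # most recently used block at index 0
    hist = Counter()
    cold = 0
    for block in trace:
        try:
            depth = stack.index(block)
        except ValueError:
            cold += 1  # never seen before: a miss for every cache size
        else:
            hist[depth + 1] += 1
            stack.pop(depth)
        stack.insert(0, block)
    return hist, cold


def hits_for_size(hist, size):
    """Hits in a fully associative LRU cache holding `size` blocks."""
    return sum(count for dist, count in hist.items() if dist <= size)


# Hypothetical trace of block addresses, for illustration only.
trace = [0, 1, 0, 2, 1, 0, 0]
hist, cold = stack_distances(trace)
hit_counts = {size: hits_for_size(hist, size) for size in (1, 2, 3)}
```

A reference at stack depth d hits in every LRU cache of at least d blocks, so a single histogram yields the hit count for every cache size without rerunning the trace.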