ABSTRACT
In this paper we introduce a new classification of misses in shared-memory multiprocessors based on interprocessor communication. We identify the set of essential misses, i.e., the smallest set of misses necessary for correct execution. Essential misses comprise cold misses and true sharing misses. All other misses are useless misses and can be ignored without affecting the correctness of program execution. Based on this classification we compare the effectiveness of five protocols that delay and combine the invalidations that lead to useless misses. In cache-based systems the protocols are very effective, with miss rates close to the essential miss rate. In virtual shared memory systems the techniques are also effective but leave room for improvement.
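To make the split between essential and useless misses concrete, the following is a minimal sketch of a trace-driven classifier. It rests on assumptions that are not taken from the abstract: infinite caches, a write-invalidate protocol, a trace of `(proc, op, addr)` word accesses, and a simplified rule that judges each miss only by the word that triggers it. The function name `classify_misses` and the `block_words` parameter are hypothetical, and the rule is likely coarser than the classification the paper actually defines.

```python
from collections import defaultdict


def classify_misses(trace, block_words=4):
    """Count cold, true sharing, and useless misses in a word-access trace.

    trace: iterable of (proc, op, addr); op is 'r' or 'w', addr is a word address.
    Assumptions (illustrative only): infinite caches, write-invalidate coherence,
    and a miss is judged solely by the word whose access triggers it.
    """
    valid = defaultdict(set)  # block -> processors holding a valid copy
    seen = defaultdict(set)   # block -> processors that have ever referenced it
    stale = defaultdict(set)  # (proc, block) -> words others wrote since proc's last fetch
    counts = {"cold": 0, "true_sharing": 0, "useless": 0}

    for proc, op, addr in trace:
        block = addr // block_words
        if proc not in valid[block]:
            if proc not in seen[block]:
                counts["cold"] += 1          # first reference to the block
            elif addr in stale[(proc, block)]:
                counts["true_sharing"] += 1  # the accessed word was written by another processor
            else:
                counts["useless"] += 1       # only other words in the block changed
            valid[block].add(proc)
            seen[block].add(proc)
            stale[(proc, block)].clear()     # the fetched copy is up to date
        if op == "w":
            # Write-invalidate: every other copy is invalidated, and the written
            # word is recorded as modified for each of those processors.
            for other in seen[block]:
                if other != proc:
                    valid[block].discard(other)
                    stale[(other, block)].add(addr)
    return counts


if __name__ == "__main__":
    trace = [
        (0, "r", 0), (1, "r", 1),  # two cold misses on block 0
        (1, "w", 1), (0, "r", 0),  # P0 re-reads an unmodified word
        (1, "w", 1), (0, "r", 1),  # P0 reads the word P1 wrote
    ]
    print(classify_misses(trace))
```

Under this sketch the essential miss count is the sum of the `cold` and `true_sharing` entries; the `useless` entry counts the misses that protocols delaying or combining invalidations aim to remove.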
The detection and elimination of useless misses in multiprocessors
Proceedings of the 20th Annual International Symposium on Computer Architecture (ISCA '93)