Tolerating latency through software-controlled data prefetching

November 1995

Author:
Todd Carl Mowry
Stanford Univ.

Publisher:
Stanford University
408 Panama Mall, Suite 217
Stanford, CA
United States

Order Number: UMI Order No. GAX94-29983

Abstract

The large latency of memory accesses in modern computer systems is a key obstacle to achieving high processor utilization. Furthermore, technology trends indicate that the gap between processor and memory speeds is likely to widen in the future. While increased latency affects all computer systems, the problem is magnified in large-scale shared-memory multiprocessors, where physical dimensions make latency an inherent problem. To cope with memory latency, nearly all computer systems rely on a cache hierarchy as their basic solution. While caches are useful, they are not a panacea.

Software-controlled prefetching is a technique for tolerating memory latency by explicitly executing prefetch instructions to move data close to the processor before it is actually needed. This technique is attractive because it can hide both read and write latency within a single thread of execution while requiring relatively little hardware support. Software-controlled prefetching, however, presents two major challenges. First, some sophistication is required on the part of the programmer, the runtime system, or (preferably) the compiler to insert prefetches into the code. Second, care must be taken that the overheads of prefetching, which include additional instructions and increased memory queueing delays, do not outweigh the benefits.
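
As a rough illustration of the technique described above, the following C sketch issues a prefetch a fixed number of elements ahead of the current access. It is not taken from the dissertation: the function name `sum_with_prefetch` and the `dist` parameter are hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin standing in for whatever prefetch instruction a given machine provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the dissertation): sums n doubles while
 * prefetching the element `dist` iterations ahead of the current one.
 * __builtin_prefetch is a GCC/Clang builtin; it is only a hint, and the
 * compiler may drop it on targets without a prefetch instruction. */
double sum_with_prefetch(const double *a, size_t n, size_t dist)
{
    assert(a != NULL || n == 0);

    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Stay inside the array when forming the prefetch address. */
        if (i + dist < n)
            __builtin_prefetch(&a[i + dist], 0 /* read */, 3 /* high temporal locality */);
        total += a[i];
    }
    return total;
}
```

Note that this version executes a bounds check and a prefetch on every iteration, which is exactly the kind of instruction overhead the second challenge refers to. A suitable value of `dist` depends on the memory latency and the work per iteration, so it would have to be tuned per machine.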

This dissertation proposes and evaluates a new compiler algorithm for inserting prefetches into code. The proposed algorithm attempts to minimize overheads by only issuing prefetches for references that are predicted to suffer cache misses. The algorithm can prefetch both dense-matrix and sparse-matrix codes, thus covering a large fraction of scientific applications. It also works for both uniprocessor and large-scale shared-memory multiprocessor architectures. We have implemented our algorithm in the SUIF (Stanford University Intermediate Form) optimizing compiler. The results of our detailed architectural simulations demonstrate that the speed of some applications can be improved by as much as a factor of two, both on uniprocessor and multiprocessor systems. This dissertation also compares software-controlled prefetching with other latency-hiding techniques (e.g., locality optimizations, relaxed consistency models, and multithreading), and investigates the architectural support necessary to make prefetching effective.
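
The idea of prefetching only references that are predicted to miss can be sketched for a unit-stride array walk, where at most one access per cache line is expected to miss. The sketch below is an assumption-laden illustration, not the SUIF implementation: the function name, the `lines_ahead` parameter, and the line size of 8 doubles (64-byte lines) are all hypothetical, and the array is assumed to start on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines holding 8 doubles each. */
#define LINE_DOUBLES 8

/* Hypothetical helper: sums n doubles, issuing one prefetch per cache
 * line instead of one per element. The inner loop covers a whole line,
 * so accesses predicted to hit carry no prefetch overhead. */
double sum_prefetch_per_line(const double *a, size_t n, size_t lines_ahead)
{
    assert(a != NULL || n == 0);

    double total = 0.0;
    size_t dist = lines_ahead * LINE_DOUBLES;
    size_t i = 0;

    /* Main loop: one prefetch for every LINE_DOUBLES elements. */
    for (; i + LINE_DOUBLES <= n; i += LINE_DOUBLES) {
        if (i + dist < n)
            __builtin_prefetch(&a[i + dist], 0 /* read */, 3 /* high temporal locality */);
        for (size_t j = 0; j < LINE_DOUBLES; j++)
            total += a[i + j];
    }

    /* Remainder loop: leftover elements, no prefetches needed. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Compared with prefetching on every iteration, this form issues roughly one eighth as many prefetches for the same coverage under the stated line-size assumption.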

Contributors

Todd Carl Mowry
Carnegie Mellon University
- Publication Years: 1991 - 2023
- Publication count: 80
- Citation count: 8,148
- Available for Download: 87
- Downloads (cumulative): 77,677
- Downloads (12 months): 6,344
- Downloads (6 weeks): 986
- Average Downloads per Article: 893
- Average Citations per Article: 102

Index Terms

Tolerating latency through software-controlled data prefetching
1. Information systems
  1. Information storage systems
    1. Storage management
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. File systems management
        2. Memory management
    2. Extra-functional properties
      1. Software fault tolerance

Recommendations

Tolerating latency in multiprocessors through compiler-inserted prefetching

The large latency of memory accesses in large-scale shared-memory multiprocessors is a key obstacle to achieving high processor utilization. Software-controlled prefetching is a technique for tolerating memory latency by explicitly executing ...

Tolerating Latency Through Software-Controlled Data Prefetching

Increasing hardware data prefetching performance using the second-level cache

Techniques to reduce or tolerate large memory latencies are critical for achieving high processor performance. Hardware data prefetching is one of the most heavily studied solutions, but it is essentially applied to first-level caches where it can ...
