Data preload for superscalar and VLIW processors
Publisher:
  • University of Illinois at Urbana-Champaign
  • Champaign, IL
  • United States
Order Number: UMI Order No. GAX94-11585
Abstract

Processor design techniques such as pipelining, superscalar issue, and VLIW have dramatically decreased the average number of clock cycles per instruction. As a result, each execution cycle has become more significant to overall system performance. To maximize the effectiveness of each cycle, one must expose instruction-level parallelism and employ techniques that tolerate memory latency. However, without special architectural support, a superscalar compiler cannot effectively accomplish these two tasks in the presence of control and memory access dependences.

Preloading is a class of architectural support that allows memory reads to be performed early in spite of potential violations of control and memory access dependences. With preload support, a superscalar compiler can perform more aggressive code reordering to provide increased tolerance of cache and memory access latencies and increased instruction-level parallelism. This thesis discusses the architectural features and compiler support required to effectively utilize preload instructions to increase overall system performance.

The first hardware support is preload register update, a data preload support for load scheduling to reduce first-level cache hit latency. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.
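The idea of keeping a hoisted load's destination register coherent can be illustrated with a toy software model. The sketch below is not the thesis's hardware design: it assumes the register file remembers the address each register was preloaded from and that a later store to that address forwards its value into the register. The names `PreloadRegisterFile`, `preload`, `store`, and `commit` are hypothetical and chosen only for illustration.

```python
class PreloadRegisterFile:
    """Toy model (assumption): registers remember the address they were preloaded from."""

    def __init__(self):
        self.regs = {}      # register name -> current value
        self.tracked = {}   # register name -> address it was preloaded from

    def preload(self, reg, memory, addr):
        # Perform the load early and start tracking its source address.
        self.regs[reg] = memory.get(addr, 0)
        self.tracked[reg] = addr

    def store(self, memory, addr, value):
        # Write memory, then update every register preloaded from this address
        # so the register sees the value the original program order implies.
        memory[addr] = value
        for reg, tracked_addr in self.tracked.items():
            if tracked_addr == addr:
                self.regs[reg] = value

    def commit(self, reg):
        # Hypothetical marker for the load's original position: stop tracking.
        self.tracked.pop(reg, None)
        return self.regs[reg]


def run_example():
    memory = {0x100: 1}
    rf = PreloadRegisterFile()
    rf.preload("r1", memory, 0x100)  # load moved above the store
    rf.store(memory, 0x100, 42)      # store to the same location
    return rf.commit("r1")           # register holds the stored value


print(run_example())  # prints 42
```

In this model the register ends up with the value the store wrote, which is what the load would have read had it stayed below the store.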

The second hardware support is the memory conflict buffer. The memory conflict buffer extends preload register update support by allowing uses of the load to move above ambiguous stores. Correct program execution is maintained using the memory conflict buffer and repair code provided by the compiler. With this addition, substantial speedup over an aggressive code scheduling model is achieved for a set of control-intensive nonnumerical programs.
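A rough software analogy for the conflict-detection-plus-repair scheme is sketched below. It assumes the buffer records the address of each preload, that every store is compared against those addresses, and that a check at the load's original position branches to compiler-generated repair code when a conflict was recorded. The class and function names (`MemoryConflictBuffer`, `check`, `scheduled_code`) are hypothetical, and real hardware would use a finite, hashed structure rather than a dictionary.

```python
class MemoryConflictBuffer:
    """Toy model (assumption): tracks preload addresses and flags conflicting stores."""

    def __init__(self):
        self.preloads = {}     # register name -> preloaded address
        self.conflict = set()  # registers whose preload was overwritten

    def preload(self, reg, addr):
        self.preloads[reg] = addr
        self.conflict.discard(reg)

    def store(self, addr):
        # Compare the store address against every outstanding preload.
        for reg, preload_addr in self.preloads.items():
            if preload_addr == addr:
                self.conflict.add(reg)

    def check(self, reg):
        # Report whether a conflict occurred, then release the entry.
        hit = reg in self.conflict
        self.conflict.discard(reg)
        self.preloads.pop(reg, None)
        return hit


def scheduled_code(memory, load_addr, store_addr, store_value):
    mcb = MemoryConflictBuffer()
    # The load and its dependent use are hoisted above the ambiguous store.
    r1 = memory[load_addr]
    mcb.preload("r1", load_addr)
    r2 = r1 + 1
    # The ambiguous store executes after the hoisted code.
    memory[store_addr] = store_value
    mcb.store(store_addr)
    # Check at the load's original position; run repair code on a conflict.
    if mcb.check("r1"):
        r1 = memory[load_addr]  # repair: redo the load
        r2 = r1 + 1             # repair: redo the dependent use
    return r2


print(scheduled_code({0: 10, 8: 20}, 0, 8, 99))  # no conflict: prints 11
print(scheduled_code({0: 10, 8: 20}, 0, 0, 99))  # conflict repaired: prints 100
```

When the store and load addresses differ, the hoisted result is used directly; when they match, the repair path recomputes the dependent value, so both runs agree with the original program order.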

The last hardware support is the preload buffer. Large data sets and slow memory subsystems result in unacceptable performance for numerical programs. The preload buffer allows loads to be performed early while eliminating problems with cache pollution and extended register live ranges. Adding the prestore buffer allows loads to be scheduled in the presence of ambiguous stores. Preload buffer support in addition to cache prefetching support is shown to achieve better performance than cache prefetching alone for a set of benchmarks. In all cases, preloading decreases bus traffic and reduces the miss rate when compared with no prefetching or cache prefetching.

Contributors
  • University of Illinois Urbana-Champaign
