ABSTRACT
This paper presents a design space exploration of a selective load value prediction scheme suitable for energy-aware Simultaneous Multi-Threaded (SMT) architectures. A load value predictor is an architectural enhancement that speculates on the result of a microprocessor load instruction to speed up the execution of subsequent instructions. The proposed enhancement differs from a classic predictor in its improved selection scheme, which activates the predictor only when a miss occurs in the first-level cache. We analyze the effectiveness of the selective predictor in terms of overall energy reduction and performance improvement. To this end, we show how the proposed predictor can reduce overall cost when the cache size of the SMT architecture is reduced, and we compare it with a classic non-selective load value prediction scheme. The experimental results were gathered with a state-of-the-art SMT simulator running the SPEC2000 benchmark suite in both SMT and non-SMT mode.
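The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the predictor organisation, so the last-value table, the saturating confidence counter, the threshold values, the decision to train only on misses, and all names (`SelectiveLastValuePredictor`, `predict`, `update`, `lookups`) are assumptions made for illustration. The only property taken from the text is that the predictor is consulted only when a load misses in the first-level cache.

```python
from dataclasses import dataclass, field


@dataclass
class SelectiveLastValuePredictor:
    """Toy last-value predictor that is consulted only on L1 misses.

    Hypothetical model: table layout and thresholds are assumptions,
    not taken from the paper.
    """
    confidence_threshold: int = 2   # assumed value
    max_confidence: int = 3         # assumed saturating-counter limit
    table: dict = field(default_factory=dict)  # pc -> [last_value, confidence]
    lookups: int = 0                # number of predictor activations

    def predict(self, pc, l1_hit):
        """Return a predicted value, or None when no prediction is made."""
        if l1_hit:
            return None  # selective scheme: predictor stays idle on L1 hits
        self.lookups += 1
        entry = self.table.get(pc)
        if entry is not None and entry[1] >= self.confidence_threshold:
            return entry[0]
        return None

    def update(self, pc, l1_hit, actual):
        """Train on the loaded value (assumption: only for L1 misses)."""
        if l1_hit:
            return
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [actual, 0]
        elif entry[0] == actual:
            entry[1] = min(entry[1] + 1, self.max_confidence)
        else:
            entry[0] = actual
            entry[1] = 0


predictor = SelectiveLastValuePredictor()
# Each tuple is (load PC, L1 hit?, value actually loaded); made-up trace.
trace = [(0x40, False, 7), (0x40, False, 7), (0x40, False, 7),
         (0x40, True, 7), (0x40, False, 7)]
results = []
for pc, hit, actual in trace:
    results.append(predictor.predict(pc, hit))
    predictor.update(pc, hit, actual)
```

In this sketch the `lookups` counter stands in for predictor activity: a non-selective scheme would access the table for every load, whereas here the accesses that hit in the first-level cache never touch it.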