Article

Predictable performance in SMT processors

Authors:
Francisco J. Cazorla, DAC, UPC, Spain
Peter M.W. Knijnenburg, Leiden University, The Netherlands
Rizos Sakellariou, University of Manchester, United Kingdom
Enrique Fernández, University de Las Palmas de GC, Spain
Alex Ramirez, DAC, UPC, Spain
Mateo Valero, DAC, UPC, Spain

CF '04: Proceedings of the 1st conference on Computing frontiers, April 2004, Pages 433–443, https://doi.org/10.1145/977091.977152

Published: 14 April 2004

ABSTRACT

Current instruction fetch policies in SMT processors are oriented towards optimization of overall throughput and/or fairness. However, they provide no control over how individual threads are executed, leading to performance unpredictability, since the IPC of a thread depends on the workload it is executed in and on the fetch policy used.

From the point of view of the Operating System (OS), it is the job scheduler that determines how jobs are executed. However, when the OS runs on an SMT processor, the job scheduler cannot guarantee execution time constraints of any job due to this performance unpredictability.

In this paper we propose a novel kind of collaboration between the OS and the SMT hardware that enables the OS to enforce that a high-priority thread runs at a specific fraction of its full speed. We present an extensive evaluation using many different workloads that shows that this mechanism gives the required performance in more than 97% of all cases considered, and even more than 99% for the less extreme cases. At the same time, our mechanism does not need to trade off predictability against overall throughput, as it maximizes the IPC of the remaining low-priority threads, giving 94% on average (and 97.5% on average for the less extreme cases) of the throughput obtained using instruction fetch policies oriented toward throughput maximization, such as icount.
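
To make the target stated in the abstract concrete, here is a minimal sketch of the arithmetic only; it is not the mechanism proposed in the paper. It assumes that "full speed" means the IPC a thread achieves when it runs alone on the processor. The function names, the `tolerance` parameter, and every number in the example are hypothetical and chosen purely for illustration.

```python
def target_ipc(full_speed_ipc: float, fraction: float) -> float:
    """IPC the high-priority thread must reach to run at `fraction` of full speed.

    Assumption: `full_speed_ipc` is the thread's IPC when run alone.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return full_speed_ipc * fraction


def meets_target(achieved_ipc: float, full_speed_ipc: float, fraction: float,
                 tolerance: float = 0.0) -> bool:
    """True if the achieved IPC reaches the target.

    `tolerance` is a hypothetical relative slack (0.1 accepts 90% of the target).
    """
    return achieved_ipc >= target_ipc(full_speed_ipc, fraction) * (1.0 - tolerance)


def relative_throughput(policy_total_ipc: float, baseline_total_ipc: float) -> float:
    """Total IPC under one policy as a fraction of a baseline policy's total IPC."""
    if baseline_total_ipc <= 0.0:
        raise ValueError("baseline_total_ipc must be positive")
    return policy_total_ipc / baseline_total_ipc


if __name__ == "__main__":
    # Hypothetical numbers: a thread with IPC 2.0 when alone, asked to run at 50%.
    print(target_ipc(2.0, 0.5))
    print(meets_target(1.1, 2.0, 0.5))
    print(relative_throughput(3.0, 4.0))
```

Under these assumptions, the success rates quoted in the abstract would correspond to the share of evaluated workloads for which a check like `meets_target` holds, and the throughput figures to a ratio like `relative_throughput` measured against a throughput-oriented fetch policy such as icount.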

Index Terms

Predictable performance in SMT processors
1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems

Published in
CF '04: Proceedings of the 1st conference on Computing frontiers
April 2004
522 pages
ISBN: 1581137419
DOI: 10.1145/977091
General Chair: Stamatis Vassiliadis, Delft University of Technology, The Netherlands
Program Chairs: Jean-Luc Gaudiot, University of California at Irvine, USA; Vincenzo Piuri, University of Milan, Italy
Copyright © 2004 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ILP
SMT
multithreading
operating systems
performance predictability
real time
thread-level parallelism
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 240 of 680 submissions, 35%

Article Metrics
- Total Citations: 33
- Total Downloads: 792
- Downloads (Last 12 months): 8
- Downloads (Last 6 weeks): 0