Research article · Open Access · Artifacts Available · Artifacts Evaluated & Functional

Virtual machine warmup blows hot and cold

Published: 12 October 2017

Abstract

Virtual Machines (VMs) with Just-In-Time (JIT) compilers are traditionally thought to execute programs in two phases: the initial warmup phase determines which parts of a program would most benefit from dynamic compilation, before JIT compiling those parts into machine code; subsequently the program is said to be at a steady state of peak performance. Measurement methodologies almost always discard data collected during the warmup phase such that reported measurements focus entirely on peak performance. We introduce a fully automated statistical approach, based on changepoint analysis, which allows us to determine if a program has reached a steady state and, if so, whether that represents peak performance or not. Using this, we show that even when run in the most controlled of circumstances, small, deterministic, widely studied microbenchmarks often fail to reach a steady state of peak performance on a variety of common VMs. Repeating our experiment on 3 different machines, we found that at most 43.5% of <VM, Benchmark> pairs consistently reach a steady state of peak performance.
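
As a rough illustration of the approach described above, the sketch below segments a series of per-iteration benchmark timings with off-the-shelf changepoint detection and classifies the run according to whether its final segment is the fastest. This is a minimal sketch only: it assumes the third-party Python `ruptures` library, and the penalty and tolerance values are illustrative. It is not the authors' implementation, which is built on the changepoint analysis machinery and classification rules described in the paper itself.

    # Minimal sketch (not the authors' pipeline): segment per-iteration
    # benchmark timings with changepoint analysis and decide whether the
    # run ends in a steady state of peak performance.
    # Assumes the third-party `ruptures` library; thresholds are illustrative.
    import numpy as np
    import ruptures as rpt

    def classify_run(times, penalty=15.0, tol=0.01):
        """Classify one (VM, benchmark) run from its per-iteration timings."""
        times = np.asarray(times, dtype=float)
        # PELT detects an unknown number of changepoints in the series;
        # predict() returns segment end indices, the last being len(times).
        breakpoints = rpt.Pelt(model="rbf").fit(times).predict(pen=penalty)
        bounds = [0] + breakpoints
        means = [times[a:b].mean() for a, b in zip(bounds, bounds[1:])]
        final = means[-1]
        if len(means) == 1:
            return "flat"          # steady from the first iteration
        if final <= min(means) * (1 + tol):
            return "warmup"        # final segment is the fastest seen
        if final >= max(means) * (1 - tol):
            return "slowdown"      # final segment is the slowest seen
        return "no steady state of peak performance"

    # Example: 500 slow warmup iterations followed by 1500 fast ones.
    timings = np.concatenate([np.random.normal(1.2, 0.01, 500),
                              np.random.normal(1.0, 0.01, 1500)])
    print(classify_run(timings))   # expected: "warmup"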


