Research article · Open Access · Artifacts Available · Artifacts Evaluated & Functional

Virtual machine warmup blows hot and cold

Published: 12 October 2017

Abstract

Virtual Machines (VMs) with Just-In-Time (JIT) compilers are traditionally thought to execute programs in two phases: the initial warmup phase determines which parts of a program would most benefit from dynamic compilation, before JIT compiling those parts into machine code; subsequently the program is said to be at a steady state of peak performance. Measurement methodologies almost always discard data collected during the warmup phase such that reported measurements focus entirely on peak performance. We introduce a fully automated statistical approach, based on changepoint analysis, which allows us to determine if a program has reached a steady state and, if so, whether that represents peak performance or not. Using this, we show that even when run in the most controlled of circumstances, small, deterministic, widely studied microbenchmarks often fail to reach a steady state of peak performance on a variety of common VMs. Repeating our experiment on 3 different machines, we found that at most 43.5% of <VM, Benchmark> pairs consistently reach a steady state of peak performance.
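
As a rough illustration of the approach described above, the sketch below segments a series of per-iteration benchmark timings with off-the-shelf changepoint detection and classifies the run according to whether its final segment is the fastest. This is a minimal sketch only: it assumes the third-party Python `ruptures` library, and the penalty and tolerance values are illustrative. It is not the authors' implementation, which is built on the changepoint analysis machinery and classification rules described in the paper itself.

    # Minimal sketch (not the authors' pipeline): segment per-iteration
    # benchmark timings with changepoint analysis and decide whether the
    # run ends in a steady state of peak performance.
    # Assumes the third-party `ruptures` library; thresholds are illustrative.
    import numpy as np
    import ruptures as rpt

    def classify_run(times, penalty=15.0, tol=0.01):
        """Classify one (VM, benchmark) run from its per-iteration timings."""
        times = np.asarray(times, dtype=float)
        # PELT detects an unknown number of changepoints in the series;
        # predict() returns segment end indices, the last being len(times).
        breakpoints = rpt.Pelt(model="rbf").fit(times).predict(pen=penalty)
        bounds = [0] + breakpoints
        means = [times[a:b].mean() for a, b in zip(bounds, bounds[1:])]
        final = means[-1]
        if len(means) == 1:
            return "flat"          # steady from the first iteration
        if final <= min(means) * (1 + tol):
            return "warmup"        # final segment is the fastest seen
        if final >= max(means) * (1 - tol):
            return "slowdown"      # final segment is the slowest seen
        return "no steady state of peak performance"

    # Example: 500 slow warmup iterations followed by 1500 fast ones.
    timings = np.concatenate([np.random.normal(1.2, 0.01, 500),
                              np.random.normal(1.0, 0.01, 1500)])
    print(classify_run(timings))   # expected: "warmup"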


