Abstract
Virtual Machines (VMs) with Just-In-Time (JIT) compilers are traditionally thought to execute programs in two phases: the initial warmup phase determines which parts of a program would most benefit from dynamic compilation, before JIT compiling those parts into machine code; subsequently the program is said to be at a steady state of peak performance. Measurement methodologies almost always discard data collected during the warmup phase such that reported measurements focus entirely on peak performance. We introduce a fully automated statistical approach, based on changepoint analysis, which allows us to determine if a program has reached a steady state and, if so, whether that represents peak performance or not. Using this, we show that even when run in the most controlled of circumstances, small, deterministic, widely studied microbenchmarks often fail to reach a steady state of peak performance on a variety of common VMs. Repeating our experiment on 3 different machines, we found that at most 43.5% of <VM, Benchmark> pairs consistently reach a steady state of peak performance.
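To make the idea of changepoint-based classification concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes a list of per-iteration timings from one run, looks for at most one shift in the mean using a least-squares cost, and labels the run by comparing the segment means. The function names, the `penalty` and `min_size` parameters, and the labels `flat`, `warmup`, and `slowdown` are illustrative choices for this sketch. Because it allows only a single split, it cannot recognise a run that never settles.

```python
from statistics import mean


def segment_cost(xs):
    """Sum of squared deviations of xs from its own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def best_single_changepoint(times, min_size=5, penalty=0.0):
    """Return the split index that most reduces the cost, or None.

    A split is accepted only if it lowers the total cost by more than
    `penalty` (an assumed tuning knob that guards against overfitting).
    """
    n = len(times)
    best_idx = None
    best_cost = segment_cost(times) - penalty
    for i in range(min_size, n - min_size + 1):
        cost = segment_cost(times[:i]) + segment_cost(times[i:])
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx


def classify(times, min_size=5, penalty=0.0, tolerance=0.001):
    """Label a run by comparing the means before and after the changepoint."""
    cp = best_single_changepoint(times, min_size, penalty)
    if cp is None:
        return "flat"  # no shift detected: steady from the start
    before, after = mean(times[:cp]), mean(times[cp:])
    if abs(after - before) <= tolerance:
        return "flat"  # shift too small to matter
    # The final segment is treated as the steady state in this sketch.
    return "warmup" if after < before else "slowdown"


if __name__ == "__main__":
    # Hypothetical timings in seconds: 10 slow iterations, then 20 fast ones.
    run = [1.0] * 10 + [0.5] * 20
    print(best_single_changepoint(run, penalty=0.1), classify(run, penalty=0.1))
```

In this sketch a run whose final segment is slower than its first is labelled `slowdown`, which corresponds to reaching a steady state that is not peak performance. A practical analysis would need to handle multiple changepoints and changes in variance as well as in mean.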
Index Terms
- Virtual machine warmup blows hot and cold