ABSTRACT
Many developers use automated performance tests to evaluate software performance and find regressions. However, the test results often contain noise that is not caused by actual performance changes in the programs but by external factors such as operating system decisions or unexpected non-determinism inside the programs. This makes the results difficult to interpret, since a result that differs from previous results cannot easily be attributed to either a genuine change or noise. In this paper we analyze a subset of the factors that are likely to contribute to this noise, using the Mozilla Firefox browser as an example. In addition, we present a statistical technique for identifying outliers in Mozilla's automatic testing framework. Our results show that a significant amount of noise is caused by memory randomization and other external factors, that there is variance in Firefox internals that does not seem to be correlated with test result variance, and that our suggested statistical forecasting technique can give more reliable detection of genuine performance changes than the one currently in use by Mozilla.