DOI: 10.1145/3219819.3219905

Winner's Curse: Bias Estimation for Total Effects of Features in Online Controlled Experiments

Published: 19 July 2018

ABSTRACT

Online controlled experiments, or A/B tests, have become the standard framework that most online product companies use to measure the effect of a new change. Companies use statistical methods, including hypothesis testing and statistical inference, to quantify the business impact of changes and to make product decisions. Experimentation platforms can now run hundreds of experiments or more concurrently. When a group of experiments is conducted, usually the ones with significant successful results are chosen to be launched into the product. We are interested in learning the aggregated impact of the launched features. In this paper, we investigate a statistical selection bias in this process and propose a correction method that yields an unbiased estimator. Moreover, we give an implementation example on Airbnb's ERF (Experiment Reporting Framework) platform and discuss best practices for accounting for this bias.
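The selection bias described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the correction method proposed in the paper: the function names, the normal model for true effects and measurement noise, the one-sided z-threshold of 1.96, and every numeric parameter are hypothetical choices made only for illustration. Each simulated experiment has a true effect and a noisy observed effect; only experiments whose observed effect is significantly positive are "launched", and the sum of observed effects of launched experiments is compared with the sum of their true effects.

```python
import random
import statistics


def simulate_launch_cycle(n_experiments=200, true_effect_sd=1.0,
                          noise_sd=2.0, z_threshold=1.96, seed=0):
    """Simulate one batch of experiments and launch the significant winners.

    All parameters are illustrative assumptions, not values from the paper.
    Returns (number launched, naive total effect, true total effect).
    """
    rng = random.Random(seed)
    launched = 0
    naive_total = 0.0  # sum of observed effects of launched experiments
    true_total = 0.0   # sum of true effects of the same experiments
    for _ in range(n_experiments):
        true_effect = rng.gauss(0.0, true_effect_sd)
        observed = true_effect + rng.gauss(0.0, noise_sd)
        # Launch only when the observed effect is significantly positive.
        if observed / noise_sd > z_threshold:
            launched += 1
            naive_total += observed
            true_total += true_effect
    return launched, naive_total, true_total


def average_bias(n_replications=200):
    """Average (naive total - true total) over independent replications."""
    gaps = []
    for seed in range(n_replications):
        _, naive_total, true_total = simulate_launch_cycle(seed=seed)
        gaps.append(naive_total - true_total)
    return statistics.mean(gaps)


if __name__ == "__main__":
    launched, naive_total, true_total = simulate_launch_cycle(seed=0)
    print("launched:", launched)
    print("naive total effect:", round(naive_total, 2))
    print("true total effect:", round(true_total, 2))
    print("average bias over replications:", round(average_bias(), 2))
```

Under these assumptions the naive total overstates the true total on average, because conditioning on a significant result preferentially keeps experiments whose noise happened to be positive.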

Published in

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819

Copyright © 2018 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%