ABSTRACT
Online controlled experiments, or A/B tests, have become a standard framework adopted by most online product companies to measure the effect of any new change. Companies use statistical methods such as hypothesis testing and statistical inference to quantify the business impact of these changes and make product decisions. Modern experimentation platforms can run hundreds of experiments or more concurrently. When a group of experiments is conducted, usually the ones with significant, successful results are chosen to be launched into the product. We are interested in estimating the aggregated impact of the launched features. In this paper, we investigate a statistical selection bias in this process and propose a correction method that yields an unbiased estimator. Moreover, we give an implementation example on Airbnb's Experiment Reporting Framework (ERF) and discuss best practices for accounting for this bias.
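The selection bias described above is easy to reproduce in simulation. The sketch below (not the paper's correction method; all parameter values, such as the true lift, standard error, and significance threshold, are illustrative assumptions) shows that summing the estimated effects of only the significantly positive experiments systematically overstates the true aggregate impact of the launched features.

```python
# Minimal sketch of the winner's curse in aggregated launch decisions.
# Assumptions: every experiment has the same small true lift and the same
# standard error, and an experiment is "launched" when its estimate is
# significantly positive at the two-sided 5% level.
import numpy as np

rng = np.random.default_rng(0)
n_experiments = 1000
true_effect = 0.2   # assumed true lift per experiment
se = 0.5            # assumed standard error of each estimate
z_crit = 1.96       # two-sided 5% significance threshold

# Observed effect estimates: true effect plus sampling noise.
estimates = true_effect + rng.normal(0.0, se, size=n_experiments)

# Launch rule: keep experiments whose estimate is significantly positive.
launched = estimates / se > z_crit

naive_total = estimates[launched].sum()        # what gets reported
actual_total = true_effect * launched.sum()    # true impact of launched features

print(f"launched: {launched.sum()} of {n_experiments}")
print(f"naive estimated total effect: {naive_total:.2f}")
print(f"true total effect of launched features: {actual_total:.2f}")
# The naive total is systematically larger: conditioning on significance
# selects estimates whose sampling noise happens to be positive.
```

In this toy setting the gap between the naive total and the true total is exactly the bias a correction method must remove; the magnitude of the gap depends on the assumed effect sizes and standard errors.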