DOI: 10.1145/2783258.2788602

From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks

Published: 10 August 2015

ABSTRACT

A/B testing, also known as bucket testing, split testing, or controlled experimentation, is a standard way to evaluate user engagement with or satisfaction from a new service, feature, or product. It is widely used by online websites, including social networking sites such as Facebook, LinkedIn, and Twitter, to make data-driven decisions. At LinkedIn, we have seen tremendous growth in controlled experiments over time, with over 400 concurrent experiments now running per day. General A/B testing frameworks and methodologies, including challenges and pitfalls, have been discussed extensively in several previous KDD papers [7, 8, 9, 10]. In this paper, we describe in depth the experimentation platform we have built at LinkedIn and the challenges that arise particularly when running A/B tests at large scale in a social network setting. We start with an introduction to the experimentation platform and how it handles each step of the A/B testing process at LinkedIn, from designing and deploying experiments to analyzing them. We then discuss several more sophisticated A/B testing scenarios, such as running offline experiments and addressing the network effect, where one user's action can influence that of another. Lastly, we describe features and processes that are crucial for building a strong experimentation culture.
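The core mechanics the abstract refers to, bucketing users into variants and comparing an engagement metric between them, can be sketched in a few lines. This is a minimal illustration, not LinkedIn's actual platform: the hash-based assignment and the pooled two-proportion z-test are standard techniques, and the function names and parameters here are hypothetical.

```python
import hashlib
import math

def assign_bucket(member_id: str, experiment: str, treatment_pct: float = 0.5) -> str:
    """Deterministically map (experiment, member) to a variant, so a user
    sees the same variant on every visit without any server-side state."""
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode()).hexdigest()
    point = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if point < treatment_pct else "control"

def two_proportion_z(conv_t: int, n_t: int, conv_c: int, n_c: int) -> float:
    """Pooled two-sample z-statistic for a difference in conversion rates
    between treatment (conv_t / n_t) and control (conv_c / n_c)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    p_pool = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se
```

Note that this per-user randomization is exactly what the network effect breaks: if one user's treatment changes a connection's behavior, the control group is contaminated, which motivates cluster-level designs such as those in references [2] and [19].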

Supplemental Material

p2227.mp4 (mp4, 200.7 MB)

References

  1. Rubin, Donald B. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688, 1974.
  2. Ugander, Johan, Karrer, Brian, Backstrom, Lars, and Kleinberg, Jon. Graph cluster randomization: network exposure to multiple universes. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 329--337. ACM, 2013.
  3. Katzir, Liran, Liberty, Edo, and Somekh, Oren. Framework and algorithms for network bucket testing. Proceedings of the 21st International Conference on World Wide Web, pages 1029--1036. ACM, 2012.
  4. Toulis, Panos and Kao, Edward. Estimation of causal peer influence effects. Proceedings of the 30th International Conference on Machine Learning, pages 1489--1497, 2013.
  5. Eckles, Dean, Karrer, Brian, and Ugander, Johan. Design and analysis of experiments in networks: Reducing bias from interference. arXiv preprint arXiv:1404.7530, 2014.
  6. Aronow, Peter M. and Samii, Cyrus. Estimating average causal effects under general interference. arXiv preprint arXiv:1305.6156, 2013.
  7. Kohavi, Ron, et al. Trustworthy online controlled experiments: Five puzzling outcomes explained. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012. www.exp-platform.com/Pages/PuzzingOutcomesExplained.aspx.
  8. Tang, Diane, et al. Overlapping experiment infrastructure: More, better, faster experimentation. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010.
  9. Kohavi, Ron, et al. Online controlled experiments at large scale. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013. http://bit.ly/ExPScale.
  10. Kohavi, Ron, et al. Seven rules of thumb for web site experimenters. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
  11. Yates, Frank. Sir Ronald Fisher and the design of experiments. Biometrics, 20(2):307--321, 1964.
  12. Bakshy, Eytan, Eckles, Dean, and Bernstein, Michael S. Designing and deploying online field experiments. Proceedings of the 23rd International Conference on World Wide Web, pages 283--292. ACM, 2014.
  13. Kohavi, Ron, et al. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, 18(1):140--181, February 2009. http://www.exp-platform.com/Pages/hippo_long.aspx.
  14. Crook, Thomas, et al. Seven pitfalls to avoid when running controlled experiments on the web. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1105--1114, 2009. http://www.exp-platform.com/Pages/ExPpitfalls.aspx.
  15. Ioannidis, John P. A. Why most published research findings are false. PLoS Medicine, 2(8):e124, 2005.
  16. Wacholder, Sholom, et al. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. Journal of the National Cancer Institute, 96(6):434--442, 2004.
  17. Benjamini, Yoav and Hochberg, Yosef. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), pages 289--300, 1995.
  18. Saaty, Thomas L. How to make a decision: the analytic hierarchy process. European Journal of Operational Research, 48(1):9--26, 1990.
  19. Gui, Huan, Xu, Ya, Bhasin, Anmol, and Han, Jiawei. Network A/B testing: From sampling to estimation. Proceedings of the 24th International Conference on World Wide Web. ACM, 2015.
  20. Box, George E. P., Hunter, J. Stuart, and Hunter, William G. Statistics for Experimenters: Design, Innovation, and Discovery. Wiley, 2005.
  21. Gerber, A. S. and Green, D. P. Field Experiments: Design, Analysis, and Interpretation. W. W. Norton, 2012.
  22. Sumbaly, Roshan, et al. Serving large-scale batch computed data with Project Voldemort. Proceedings of the 10th USENIX Conference on File and Storage Technologies. USENIX Association, 2012.
  23. Tate, Ryan. The software revolution behind LinkedIn's gushing profits. [Online] http://www.wired.com/2013/04/linkedin-software-revolution
  24. Auradkar, Aditya, et al. Data infrastructure at LinkedIn. Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE). IEEE, 2012.
  25. Kreps, Jay, Narkhede, Neha, and Rao, Jun. Kafka: A distributed messaging system for log processing. Proceedings of the 6th International Workshop on Networking Meets Databases (NetDB), Athens, Greece, 2011.
  26. Naga, Praveen Neppalli. Real-time analytics at massive scale with Pinot. [Online] September 29, 2014. http://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot
  27. Fisher, Ronald A. Presidential address. Sankhya: The Indian Journal of Statistics, 4(1), 1938. http://www.jstor.org/stable/40383882.
  28. Montgomery, Douglas C. Design and Analysis of Experiments. John Wiley & Sons, 2008.
  29. Betz, Joe and Tagle, Moira. Rest.li: RESTful service architecture at scale. [Online] February 19, 2013. https://engineering.linkedin.com/architecture/restli-restful-service-architecture-scale
  30. Romano, Joseph P., Shaikh, Azeem M., and Wolf, Michael. Multiple testing. The New Palgrave Dictionary of Economics, 2010. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.418.4975&rep=rep1&type=pdf
  31. Wikipedia. Simpson's paradox. [Online] http://en.wikipedia.org/wiki/Simpson%27s_paradox
  32. McFarland, Colin. Experiment!: Website Conversion Rate Optimization with A/B and Multivariate Testing. New Riders, 2012. ISBN 978-0321834607.
  33. Eisenberg, Bryan. How to improve A/B testing. ClickZ Network. [Online] April 29, 2005. www.clickz.com/clickz/column/1717234/how-improvem-a-b-testing.
  34. Vemuri, Srinivas, Varshney, Maneesh, Puttaswamy, Krishna, and Liu, Rui. Execution primitives for scalable joins and aggregations in MapReduce. Proceedings of the VLDB Endowment, 7(13).
  35. Varshney, Maneesh and Vemuri, Srinivas. Open sourcing Cubert: A high performance computation engine for complex big data analytics. [Online] November 11, 2014. https://engineering.linkedin.com/big-data/open-sourcing-cubert-high-performance-computation-engine-complex-big-data-analytics

Published in

KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN: 9781450336642
DOI: 10.1145/2783258

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

research-article

      Acceptance Rates

KDD '15 paper acceptance rate: 160 of 819 submissions, 20%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.
