DOI: 10.1145/2783258.2788602

From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks

Published: 10 August 2015

ABSTRACT

A/B testing, also known as bucket testing, split testing, or controlled experimentation, is a standard way to evaluate user engagement with or satisfaction from a new service, feature, or product. It is widely used by online websites, including social networking sites such as Facebook, LinkedIn, and Twitter, to make data-driven decisions. At LinkedIn, we have seen tremendous growth in controlled experiments over time, with over 400 concurrent experiments now running per day. General A/B testing frameworks and methodologies, including challenges and pitfalls, have been discussed extensively in several previous KDD papers [7, 8, 9, 10]. In this paper, we describe in depth the experimentation platform we have built at LinkedIn and the challenges that arise particularly when running A/B tests at large scale in a social network setting. We start with an introduction to the experimentation platform and how it handles each step of the A/B testing process at LinkedIn, from designing and deploying experiments to analyzing them. We then discuss several more sophisticated A/B testing scenarios, such as running offline experiments and addressing the network effect, where one user's action can influence that of another. Lastly, we describe features and processes that are crucial for building a strong experimentation culture.
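The core mechanics the abstract refers to, bucketing users into variants and comparing an engagement metric between them, can be sketched in a few lines. This is a minimal illustration, not LinkedIn's actual platform: the hash-based assignment and the pooled two-proportion z-test are standard techniques, and the function names and parameters here are hypothetical.

```python
import hashlib
import math

def assign_bucket(member_id: str, experiment: str, treatment_pct: float = 0.5) -> str:
    """Deterministically map (experiment, member) to a variant, so a user
    sees the same variant on every visit without any server-side state."""
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode()).hexdigest()
    point = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if point < treatment_pct else "control"

def two_proportion_z(conv_t: int, n_t: int, conv_c: int, n_c: int) -> float:
    """Pooled two-sample z-statistic for a difference in conversion rates
    between treatment (conv_t / n_t) and control (conv_c / n_c)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    p_pool = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se
```

Note that this per-user randomization is exactly what the network effect breaks: if one user's treatment changes a connection's behavior, the control group is contaminated, which motivates cluster-level designs such as those in references [2] and [19].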

Supplemental Material

p2227.mp4 (mp4, 200.7 MB)

References

  1. Rubin, Donald B. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688, 1974.
  2. Ugander, Johan, Karrer, Brian, Backstrom, Lars, and Kleinberg, Jon. Graph cluster randomization: network exposure to multiple universes. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 329--337. ACM, 2013.
  3. Katzir, Liran, Liberty, Edo, and Somekh, Oren. Framework and algorithms for network bucket testing. Proceedings of the 21st International Conference on World Wide Web, pages 1029--1036. ACM, 2012.
  4. Toulis, Panos and Kao, Edward. Estimation of causal peer influence effects. Proceedings of the 30th International Conference on Machine Learning, pages 1489--1497, 2013.
  5. Eckles, Dean, Karrer, Brian, and Ugander, Johan. Design and analysis of experiments in networks: Reducing bias from interference. arXiv preprint arXiv:1404.7530, 2014.
  6. Aronow, Peter M. and Samii, Cyrus. Estimating average causal effects under general interference. arXiv preprint arXiv:1305.6156, 2013.
  7. Kohavi, Ron, et al. Trustworthy online controlled experiments: Five puzzling outcomes explained. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012. www.exp-platform.com/Pages/PuzzingOutcomesExplained.aspx.
  8. Tang, Diane, et al. Overlapping experiment infrastructure: More, better, faster experimentation. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010.
  9. Kohavi, Ron, et al. Online controlled experiments at large scale. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013. http://bit.ly/ExPScale.
  10. Kohavi, Ron, et al. Seven rules of thumb for web site experimenters. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
  11. Yates, Frank. Sir Ronald Fisher and the design of experiments. Biometrics, 20(2):307--321, 1964.
  12. Bakshy, Eytan, Eckles, Dean, and Bernstein, Michael S. Designing and deploying online field experiments. Proceedings of the 23rd International Conference on World Wide Web, pages 283--292. ACM, 2014.
  13. Kohavi, Ron, et al. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, 18(1):140--181, February 2009. http://www.exp-platform.com/Pages/hippo_long.aspx.
  14. Crook, Thomas, et al. Seven pitfalls to avoid when running controlled experiments on the web. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1105--1114, 2009. http://www.exp-platform.com/Pages/ExPpitfalls.aspx.
  15. Ioannidis, John P. A. Why most published research findings are false. PLoS Medicine, 2(8):e124, 2005.
  16. Wacholder, Sholom, et al. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. Journal of the National Cancer Institute, 96(6):434--442, 2004.
  17. Benjamini, Yoav and Hochberg, Yosef. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), pages 289--300, 1995.
  18. Saaty, Thomas L. How to make a decision: the analytic hierarchy process. European Journal of Operational Research, 48(1):9--26, 1990.
  19. Gui, Huan, Xu, Ya, Bhasin, Anmol, and Han, Jiawei. Network A/B testing: From sampling to estimation. Proceedings of the 24th International Conference on World Wide Web. ACM, 2015.
  20. Box, George E. P., Hunter, J. Stuart, and Hunter, William G. Statistics for Experimenters: Design, Innovation, and Discovery. Wiley, 2005.
  21. Gerber, A. S. and Green, D. P. Field Experiments: Design, Analysis, and Interpretation. W. W. Norton, 2012.
  22. Sumbaly, Roshan, et al. Serving large-scale batch computed data with Project Voldemort. Proceedings of the 10th USENIX Conference on File and Storage Technologies. USENIX Association, 2012.
  23. Tate, Ryan. The software revolution behind LinkedIn's gushing profits. [Online] http://www.wired.com/2013/04/linkedin-software-revolution
  24. Auradkar, Aditya, et al. Data infrastructure at LinkedIn. Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE). IEEE, 2012.
  25. Kreps, Jay, Narkhede, Neha, and Rao, Jun. Kafka: A distributed messaging system for log processing. Proceedings of the 6th International Workshop on Networking Meets Databases (NetDB), Athens, Greece, 2011.
  26. Naga, Praveen Neppalli. Real-time analytics at massive scale with Pinot. [Online] September 29, 2014. http://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot
  27. Fisher, Ronald A. Presidential address. Sankhya: The Indian Journal of Statistics, 4(1), 1938. http://www.jstor.org/stable/40383882.
  28. Montgomery, Douglas C. Design and Analysis of Experiments. John Wiley & Sons, 2008.
  29. Betz, Joe and Tagle, Moira. Rest.li: RESTful service architecture at scale. [Online] February 19, 2013. https://engineering.linkedin.com/architecture/restli-restful-service-architecture-scale
  30. Romano, Joseph P., Shaikh, Azeem M., and Wolf, Michael. Multiple testing. The New Palgrave Dictionary of Economics, 2010. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.418.4975&rep=rep1&type=pdf
  31. Wikipedia. Simpson's paradox. [Online] http://en.wikipedia.org/wiki/Simpson%27s_paradox
  32. McFarland, Colin. Experiment!: Website Conversion Rate Optimization with A/B and Multivariate Testing. New Riders, 2012. ISBN 978-0321834607.
  33. Eisenberg, Bryan. How to improve A/B testing. ClickZ Network. [Online] April 29, 2005. www.clickz.com/clickz/column/1717234/how-improvem-a-b-testing.
  34. Vemuri, Srinivas, Varshney, Maneesh, Puttaswamy, Krishna, and Liu, Rui. Execution primitives for scalable joins and aggregations in MapReduce. Proceedings of the VLDB Endowment, 7(13).
  35. Varshney, Maneesh and Vemuri, Srinivas. Open sourcing Cubert: A high performance computation engine for complex big data analytics. [Online] November 11, 2014. https://engineering.linkedin.com/big-data/open-sourcing-cubert-high-performance-computation-engine-complex-big-data-analytics

Published in

KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN: 9781450336642
DOI: 10.1145/2783258

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

research-article

      Acceptance Rates

KDD '15 paper acceptance rate: 160 of 819 submissions, 20%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.
