Research Article · Open Access
DOI: 10.1145/2487575.2488200

Ad click prediction: a view from the trenches

Published: 11 August 2013

ABSTRACT

Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates.

We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
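To make the core update concrete, the per-coordinate FTRL-Proximal learner described above admits a closed-form, lazily evaluated weight for L1- and L2-regularized logistic regression, with a per-coordinate learning rate driven by the accumulated squared gradients. The sketch below is a minimal, illustrative Python implementation of that update; the class name, hyperparameter names (alpha, beta, l1, l2), and dict-based sparse storage are illustrative choices for this sketch, not taken from the published pseudocode.

import math


class FTRLProximal:
    # Sketch of per-coordinate FTRL-Proximal logistic regression (sparse, dict-based).

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-coordinate FTRL state z_i
        self.n = {}  # per-coordinate sum of squared gradients n_i

    def _weight(self, i):
        # Lazy closed-form weight: exactly zero whenever |z_i| <= l1 (source of sparsity).
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        # x: dict of feature index -> value; returns the predicted click probability.
        margin = sum(self._weight(i) * v for i, v in x.items())
        margin = max(min(margin, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-margin))

    def update(self, x, y):
        # One online step on example (x, y), with label y in {0, 1}.
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss w.r.t. coordinate i
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha  # per-coordinate rate change
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self._weight(i)
            self.n[i] = n + g * g
        return p

Used online, such a model would score each incoming example at serving time and apply one update as soon as the click/no-click label is joined back, e.g. model.update({"site=example": 1.0, "ad=123": 1.0}, y=1), where the feature strings are hypothetical.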


Published in

KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2013, 1534 pages
ISBN: 9781450321747
DOI: 10.1145/2487575

Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


Acceptance Rates

KDD '13 paper acceptance rate: 125 of 726 submissions, 17%
Overall acceptance rate: 1,133 of 8,635 submissions, 13%
