research-article

Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising

Authors:
Amr Ahmed

Research at Google, Mountain View, CA, USA

Research at Google, Mountain View, CA, USA
View Profile

,
Abhimanyu Das

Microsoft Reserach, Mountain View, CA, USA

Microsoft Reserach, Mountain View, CA, USA
View Profile

,
Alexander J. Smola

Carnegie Mellon University, Pittsburgh, PA, USA

Carnegie Mellon University, Pittsburgh, PA, USA
View Profile

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data miningFebruary 2014Pages 153–162https://doi.org/10.1145/2556195.2556264

Published:24 February 2014Publication History

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining

Pages 153–162

ABSTRACT

Many estimation tasks come in groups and hierarchies of related problems. In this paper we propose a hierarchical model and a scalable algorithm to perform inference for multitask learning. It infers task correlation and subtask structure in a joint sparse setting. Implementation is achieved by a distributed subgradient oracle and the successive application of prox-operators pertaining to groups and subgroups of variables. We apply this algorithm to conversion optimization in display advertising. Experimental results on over 1TB data for up to 1 billion observations and 1 million attributes show that the algorithm provides significantly better prediction accuracy while simultaneously beingefficiently scalable by distributed parameter synchronization.

References

A. Ahmed, M. Aly, A. Das, A. Smola, and T. Anastasakos. Web-scale multi-task feature selection for behavioral targeting. In CIKM, 2012. Google ScholarDigital Library
A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. Smola. Scalable inference in latent variable models. In Web Science and Data Mining (WSDM), 2012. Google ScholarDigital Library
M. Aly, A. Hatch, V. Josifovski, and V. K. Narayanan. Web-scale user modeling for targeting. In WWW, 2012. Google ScholarDigital Library
A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243--272, 2008. Google ScholarDigital Library
F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1--106, 2012. Google ScholarDigital Library
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183--202, 2009. Google ScholarDigital Library
R. Caruana. Multitask learning. Machine Learning, 28:41--75, 1997. Google ScholarDigital Library
N. R. Draper and H. Smith. Applied Regression Analysis. John Wiley and Sons, New York, NY, 1981.Google Scholar
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432--441, 2008.Google ScholarCross Ref
R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for hierarchical sparse coding. Journal of Machine Learning Research, 12:2297--2334, 2011. Google ScholarDigital Library
D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Symposium on the Theory of Computing STOC, pages 654--663, New York, May 1997. Association for Computing Machinery. Google ScholarDigital Library
T. R. Shultz and F. Rivest. Using knowledge to speed learning: A comparison knowledge-based cascade-correlation and multi-task learning. In Proc. Intl. Conf. Machine Learning, pages 871--878. Morgan Kaufmann, San Francisco, CA, 2000. Google ScholarDigital Library
A. J. Smola and S. Narayanamurthy. An architecture for parallel topic models. In Very Large Databases (VLDB), 2010. Google ScholarDigital Library
S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7:1531--1565, 2006. Google ScholarDigital Library
W. H. Southwell. Fitting data to nonlinear functions with uncertainties in all measurement variables. Comput. J., 19(1):69--73, 1976.Google ScholarCross Ref
N. Srebro and A. Shraibman. Rank, trace-norm and max-norm. In P. Auer and R. Meir, editors, Proc. Annual Conf. Computational Learning Theory, number 3559 in Lecture Notes in Artificial Intelligence, pages 545--560. Springer-Verlag, June 2005. Google ScholarDigital Library
C. Teo, Q. Le, A. J. Smola, and S. V. N. Vishwanathan. A scalable modular convex solver for regularized risk minimization. In Proc. ACM Conf. Knowledge Discovery and Data Mining (KDD). ACM, 2007. Google ScholarDigital Library
S. Thrun and J. O'Sullivan. Discovering structure in multiple learning tasks: the TC algorithm. In Proc. Intl. Conf. Machine Learning, pages 489--497. Morgan Kaufmann, 1996.Google Scholar
M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In A. P. Danyluk, L. Bottou, and M. L. Littman, editors, ICML, volume 382 of ACM International Conference Proceeding Series, page 134. ACM, 2009. Google ScholarDigital Library
K. Weinberger, A. Dasgupta, J. Attenberg, J. Langford, and A. J. Smola. Feature hashing for large scale multitask learning. In L. Bottou and M. Littman, editors, International Conference on Machine Learning, 2009. Google ScholarDigital Library
J. Ye, J. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In CIKM. ACM, 2009. Google ScholarDigital Library
K. Yu, V. Tresp, and A. Schwaighofer. Learning gaussian processes from multiple tasks. In Proceedings of the 22nd International Conference on Machine Learning, volume 119, pages 1012--1019. ACM, 2005. Google ScholarDigital Library
Y. Zhang and D.-Y. Yeung. A convex formulation for learning task relationships in multi-task learning. In Uncertainty in Artificial Intelligence, 2010.Google Scholar

Index Terms

Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising

Recommendations

Simple and Scalable Response Prediction for Display Advertising
Special Sections on Diversity and Discovery in Recommender Systems, Online Advertising and Regular Papers

Clickthrough and conversation rates estimation are two core predictions tasks in display advertising. We present in this article a machine learning framework based on logistic regression that is specifically designed to tackle the specifics of display ...
Read More
Multitask Learning
Special issue on inductive transfer

Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared ...
Read More
An Analysis Of Entire Space Multi-Task Models For Post-Click Conversion Prediction
RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems

Industrial recommender systems are frequently tasked with approximating probabilities for multiple, often closely related, user actions. For example, predicting if a user will click on an advertisement and if they will then purchase the advertised ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
February 2014
712 pages
ISBN:9781450323512
DOI:10.1145/2556195
General Chairs:
Ben Carterette
University of Delaware, USA
,
Fernando Diaz
Microsoft Research, USA
,
Program Chairs:
Carlos Castillo
Qatar Computing Research Institute, Qatar
,
Donald Metzler
Google, USA
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 February 2014
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
display advertising
distributed optimization
large-scale machine learning
multi-task learning
sparsity
Qualifiers
- research-article
Conference

Acceptance Rates
WSDM '14 Paper Acceptance Rate64of355submissions,18%Overall Acceptance Rate498of2,863submissions,17%
More
Upcoming Conference
WSDM '25

Sponsor:

sigir

sigir

sigir

sigir

The Eighteenth ACM International Conference on Web Search and Data Mining

April 7 - 11, 2025

Hannover , Germany
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 34
  Total Citations
  View Citations
- 579
  Total Downloads
- Downloads (Last 12 months)10
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Simple and Scalable Response Prediction for Display Advertising

Multitask Learning

An Analysis Of Entire Space Multi-Task Models For Post-Click Conversion Prediction