ABSTRACT
In this paper we present gradient boosted decision tables (BDTs). A d-dimensional decision table is essentially a mapping from a sequence of d boolean tests to a real value. We propose novel algorithms to fit decision tables. Our thorough empirical study suggests that decision tables are better weak learners in the gradient boosting framework and can improve the accuracy of the boosted ensemble. In addition, we develop an efficient data structure to represent decision tables and propose a novel fast algorithm that improves the scoring efficiency of boosted ensembles of decision tables. Experiments on public classification and regression datasets demonstrate that our method achieves 1.5x to 6x speedups over a boosted regression trees baseline. We complement our experimental evaluation with a bias-variance analysis that explains how different weak models influence the predictive power of the boosted ensemble. Our experiments suggest that gradient boosting with randomly backfitted decision tables is the most accurate method on a number of classification and regression problems. We have deployed a BDT model in the LinkedIn news feed system and achieved significant lift on key metrics.
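To make the core object concrete: a d-dimensional decision table evaluates d boolean tests, packs their outcomes into a d-bit index, and looks up one of 2^d leaf values. The following is a minimal sketch of that lookup and of summing tables in a boosted ensemble; the class name, field layout, and bit-packing order are illustrative assumptions, not the paper's exact data structure.

```python
class DecisionTable:
    """A d-dimensional decision table: d boolean tests -> one of 2^d leaves."""

    def __init__(self, features, thresholds, leaves):
        # features[k]:   index of the feature used by the k-th test
        # thresholds[k]: split value for the k-th test (test: x[f] <= t)
        # leaves:        2^d real values, indexed by the packed test bits
        assert len(features) == len(thresholds)
        assert len(leaves) == 2 ** len(features)
        self.features = features
        self.thresholds = thresholds
        self.leaves = leaves

    def score(self, x):
        # Pack the d boolean test outcomes into an integer index,
        # with test k contributing bit k.
        idx = 0
        for k, (f, t) in enumerate(zip(self.features, self.thresholds)):
            if x[f] <= t:
                idx |= 1 << k
        return self.leaves[idx]


def ensemble_score(tables, x):
    # A boosted ensemble predicts the sum of its tables' outputs.
    return sum(t.score(x) for t in tables)
```

Because every input touches all d tests regardless of earlier outcomes, the lookup is branch-friendly and vectorizable, which is one intuition behind the scoring speedups reported above.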
Index Terms
- BDT: Gradient Boosted Decision Tables for High Accuracy and Scoring Efficiency