ABSTRACT
Every day, more technologies and services are backed by complex machine-learned models that consume large amounts of data to provide a myriad of useful services. While users are willing to provide personal data to enable these services, their trust in and engagement with the systems could be improved by giving them insight into how the models' decisions are made. Complex ML systems are highly effective, but many are black boxes that offer no insight into how they arrive at their choices; those that do typically explain themselves at the model level rather than the instance level. In this work we present a method for deriving explanations for instance-level decisions in tree ensembles. As this family of models accounts for a large portion of industrial machine learning, this work opens up the possibility of transparent models at scale.
Index Terms
- Transparent Tree Ensembles