DualBoost: Handling Missing Values with Feature Weights and Weak Classifiers that Abstain

ABSTRACT
Missing values are a common issue in real-world datasets. Handling them is a key aspect of data mining, as missing values can seriously degrade the performance of predictive models. In this paper we propose a unified Boosting framework that consolidates model construction and missing-value handling. At each Boosting iteration, weights are assigned to both the samples and the features. The sample weights focus learning on difficult samples, while the feature weights enable critical features to be compensated by less critical features when they are unavailable. A weak classifier that abstains (i.e., produces no prediction when a required feature value is missing) is learned on a data subset determined by the feature weights. Experimental results demonstrate the efficacy and robustness of the proposed method over existing Boosting algorithms.
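The abstract describes the mechanics but not the formulas, so the sketch below is an illustrative Python reconstruction rather than the authors' algorithm: decision stumps that abstain (output 0) on missing values, a Schapire-Singer-style confidence-rated weight for each round, feature subsets drawn according to the feature weights, and an assumed feature-weight update (halving the weight of the feature just used) so that backup features get selected in later rounds. The names `fit_dualboost` and `best_abstaining_stump`, the subset size, and the update constants are all hypothetical.

```python
import numpy as np


def stump_predict(stump, X):
    """Predict in {-1, 0, +1}; abstain (0) wherever the stump's feature is missing."""
    col = X[:, stump["feature"]]
    out = np.where(col > stump["threshold"], stump["sign"], -stump["sign"])
    return np.where(np.isnan(col), 0.0, out)


def best_abstaining_stump(X, y, w, feats):
    """Exhaustive search for the stump with lowest weighted error over `feats`."""
    best, best_err = None, np.inf
    for j in feats:
        col = X[:, j]
        for thr in np.unique(col[~np.isnan(col)]):
            for sign in (1.0, -1.0):
                h = np.where(np.isnan(col), 0.0, np.where(col > thr, sign, -sign))
                err = np.sum(w[(h != 0) & (h != y)])  # abstentions are not errors
                if err < best_err:
                    best = {"feature": j, "threshold": thr, "sign": sign}
                    best_err = err
    return best


def fit_dualboost(X, y, n_rounds=20, seed=0):
    """X: (n, d) array with np.nan marking missing values; y in {-1, +1}."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w_samples = np.full(n, 1.0 / n)    # sample weights, as in standard AdaBoost
    w_features = np.full(d, 1.0 / d)   # feature weights, the DualBoost addition
    ensemble = []
    for _ in range(n_rounds):
        # Train the weak learner on a feature subset drawn by feature weight
        # (subset size d // 2 is an arbitrary choice for this sketch).
        feats = rng.choice(d, size=max(1, d // 2), replace=False, p=w_features)
        stump = best_abstaining_stump(X, y, w_samples, feats)
        h = stump_predict(stump, X)
        # Confidence-rated round weight (Schapire & Singer 1999 style),
        # computed only over the samples the stump did not abstain on.
        correct = np.sum(w_samples[h == y])
        wrong = np.sum(w_samples[(h != 0) & (h != y)])
        alpha = 0.5 * np.log((correct + 1e-10) / (wrong + 1e-10))
        # Misclassified samples gain weight; abstained samples are untouched
        # before renormalization, so difficult samples become the focus.
        w_samples = w_samples * np.exp(-alpha * y * h)
        w_samples /= w_samples.sum()
        # Assumed update: downweight the feature just used so that later
        # rounds pick compensating features for when it is missing.
        w_features[stump["feature"]] *= 0.5
        w_features /= w_features.sum()
        ensemble.append((alpha, stump))
    return ensemble


def predict(ensemble, X):
    score = sum(alpha * stump_predict(s, X) for alpha, s in ensemble)
    return np.sign(score)


# Toy usage: threshold problem on two features, 20% of entries missing at random.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
X[rng.random(X.shape) < 0.2] = np.nan
model = fit_dualboost(X, y)
print("train accuracy:", np.mean(predict(model, X) == y))
```

Encoding abstention as 0 means an abstaining stump contributes nothing to the ensemble score for that sample, so the combined vote falls back on rounds whose features are present; this is the compensation effect the abstract describes.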