DOI: 10.1145/3269206.3269319 (CIKM Conference Proceedings, short paper)

DualBoost: Handling Missing Values with Feature Weights and Weak Classifiers that Abstain

Published: 17 October 2018

ABSTRACT

Missing values are a common issue in real-world datasets. Handling them is a key aspect of data mining, as they can seriously degrade the performance of predictive models. In this paper we propose a unified Boosting framework that combines model construction and missing-value handling. At each Boosting iteration, weights are assigned to both the samples and the features. The sample weights make difficult samples the focus of learning, while the feature weights allow critical features to be compensated by less critical ones when the critical features are unavailable. A weak classifier that abstains (i.e., produces no prediction when a required feature value is missing) is learned on a data subset determined by the feature weights. Experimental results demonstrate the efficacy and robustness of the proposed method compared with existing Boosting algorithms.
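The abstract does not give DualBoost's update rules, so the following is only a minimal sketch, under stated assumptions, of the two ingredients that have a well-known standard form: per-sample reweighting and weak classifiers that abstain. It uses the confidence-rated formulation of Schapire and Singer, in which a weak hypothesis outputs +1, -1, or 0 (abstain), the stump minimizing Z = W0 + 2 sqrt(W+ W-) is selected, and its vote weight is alpha = 1/2 ln(W+ / W-). It assumes `np.nan` marks a missing value, labels are in {-1, +1}, and decision stumps are the weak learners. The feature weights that DualBoost uses to pick each training subset are deliberately omitted, because their update rule is not described here. The function names `stump_output`, `fit_abstaining_boost`, and `predict` are hypothetical.

```python
import numpy as np


def stump_output(X, j, t, s):
    """Abstaining stump on feature j: s if x_j > t, -s otherwise, 0 if x_j is missing."""
    col = X[:, j]
    present = ~np.isnan(col)
    filled = np.where(present, col, t)  # placeholder so the comparison never touches NaN
    h = np.where(filled > t, s, -s).astype(float)
    h[~present] = 0.0  # abstain: no vote when the required feature value is missing
    return h


def fit_abstaining_boost(X, y, n_rounds=20, eps=1e-10):
    """Sketch of confidence-rated boosting with abstaining stumps (not DualBoost itself:
    the feature-weight mechanism is omitted as an assumption of this example)."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # sample weights, uniform at the start
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            values = X[:, j][~np.isnan(X[:, j])]
            for t in np.unique(values):
                for s in (1, -1):
                    h = stump_output(X, j, t, s)
                    w_plus = w[h * y > 0].sum()   # weight voted on correctly
                    w_minus = w[h * y < 0].sum()  # weight voted on incorrectly
                    w_zero = w[h == 0].sum()      # weight the stump abstains on
                    z = w_zero + 2.0 * np.sqrt(w_plus * w_minus)
                    if best is None or z < best[0]:
                        best = (z, j, t, s, w_plus, w_minus)
        if best is None:  # every feature is missing on every sample
            break
        _, j, t, s, w_plus, w_minus = best
        alpha = 0.5 * np.log((w_plus + eps) / (w_minus + eps))  # smoothed vote weight
        # With alpha > 0, misclassified samples gain weight, correct ones lose it, and
        # abstained ones keep it before renormalization, so hard samples get the focus.
        w = w * np.exp(-alpha * y * stump_output(X, j, t, s))
        w /= w.sum()
        ensemble.append((j, t, s, alpha))
    return ensemble


def predict(ensemble, X):
    """Sign of the weighted vote; rows on which every stump abstains default to +1."""
    score = np.zeros(X.shape[0])
    for j, t, s, alpha in ensemble:
        score += alpha * stump_output(X, j, t, s)
    return np.where(score >= 0, 1, -1)
```

For example, with two features that each carry the label but are missing on different rows, the first round picks a stump on one feature; that stump abstains on the rows where its feature is missing, so those rows keep their weight and dominate the next round, which then picks a stump on the other feature. Each sample is therefore covered by whichever feature it actually has.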


Published in: CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018, 2362 pages
ISBN: 9781450360142
DOI: 10.1145/3269206

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance rates: CIKM '18 paper acceptance rate 147 of 826 submissions (18%); overall acceptance rate 1,861 of 8,427 submissions (22%).

