DualBoost: Handling Missing Values with Feature Weights and Weak Classifiers that Abstain

ABSTRACT
Missing values are a common issue in real-world datasets. Handling them is a key aspect of data mining, as missing values can seriously degrade the performance of predictive models. In this paper we propose a unified Boosting framework that consolidates model construction and missing-value handling. At each Boosting iteration, weights are assigned to both the samples and the features. The sample weights focus learning on difficult samples, while the feature weights enable critical features to be compensated by less critical features when they are unavailable. A weak classifier that abstains (i.e., produces no prediction when a required feature value is missing) is learned on a data subset determined by the feature weights. Experimental results demonstrate the efficacy and robustness of the proposed method over existing Boosting algorithms.
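The abstract describes the mechanics but not the formulas, so the sketch below is an illustrative Python reconstruction rather than the authors' algorithm: decision stumps that abstain (output 0) on missing values, a Schapire-Singer-style confidence-rated weight for each round, feature subsets drawn according to the feature weights, and an assumed feature-weight update (halving the weight of the feature just used) so that backup features get selected in later rounds. The names `fit_dualboost` and `best_abstaining_stump`, the subset size, and the update constants are all hypothetical.

```python
import numpy as np


def stump_predict(stump, X):
    """Predict in {-1, 0, +1}; abstain (0) wherever the stump's feature is missing."""
    col = X[:, stump["feature"]]
    out = np.where(col > stump["threshold"], stump["sign"], -stump["sign"])
    return np.where(np.isnan(col), 0.0, out)


def best_abstaining_stump(X, y, w, feats):
    """Exhaustive search for the stump with lowest weighted error over `feats`."""
    best, best_err = None, np.inf
    for j in feats:
        col = X[:, j]
        for thr in np.unique(col[~np.isnan(col)]):
            for sign in (1.0, -1.0):
                h = np.where(np.isnan(col), 0.0, np.where(col > thr, sign, -sign))
                err = np.sum(w[(h != 0) & (h != y)])  # abstentions are not errors
                if err < best_err:
                    best = {"feature": j, "threshold": thr, "sign": sign}
                    best_err = err
    return best


def fit_dualboost(X, y, n_rounds=20, seed=0):
    """X: (n, d) array with np.nan marking missing values; y in {-1, +1}."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w_samples = np.full(n, 1.0 / n)    # sample weights, as in standard AdaBoost
    w_features = np.full(d, 1.0 / d)   # feature weights, the DualBoost addition
    ensemble = []
    for _ in range(n_rounds):
        # Train the weak learner on a feature subset drawn by feature weight
        # (subset size d // 2 is an arbitrary choice for this sketch).
        feats = rng.choice(d, size=max(1, d // 2), replace=False, p=w_features)
        stump = best_abstaining_stump(X, y, w_samples, feats)
        h = stump_predict(stump, X)
        # Confidence-rated round weight (Schapire & Singer 1999 style),
        # computed only over the samples the stump did not abstain on.
        correct = np.sum(w_samples[h == y])
        wrong = np.sum(w_samples[(h != 0) & (h != y)])
        alpha = 0.5 * np.log((correct + 1e-10) / (wrong + 1e-10))
        # Misclassified samples gain weight; abstained samples are untouched
        # before renormalization, so difficult samples become the focus.
        w_samples = w_samples * np.exp(-alpha * y * h)
        w_samples /= w_samples.sum()
        # Assumed update: downweight the feature just used so that later
        # rounds pick compensating features for when it is missing.
        w_features[stump["feature"]] *= 0.5
        w_features /= w_features.sum()
        ensemble.append((alpha, stump))
    return ensemble


def predict(ensemble, X):
    score = sum(alpha * stump_predict(s, X) for alpha, s in ensemble)
    return np.sign(score)


# Toy usage: threshold problem on two features, 20% of entries missing at random.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
X[rng.random(X.shape) < 0.2] = np.nan
model = fit_dualboost(X, y)
print("train accuracy:", np.mean(predict(model, X) == y))
```

Encoding abstention as 0 means an abstaining stump contributes nothing to the ensemble score for that sample, so the combined vote falls back on rounds whose features are present; this is the compensation effect the abstract describes.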