ABSTRACT
We address the problem of efficiently learning Naive Bayes classifiers under class-conditional classification noise (CCCN). Naive Bayes classifiers rely on the hypothesis that the distribution associated with each class is a product distribution. When the data are subject to CCCN, these class-conditional distributions become mixtures of product distributions. We give analytical formulas that make it possible to identify them from data subject to CCCN. We then design a learning algorithm, based on these formulas, that learns Naive Bayes classifiers under CCCN. We present results on artificial datasets and on datasets from the UCI repository. These results show that CCCN can be handled efficiently and successfully.
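The abstract's central observation can be made concrete with a small sketch. Under CCCN, the distribution of the features given the *noisy* label is a mixture of the two true class-conditional product distributions, with mixing weights given by the posterior of the true class given the noisy label; when the noise rates are known, the true per-feature parameters can be recovered by inverting a 2x2 linear system per feature. The paper's own algorithm estimates the noise rates analytically from the data; the sketch below sidesteps that step and simply assumes binary features and known noise rates `alpha` = P(observed label 1 | true label 0) and `beta` = P(observed label 0 | true label 1). All function and variable names here are illustrative, not the paper's.

```python
import numpy as np

def denoised_nb_params(X, y_noisy, alpha, beta):
    """Recover Naive Bayes parameters from CCCN-corrupted labels.

    X       : (n, d) binary feature matrix
    y_noisy : (n,) observed labels in {0, 1}
    alpha   : assumed P(y_noisy = 1 | y_true = 0)
    beta    : assumed P(y_noisy = 0 | y_true = 1)
    """
    assert alpha + beta < 1, "noise rates must leave the mixture identifiable"

    # Observed quantities: noisy class prior and per-feature conditionals.
    q1 = y_noisy.mean()                      # P(y_noisy = 1)
    t1 = X[y_noisy == 1].mean(axis=0)        # P(x_j = 1 | y_noisy = 1)
    t0 = X[y_noisy == 0].mean(axis=0)        # P(x_j = 1 | y_noisy = 0)

    # True prior from the noisy one: q1 = (1 - beta)*pi + alpha*(1 - pi).
    pi = (q1 - alpha) / (1.0 - alpha - beta)
    pi = np.clip(pi, 1e-6, 1 - 1e-6)

    # Each observed conditional is a mixture of the two true ones; the
    # mixing weights are P(y_true = 1 | y_noisy) by Bayes' rule.
    w1 = (1.0 - beta) * pi / q1              # P(y = 1 | y_noisy = 1)
    v1 = beta * pi / (1.0 - q1)              # P(y = 1 | y_noisy = 0)
    M = np.array([[w1, 1.0 - w1],
                  [v1, 1.0 - v1]])

    # Unmix the conditionals: solve M @ [p1; p0] = [t1; t0] per feature.
    p1, p0 = np.linalg.solve(M, np.vstack([t1, t0]))
    return pi, np.clip(p1, 1e-6, 1 - 1e-6), np.clip(p0, 1e-6, 1 - 1e-6)

def predict(X, pi, p1, p0):
    """Standard Naive Bayes decision rule with the recovered parameters."""
    log_odds = (np.log(pi) - np.log(1 - pi)
                + X @ (np.log(p1) - np.log(p0))
                + (1 - X) @ (np.log(1 - p1) - np.log(1 - p0)))
    return (log_odds > 0).astype(int)
```

Usage is the usual fit-then-predict pattern, e.g. `pi, p1, p0 = denoised_nb_params(X, y_noisy, 0.2, 0.1)` followed by `predict(X, pi, p1, p0)`. The condition `alpha + beta < 1` is the standard identifiability requirement for inverting the mixing matrix.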
REFERENCES
- De Comité, F., Denis, F., Gilleron, R., & Letouzey, F. (1999). Positive and unlabeled examples help learning. Proceedings of ALT'99, 10th International Conference on Algorithmic Learning Theory.
- Denis, F., Gilleron, R., Laurent, A., & Tommasi, M. (2003). Text classification and co-training from positive and unlabeled examples. Proceedings of the ICML 2003 Workshop: The Continuum from Labeled to Unlabeled Data.
- Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103--130.
- Feldman, J., O'Donnell, R., & Servedio, R. A. (2005). Learning mixtures of product distributions over discrete domains. Proceedings of FOCS 2005 (pp. 501--510).
- Freund, Y., & Mansour, Y. (1999). Estimating a mixture of two product distributions. Proceedings of COLT'99.
- Geiger, D., Heckerman, D., King, H., & Meek, C. (2001). Stratified exponential families: graphical models and model selection. Annals of Statistics, 29, 505--529.
- Li, X., & Liu, B. (2003). Learning to classify texts using positive and unlabeled data. Proceedings of IJCAI 2003.
- Li, X., & Liu, B. (2005). Learning from positive and unlabeled examples with different data distributions. Proceedings of ECML 2005 (pp. 218--229).
- Merz, C., & Murphy, P. (1998). UCI repository of machine learning databases.
- Whiley, M., & Titterington, D. (2002). Model identifiability in naive Bayesian networks (Technical Report).
- Yakowitz, S. J., & Spragins, J. D. (1968). On the identifiability of finite mixtures. The Annals of Mathematical Statistics, 39.
- Yang, Y., Xia, Y., Chi, Y., & Muntz, R. R. (2003). Learning naive Bayes classifier from noisy data. Technical Report CSD-TR 030056.
- Zhu, X., Wu, X., & Chen, Q. (2003). Eliminating class noise in large datasets. Proceedings of ICML 2003 (pp. 920--927).