DOI: 10.1145/1143844.1143878

Efficient learning of Naive Bayes classifiers under class-conditional classification noise

Published: 25 June 2006

ABSTRACT

We address the problem of efficiently learning Naive Bayes classifiers under class-conditional classification noise (CCCN). Naive Bayes classifiers rely on the hypothesis that the distribution associated with each class is a product distribution. When data are subject to CCC-noise, these class-conditional distributions are themselves mixtures of product distributions. We give analytical formulas that make it possible to identify them from data subject to CCCN. We then design a learning algorithm, based on these formulas, that learns Naive Bayes classifiers under CCCN. We present results on artificial datasets and on datasets from the UCI repository. These results show that CCC-noise can be handled efficiently and successfully.
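The central observation of the abstract, that CCC-noise turns each observed class-conditional distribution into a mixture of the two true product distributions, can be illustrated with a short simulation. The sketch below is not the paper's algorithm: it assumes two classes, binary features, and known noise rates (all parameter values are illustrative), and it recovers the true per-feature conditionals by inverting the 2x2 mixing induced by the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# True Naive Bayes model over 5 binary features: one product distribution per class.
p_y1 = 0.4                                          # true class prior P(Y=1)
theta0 = np.array([0.2, 0.7, 0.4, 0.1, 0.6])        # P(X_j = 1 | Y=0)
theta1 = np.array([0.8, 0.3, 0.6, 0.9, 0.5])        # P(X_j = 1 | Y=1)

# CCCN: class-conditional flip rates P(Yhat != y | Y = y), assumed known here.
rho0, rho1 = 0.2, 0.3

# Sample clean data, then flip each label with a class-dependent rate.
n = 200_000
y = (rng.random(n) < p_y1).astype(int)
x = (rng.random((n, 5)) < np.where(y[:, None] == 1, theta1, theta0)).astype(int)
flip = rng.random(n) < np.where(y == 1, rho1, rho0)
y_obs = np.where(flip, 1 - y, y)                    # labels observed under CCCN

# Under CCCN the observed conditionals are mixtures of the true ones:
#   P(X_j=1 | Yhat=c) = a_c * P(X_j=1 | Y=1) + (1 - a_c) * P(X_j=1 | Y=0),
# where a_c = P(Y=1 | Yhat=c) follows from Bayes' rule and the noise rates.
obs1 = x[y_obs == 1].mean(axis=0)
obs0 = x[y_obs == 0].mean(axis=0)

p_yhat1 = (1 - rho1) * p_y1 + rho0 * (1 - p_y1)     # P(Yhat=1)
a1 = (1 - rho1) * p_y1 / p_yhat1                    # P(Y=1 | Yhat=1)
a0 = rho1 * p_y1 / (1 - p_yhat1)                    # P(Y=1 | Yhat=0)

# Invert the 2x2 mixing to recover the true product distributions.
A = np.array([[a1, 1 - a1],
              [a0, 1 - a0]])
recovered = np.linalg.solve(A, np.vstack([obs1, obs0]))

print("recovered P(X_j=1|Y=1):", recovered[0].round(3))  # close to theta1
print("recovered P(X_j=1|Y=0):", recovered[1].round(3))  # close to theta0
```

Note that the sketch sidesteps the harder part of the paper's setting, where the noise rates are not given and the analytical identification formulas are needed; it only shows why, once the mixing proportions are known, recovering the product distributions reduces to solving a linear system.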


Published in

ICML '06: Proceedings of the 23rd International Conference on Machine Learning
June 2006, 1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



Acceptance Rates

ICML '06 paper acceptance rate: 140 of 548 submissions, 26%. Overall acceptance rate: 140 of 548 submissions, 26%.
