ABSTRACT
Noise in the class labels of any training set can degrade classification performance regardless of the machine learning method used. In this paper, we first formalize the problem of binary classification in the presence of random noise on the class labels, which we call class noise. Class noise is typically modeled by a class noise rate: a small, independent probability that each class label in the training set is inverted. We propose a method to estimate the class noise rate at the level of individual samples in real data. Based on this estimate, we propose two approaches to handling class noise: the first modifies a given surrogate loss function, and the second eliminates class noise by sampling. Furthermore, we prove that, with either approach, the optimal hypothesis on the noisy distribution approximates the optimal hypothesis on the clean distribution. Our methods achieve over 87% accuracy on a synthetic non-separable dataset even when 40% of the labels are inverted, and comparisons show that they outperform state-of-the-art approaches on several benchmark datasets across different domains and noise rates.
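The abstract does not give the paper's exact loss modification, but the first approach it describes belongs to a well-known family: correcting a surrogate loss so that its expectation under randomly flipped labels matches the clean loss (cf. Natarajan et al., 2013). The sketch below is an illustrative assumption, not the paper's method: it uses a known symmetric noise rate rho rather than the per-sample estimates the paper proposes, and all function names and the toy dataset are invented for the demo.

```python
import numpy as np

def logistic_loss(m):
    # Numerically stable log(1 + exp(-m)), m = f(x) * label.
    return np.logaddexp(0.0, -m)

def sigmoid(m):
    return 1.0 / (1.0 + np.exp(-m))

def corrected_loss(m, rho):
    # Unbiased surrogate under symmetric label flips with known rate rho < 0.5:
    # the expectation over the noisy label equals the clean logistic loss.
    return ((1 - rho) * logistic_loss(m) - rho * logistic_loss(-m)) / (1 - 2 * rho)

def fit_linear(X, y, rho, lr=0.1, iters=300):
    # Gradient descent on the corrected logistic loss (still convex for rho < 0.5).
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        m = (X @ w) * y
        # d/dm of corrected_loss(m, rho).
        dldm = ((1 - rho) * (-sigmoid(-m)) - rho * sigmoid(m)) / (1 - 2 * rho)
        w -= lr * (dldm * y) @ X / len(y)
    return w

# Toy demo: linearly separable data with 40% of the labels flipped at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y_clean = np.sign(X[:, 0] + X[:, 1])
y_noisy = np.where(rng.random(len(y_clean)) < 0.4, -y_clean, y_clean)

w = fit_linear(X, y_noisy, rho=0.4)
acc = np.mean(np.sign(X @ w) == y_clean)
```

Training on the corrected loss with the noisy labels recovers a separator that is accurate against the clean labels; the per-sample noise-rate estimation in the paper would replace the single global `rho` here.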