Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement

Published: 09 November 2010

Abstract

Cross-validation is a mainstay for measuring performance and progress in machine learning. There are subtle differences in exactly how accuracy, F-measure and Area Under the ROC Curve (AUC) should be computed in cross-validation studies. These details are rarely discussed in the literature, and incompatible methods are used by different papers and software packages, leading to inconsistency across the research literature. Anomalies in the performance calculations for particular folds and situations go undetected when they are buried in results aggregated over many folds and datasets, without anyone ever inspecting the intermediate performance measurements. This research note clarifies and illustrates the differences, and provides guidance on how best to measure classification performance under cross-validation. In particular, there are several divergent methods in use for computing F-measure, which is often recommended as a performance measure under class imbalance, e.g., for text classification domains and in one-vs.-all reductions of datasets having many classes. We show by experiment that all but one of these computation methods lead to biased measurements, especially under high class imbalance. This paper is of particular interest to designers of machine learning software libraries and to researchers focused on high class imbalance.
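
To make the kind of divergence at issue concrete, the sketch below (an illustration using scikit-learn, not code from the paper) contrasts two aggregation strategies seen in practice for F-measure under k-fold cross-validation: averaging the per-fold F-measure values versus pooling the true-positive, false-positive and false-negative counts across all folds and computing a single F-measure from the totals. Under high class imbalance, where some folds may contain few or no positive examples, the two can differ noticeably; which variant (if any) is unbiased is exactly the question the paper examines.

    # Illustrative sketch (assumptions: binary task, scikit-learn available);
    # contrasts per-fold-averaged F1 with F1 computed from pooled counts.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    # A synthetic, highly imbalanced binary problem (~3% positives).
    X, y = make_classification(n_samples=2000, weights=[0.97, 0.03], random_state=0)

    per_fold_f1 = []
    tp = fp = fn = 0
    for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        true = y[test_idx]
        tp_i = int(np.sum((pred == 1) & (true == 1)))
        fp_i = int(np.sum((pred == 1) & (true == 0)))
        fn_i = int(np.sum((pred == 0) & (true == 1)))
        # Per-fold F1; undefined when a fold has no true or predicted positives,
        # scored here as 0 by one common convention.
        denom = 2 * tp_i + fp_i + fn_i
        per_fold_f1.append(2 * tp_i / denom if denom > 0 else 0.0)
        tp, fp, fn = tp + tp_i, fp + fp_i, fn + fn_i

    print("F1 averaged over folds:", np.mean(per_fold_f1))
    print("F1 from pooled counts: ", 2 * tp / (2 * tp + fp + fn))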

  • Published in

    ACM SIGKDD Explorations Newsletter, Volume 12, Issue 1
    June 2010
    77 pages
    ISSN: 1931-0145
    EISSN: 1931-0153
    DOI: 10.1145/1882471

    Copyright © 2010 Authors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 9 November 2010

    Qualifiers

    • research-article
