research-article

Nonparametric estimation of the precision-recall curve

Authors:
Stéphan Clémençon, LTCI UMR Telecom ParisTech/CNRS, Paris Cedex, France
Nicolas Vayatis, CMLA UMR CNRS & UniverSud, Cachan Cedex, France

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, June 2009, Pages 185–192, https://doi.org/10.1145/1553374.1553398

Published: 14 June 2009

ABSTRACT

The Precision-Recall (PR) curve is a widely used visual tool for evaluating how well scoring functions discriminate between two populations. The purpose of this paper is to examine both theoretical and practical issues related to the statistical estimation of PR curves based on classification data. Consistency and asymptotic normality of the empirical counterpart of the PR curve in sup norm are rigorously established. Finally, the issue of building confidence bands in the PR space is considered, and a specific resampling procedure based on a smoothed and truncated version of the empirical distribution of the data is promoted. Theoretical and computational arguments are presented to explain why such a bootstrap is preferable to a "naive" bootstrap in this setup.
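To make the objects in the abstract concrete, here is a minimal sketch (Python, standard library only) of the empirical PR curve, obtained by sweeping a threshold down the sorted scores, and of bootstrap bands for it. All function names are hypothetical. With `bandwidth=0.0` the resampling is a naive bootstrap of (score, label) pairs. With `bandwidth > 0` it is a generic smoothed bootstrap: Gaussian noise is added to each resampled score, which amounts to sampling from a kernel-smoothed empirical score distribution within each class. This branch omits the truncation step and is not presented as the authors' procedure. The bands are pointwise percentile intervals on a fixed recall grid, a simplification rather than a construction of simultaneous sup-norm bands, and the default `n_boot`, `alpha` and `bandwidth` values are arbitrary assumptions.

```python
import random
from bisect import bisect_left


def empirical_pr_points(scores, labels):
    """Empirical (recall, precision) pairs obtained by thresholding the
    scores at each distinct value, from the highest score downwards.

    scores: real-valued scores s(X_i); labels: 0/1 labels Y_i (1 = positive).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("the PR curve needs at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points, tp = [], 0
    for k, i in enumerate(order, start=1):
        tp += labels[i]
        # Emit a point only after every example tied at this score is included.
        if k == len(order) or scores[order[k]] != scores[i]:
            points.append((tp / n_pos, tp / k))
    return points


def precision_at_recall(points, grid):
    """Evaluate the step-function PR curve on recall levels r in (0, 1]:
    the precision at the first (highest) threshold whose recall is >= r."""
    recalls = [r for r, _ in points]  # nondecreasing, ends at 1.0
    return [points[min(bisect_left(recalls, r), len(points) - 1)][1] for r in grid]


def bootstrap_band(scores, labels, grid, n_boot=200, alpha=0.1,
                   bandwidth=0.0, seed=0):
    """Pointwise (1 - alpha) percentile band for the PR curve on `grid`.

    bandwidth == 0: naive bootstrap (resample (score, label) pairs).
    bandwidth > 0: smoothed bootstrap, adding N(0, bandwidth^2) noise to each
    resampled score. No truncation step is applied (assumption).
    """
    if sum(labels) == 0:
        raise ValueError("need at least one positive example")
    rng = random.Random(seed)
    n = len(scores)
    curves = []
    while len(curves) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if sum(y) == 0:
            continue  # the PR curve is undefined without positives; redraw
        s = [scores[i] + (rng.gauss(0.0, bandwidth) if bandwidth > 0 else 0.0)
             for i in idx]
        curves.append(precision_at_recall(empirical_pr_points(s, y), grid))
    lo_idx = int(alpha / 2 * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    lower, upper = [], []
    for j in range(len(grid)):
        column = sorted(curve[j] for curve in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


if __name__ == "__main__":
    # Synthetic data: positives score higher on average (illustrative only).
    gen = random.Random(1)
    labels = [int(gen.random() < 0.3) for _ in range(300)]
    scores = [gen.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    grid = [0.2, 0.4, 0.6, 0.8, 1.0]
    print(precision_at_recall(empirical_pr_points(scores, labels), grid))
    print(bootstrap_band(scores, labels, grid))                 # naive
    print(bootstrap_band(scores, labels, grid, bandwidth=0.1))  # smoothed
```

With `bandwidth=0.0`, every bootstrap sample contains repeated scores (ties), because pairs are drawn with replacement. Adding continuous noise removes these ties almost surely, which is one visible difference between the two resampling schemes.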

References

Bertail, P., Clémençon, S., & Vayatis, N. (2008). On bootstrapping the ROC curve. In Proc. of Neur. Inf. Proc. Syst. 2008, Vancouver, Canada.
Bucklew, J. (2003). Introduction to rare event simulation. Springer.
Clémençon, S., & Vayatis, N. (2008). Tree-structured ranking rules and approximation of the optimal ROC curve. Proceedings of the 2008 conference on Algorithmic Learning Theory. Lect. Notes Art. Int. 5254, pp. 22–37, Springer.
Csörgő, M., & Révész, P. (1981). Strong approximations in probability and statistics. Academic Press.
Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, Vol. 148, pp. 233–240.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics, 7, 1–26.
Falk, M., & Reiss, R. (1989). Weak convergence of smoothed and nonsmoothed bootstrap quantile estimates. Annals of Probability, 17, 362–371.
Giné, E., & Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. Poincaré (B), Probabilités et Statistiques, 38, 907–921.
Horváth, L., Horváth, Z., & Zhou (2008). Confidence bands for ROC curves. Journal of Statistical Planning and Inference, 138, 1894–1904.
Hsieh, F., & Turnbull, B. (1996). Nonparametric and semi-parametric statistical estimation of the ROC curve. The Annals of Statistics, 24, 25–40.
Macskassy, S., & Provost, F. (2004). Confidence bands for ROC curves: methods and an empirical study. In Proceedings of the first Workshop on ROC Analysis in Artif. Int. at Eur. Conf. on Artif. Int. 2004.
Macskassy, S., Provost, F., & Rosset, S. (2005). Bootstrapping the ROC curve: an empirical evaluation. In Proceedings of Int. Conf. Mach. Learn.-2005 Workshop on ROC Analysis in Machine Learning.
Manning, C. M., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press.
Raghavan, V., Bollmann, P., & Jung, G. (1989). A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans. Inf. Syst., 7, 205–229.
Shao, G., & Tu, J. (1995). The jackknife and bootstrap. Springer, NY.
Shorack, G., & Wellner, J. (1986). Empirical processes with applications to statistics. Wiley, NY.
Silverman, B., & Young, G. (1987). The bootstrap: to smooth or not to smooth? Biometrika, 74, 469–479.

Index Terms

- Computing methodologies
  - Machine learning
    - Learning paradigms
      - Supervised learning
        - Supervised learning by classification
    - Machine learning approaches
      - Classification and regression trees
  - Modeling and simulation
    - Model development and analysis
      - Model verification and validation
- Mathematics of computing
  - Probability and statistics

Published in
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374
General Chair: Andrea Danyluk (Williams College)
Program Chairs: Léon Bottou (NEC Laboratories America), Michael Littman (Rutgers University)
Copyright © 2009 by the author(s)/owner(s).
Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates
Overall acceptance rate: 140 of 548 submissions, 26%

Article Metrics
- Total citations: 10
- Total downloads: 402
- Downloads (last 12 months): 24
- Downloads (last 6 weeks): 2
