Abstract
Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. Yet the practice of data analysis is an intrinsically interactive and adaptive process: new analyses and hypotheses are proposed after seeing the results of previous ones, parameters are tuned on the basis of obtained results, and datasets are shared and reused.
In this work, we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. We demonstrate new approaches for addressing the challenges of adaptivity that are based on techniques developed in privacy-preserving data analysis.<!-- END_PAGE_1 -->
As an application of our techniques we give a simple and practical method for reusing a holdout (or testing) set to validate the accuracy of hypotheses produced adaptively by a learning algorithm operating on a training set.
- Aschwanden, C. Science isn't broken.Google Scholar
- Bassily, R., Nissim, K., Smith, A.D., Steinke, T., Stemmer, U., Ullman, J. Algorithmic stability for adaptive data analysis. In STOC, Cambridge, MA, USA (2016), 1046--1059. Google ScholarDigital Library
- Benjamini, Y., Hochberg, Y. Controlling the false discovery rate - A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57 (1995), 289--4300.Google ScholarCross Ref
- Blum, A., Dwork, C., McSherry, F., Nissim, K. Practical privacy: The SuLQ framework. In PODS, Baltimore, Maryland, USA (2005), 128--138. Google ScholarDigital Library
- Blum, A., Hardt, M. The ladder: A reliable leaderboard for machine learning competitions. In ICML, Lille, France (2015), 1006--1014. Google ScholarDigital Library
- Bousquet, O., Elisseeff, A. Stability and generalization. JMLR 2 (2002), 499--526. Google ScholarDigital Library
- Cawley, G.C., Talbot. N.L.C. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11 (2010), 2079--2107. Google ScholarDigital Library
- Dwork, C. A firm foundation for private data analysis. CACM 54, 1 (2011), 86--95. Google ScholarDigital Library
- Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A. Preserving statistical validity in adaptive data analysis. CoRR, abs/1411.2664, 2014. Extended abstract in STOC 2015. Google ScholarDigital Library
- Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A. Generalization in adaptive data analysis and holdout reuse. CoRR, abs/1506, 2015. Extended abstract in NIPS 2015. Google ScholarDigital Library
- Dwork, C., McSherry, F., Nissim, K., Smith, A. Calibrating noise to sensitivity in private data analysis. In TCC, New York, NY, USA (2006), 265--284. Google ScholarDigital Library
- Dwork, C., Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 34 (2014), 211--407. Google ScholarDigital Library
- Freedman, D.A. A note on screening regression equations. Am. Statist. 37, 2 (1983), 152--155.Google Scholar
- Gelman, A., Loken, E. The statistical crisis in science. Am. Statist. 102, 6 (2014), 460.Google Scholar
- Hardt, M., Rothblum, G. A multiplicative weights mechanism for privacy-preserving data analysis. In FOCS, Las Vegas, Nevada, USA (2010), 61--70. Google ScholarDigital Library
- Hardt, M., Ullman, J. Preventing false discovery in interactive data analysis is hard. In FOCS, Philadelphia, PA, USA (2014), 454--463. Google ScholarDigital Library
- Hastie, T., Tibshirani, R., Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer-Verlag, New York (2009).Google Scholar
- Ioannidis, J.P.A. Why most published research findings are false. PLoS Med. 2, 8 (Aug. 2005), 124.Google ScholarCross Ref
- Kearns, M. Efficient noise-tolerant learning from statistical queries. J. ACM 45, 6 (1998), 983--1006. Google ScholarDigital Library
- Mukherjee, S., Niyogi, P., Poggio, T., Rifkin, R. Learning theory: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math. 25, 1-3 (2006), 161--193.Google ScholarCross Ref
- Nissim, K., Stemmer, U. On the generalization properties of differential privacy. CoRR (2015), abs/1504.05800.Google Scholar
- Reunanen, J. Overfitting in making comparisons between variable selection methods. J. Mach Learn Res. 3 (2003), 1371--1382. Google ScholarDigital Library
- Shalev-Shwartz, S., Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 32 Avenue of the Americas, New York, NY 10013-2473, USA (2014). Google ScholarDigital Library
- Shalev-Shwartz, S., Shamir, O., Srebro, N., Sridharan, K. Learnability, stability and uniform convergence. J. Mach Learn Res. 11 (2010), 2635--2670. Google ScholarDigital Library
- Steinke, T., Ullman, J. Interactive fingerprinting codes and the hardness of preventing false discovery. In COLT, Paris, France (2015), 1588--1628.Google Scholar
Index Terms
- Guilt-free data reuse
Recommendations
Viewpoint-centred reuse: bridging the gap between reusability and the needs of the reuser
The "Reuse without Reusers" SyndromeThe problems of low productivity, late delivery and poor software quality are, unfortunately, characteristic of the software industry (Horowitz and Munson 1989). Increasingly, software developers are looking towards ...
Reuse repositories and reuse: the realities
OOPSLA '03: Companion of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applicationsThis panel (part of the 2003 Onward! program) will discuss libraries, repositories, and reuse. While there is so much hype and noise about repositories and metadata - so little understood about how hard it is to design for reuse and to encourage ...
Unanticipated reuse of large-scale software features
ICSE '06: Proceedings of the 28th international conference on Software engineeringSoftware reuse has been endorsed as a way to reduce development times and costs while increasing software quality and reliability. Techniques designed to encourage software reuse have concentrated on creating reusable software in the form of frameworks, ...
Comments