Abstract
Differential privacy is a widely studied theory for analyzing sensitive data with a strong privacy guarantee—any change in an individual's data can have only a small statistical effect on the result—and a growing number of programming languages now support differentially private data analysis. A common shortcoming of these languages is poor support for adaptivity. In practice, a data analyst rarely wants to run just one function over a sensitive database, nor even a predetermined sequence of functions with fixed privacy parameters; rather, she wants to engage in an interaction where, at each step, both the choice of the next function and its privacy parameters are informed by the results of prior functions. Existing languages support this scenario using a simple composition theorem, which often gives rather loose bounds on the actual privacy cost of composite functions, substantially reducing how much computation can be performed within a given privacy budget. The theory of differential privacy includes other theorems with much better bounds, but these have not yet been incorporated into programming languages.
We propose a novel framework for adaptive composition that is elegant, practical, and implementable. It consists of a reformulation based on typed functional programming of privacy filters, together with a concrete realization of this framework in the design and implementation of a new language, called Adaptive Fuzz. Adaptive Fuzz transplants the core static type system of Fuzz to the adaptive setting by wrapping the Fuzz typechecker and runtime system in an outer adaptive layer, allowing Fuzz programs to be conveniently constructed and typechecked on the fly. We describe an interpreter for Adaptive Fuzz and report results from two case studies demonstrating its effectiveness for implementing common statistical algorithms over real data sets.
Supplemental Material
Available for Download
Contained here are the source files for the Adaptive Fuzz language along with some simple libraries, examples, and tests. Adaptive Fuzz requires OCaml and is currently only supported on Ubuntu 16.04.2 LTS (but may work in other environments). After installing the language, the user will be able to run tests and examples, including the case studies described in the paper.
- Rakesh Agrawal and Ramakrishnan Srikant. 2000. Privacy-preserving Data Mining. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD ’00). ACM, New York, NY, USA, 439–450. Google ScholarDigital Library
- Marc Andrysco, David Kohlbrenner, Keaton Mowery, Ranjit Jhala, Sorin Lerner, and Hovav Shacham. 2015. On Subnormal Floating Point and Abnormal Timing. In Proceedings of the 2015 IEEE Symposium on Security and Privacy (SP ’15). IEEE Computer Society, Washington, DC, USA, 623–639. Google ScholarDigital Library
- Gilles Barthe, Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, Aaron Roth, and Pierre-Yves Strub. 2015. Higher-order approximate relational refinement types for mechanism design and differential privacy. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, 55–68. Google ScholarDigital Library
- Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella Béguelin. 2012. Probabilistic Relational Reasoning for Differential Privacy. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’12), Vol. 47. ACM, ACM, New York, NY, USA, 97–110. Google ScholarDigital Library
- Raef Bassily, Adam Smith, and Abhradeep Thakurta. 2014. Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014. IEEE Computer Society, 464–473. Google ScholarDigital Library
- Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. 2006a. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology-EUROCRYPT 2006. Springer, 486–503. Google ScholarDigital Library
- Cynthia Dwork and Jing Lei. 2009. Differential privacy and robust statistics. In Proceedings of the forty-first annual ACM symposium on Theory of computing. ACM, 371–380. Google ScholarDigital Library
- Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006b. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography. Springer, 265–284. Google ScholarDigital Library
- Cynthia Dwork and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science 9, 3-4 (2014), 211–407.Google ScholarDigital Library
- Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. 2010. Boosting and differential privacy. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on. IEEE, 51–60.Google ScholarDigital Library
- Hamid Ebadi and David Sands. 2015. Featherweight PINQ. CoRR abs/1505.02642 (2015). http://arxiv .org/abs/1505.02642Google Scholar
- Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke. 2002. Privacy Preserving Mining of Association Rules. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’02). ACM, New York, NY, USA, 217–228. Google ScholarDigital Library
- Marco Gaboardi, Andreas Haeberlen, Justin Hsu, Arjun Narayan, and Benjamin C Pierce. 2013. Linear dependent types for differential privacy. In ACM SIGPLAN Notices, Vol. 48. ACM, 357–370.Google ScholarDigital Library
- Marco Gaboardi, James Honaker, Gary King, Kobbi Nissim, Jonathan Ullman, and Salil Vadhan. Working Paper. PSI (Îĺ): a Private data Sharing Interface. In Theory and Practice of Differential Privacy. New York, NY.Google Scholar
- Srivatsava Ranjit Ganta, Shiva Prasad Kasiviswanathan, and Adam Smith. 2008. Composition Attacks and Auxiliary Information in Data Privacy. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’08). ACM, New York, NY, USA, 265–273. Google ScholarDigital Library
- Andreas Haeberlen, Benjamin C. Pierce, and Arjun Narayan. 2011. Differential Privacy Under Fire. In Proceedings of the 20th USENIX Conference on Security (SEC’11). USENIX Association, Berkeley, CA, USA, 33–33. http://dl .acm.org/ citation .cfm?id=2028067.2028100Google ScholarDigital Library
- J. Hsu, M. Gaboardi, A. Haeberlen, S. Khanna, A. Narayan, B. C. Pierce, and A. Roth. 2014. Differential Privacy: An Economic Method for Choosing Epsilon. In 2014 IEEE 27th Computer Security Foundations Symposium. 398–410. Google ScholarDigital Library
- Aaron Johnson and Vitaly Shmatikov. 2013. Privacy-preserving Data Exploration in Genome-wide Association Studies. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’13). ACM, New York, NY, USA, 1079–1087. Google ScholarDigital Library
- Peter Kairouz, Sewoong Oh, and Pramod Viswanath. 2015. The Composition Theorem for Differential Privacy. In Proceedings of The 32nd International Conference on Machine Learning. 1376–1385.Google Scholar
- Shiva P Kasiviswanathan and Adam Smith. 2014. On the’Semantics’ of Differential Privacy: A Bayesian Formulation. Journal of Privacy and Confidentiality 6, 1 (2014), 1.Google ScholarCross Ref
- Daniel Kifer. 2009. Attacks on Privacy and deFinetti’s Theorem. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD ’09). ACM, New York, NY, USA, 127–138. Google ScholarDigital Library
- Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. 2007. t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In 2007 IEEE 23rd International Conference on Data Engineering. 106–115. Google ScholarCross Ref
- Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrishnan Venkitasubramaniam. 2006. ℓ-Diversity: Privacy Beyond κ-Anonymity. In Proceedings of the 22Nd International Conference on Data Engineering (ICDE ’06). IEEE Computer Society, Washington, DC, USA, 24–. Google ScholarDigital Library
- Frank McSherry and Ilya Mironov. 2009. Differentially Private Recommender Systems: Building Privacy into the Net. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’09). ACM, New York, NY, USA, 627–636. Google ScholarDigital Library
- Frank D McSherry. 2009. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, 19–30.Google ScholarDigital Library
- Elinor Mills. 2006. AOL sued over Web search data release. (Sept. 2006). cnet, http://www .cnet.com/news/aol-sued-overweb- search- data- release/ .Google Scholar
- Ilya Mironov. 2012. On Significance of the Least Significant Bits for Differential Privacy. In Proceedings of the 2012 ACM Conference on Computer and Communications Security (CCS ’12). ACM, New York, NY, USA, 650–661. Google ScholarDigital Library
- Prashanth Mohan, Abhradeep Thakurta, Elaine Shi, Dawn Song, and David Culler. 2012. GUPT: Privacy Preserving Data Analysis Made Easy. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12). ACM, New York, NY, USA, 349–360. Google ScholarDigital Library
- Jack Murtagh and Salil Vadhan. 2016. The Complexity of Computing the Optimal Composition of Differential Privacy. In Theory of Cryptography. Springer, 157–175. Google ScholarDigital Library
- Arjun Narayan and Andreas Haeberlen. 2012. DJoin: Differentially Private Join Queries over Distributed Databases.. In Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). USENIX, Hollywood, CA, 149–162. https://www .usenix.org/conference/osdi12/technical-sessions/presentation/narayanGoogle Scholar
- Arvind Narayanan and Vitaly Shmatikov. 2008. Robust De-anonymization of Large Sparse Datasets. In Proceedings of the 2008 IEEE Symposium on Security and Privacy. 111–125. Google ScholarDigital Library
- Jason Reed and Benjamin C. Pierce. 2010. Distance Makes the Types Grow Stronger: A Calculus for Differential Privacy. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming (ICFP ’10). ACM, New York, NY, USA, 157–168. Google ScholarDigital Library
- Ryan M. Rogers, Aaron Roth, Jonathan Ullman, and Salil P. Vadhan. 2016. Privacy Odometers and Filters: Pay-as-you-Go Composition. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 1921–1929.Google Scholar
- Indrajit Roy, Srinath T. V. Setty, Ann Kilzer, Vitaly Shmatikov, and Emmett Witchel. 2010. Airavat: Security and Privacy for MapReduce. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation (NSDI’10), Vol. 10. USENIX Association, Berkeley, CA, USA, 297–312. http://dl .acm.org/citation.cfm?id=1855711.1855731Google Scholar
- Ryan Singel. 2009. Netflix Spilled Your “Brokeback Mountain” Secret, Lawsuit Claims. (Dec. 2009). Wired, http:// www .wired.com/2009/12/netflix-privacy-lawsuit/ .Google Scholar
- Latanya Sweeney. 2002. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10, 5 (Oct. 2002), 557–570. Google ScholarDigital Library
- Ryan J. Tibshirani. 2015. A General Framework for Fast Stagewise Algorithms. The Journal of Machine Learning Research 16, 1 (Jan. 2015), 2543–2588. http://dl .acm.org/citation.cfm?id=2789272.2912080Google Scholar
- UCI KDD Archive. 2016. 1990 US Census Data Set. (2016). Available from https://kdd .ics.uci.edu/databases/census1990/ USCensus1990raw .html .Google Scholar
- Philip Wadler. 1990. Linear Types Can Change the World. In IFIP TC 2 Working Conference on Programming Concepts and Methods, Sea of Galilee, Israel. 546–566.Google Scholar
Index Terms
- A framework for adaptive differential privacy
Recommendations
A privacy framework: indistinguishable privacy
EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 WorkshopsIn this paper we illustrate a privacy framework named Indistinguishable Privacy. Indistinguishable privacy could be deemed as the formalization of the existing privacy definitions in privacy preserving data publishing as well as secure multi-party ...
A Novel Differential Privacy Approach that Enhances Classification Accuracy
C3S2E '16: Proceedings of the Ninth International C* Conference on Computer Science & Software EngineeringIn the recent past, there has been a tremendous increase of large repositories of data, examples being in healthcare data, consumer data from retailers, and airline passenger data. These data are continually being shared with interested parties, either ...
Differential Privacy: Now it's Getting Personal
POPL '15: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming LanguagesDifferential privacy provides a way to get useful information about sensitive data without revealing much about any one individual. It enjoys many nice compositionality properties not shared by other approaches to privacy, including, in particular, ...
Comments