PrivacyCheck: Automatic Summarization of Privacy Policies Using Data Mining

Published: 07 August 2018

Abstract

Prior research shows that only a tiny percentage of users actually read the online privacy policies they implicitly agree to while using a website. Prior research also suggests that users ignore privacy policies because these policies are lengthy and, on average, require 2 years of college education to comprehend. We propose a novel technique that tackles this problem by automatically extracting summaries of online privacy policies. We use data mining models to analyze the text of privacy policies and answer 10 basic questions concerning the privacy and security of user data, what information is gathered from users, and how this information is used. To train the data mining models, we thoroughly study the privacy policies of 400 companies (considering 10% of all listings on the NYSE, Nasdaq, and AMEX stock markets) across industries. Our free Chrome browser extension, PrivacyCheck, uses the data mining models to summarize any HTML page that contains a privacy policy. PrivacyCheck stands out from currently available counterparts because it is readily applicable to any online privacy policy. Cross-validation results show that PrivacyCheck summaries are accurate 40% to 73% of the time. Over 400 independent Chrome users are currently using PrivacyCheck.

        Published in ACM Transactions on Internet Technology, Volume 18, Issue 4
        Special Issue on Computational Ethics and Accountability, Special Issue on Economics of Security and Privacy and Regular Papers
        November 2018
        348 pages
        ISSN: 1533-5399
        EISSN: 1557-6051
        DOI: 10.1145/3210373
        Editor: Munindar P. Singh

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 7 August 2018
        • Revised: 1 July 2017
        • Accepted: 1 July 2017
        • Received: 1 October 2016
        Qualifiers

        • research-article
        • Research
        • Refereed
