Abstract
Prior research shows that only a tiny percentage of users actually read the online privacy policies they implicitly agree to while using a website. Prior research also suggests that users ignore privacy policies because these policies are lengthy and, on average, require 2 years of college education to comprehend. We propose a novel technique that tackles this problem by automatically extracting summaries of online privacy policies. We use data mining models to analyze the text of privacy policies and answer 10 basic questions concerning the privacy and security of user data, what information is gathered from users, and how this information is used. To train the data mining models, we thoroughly study the privacy policies of 400 companies (10% of all listings on the NYSE, Nasdaq, and AMEX stock markets) across industries. Our free Chrome browser extension, PrivacyCheck, uses the data mining models to summarize any HTML page that contains a privacy policy. PrivacyCheck stands out from currently available counterparts because it is readily applicable to any online privacy policy. Cross-validation results show that PrivacyCheck summaries are accurate 40% to 73% of the time. Over 400 independent Chrome users are currently using PrivacyCheck.
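The pipeline described above (train a text model per question, then estimate its accuracy by cross-validation) can be illustrated with a minimal sketch. The abstract does not say which models PrivacyCheck uses, so the multinomial Naive Bayes classifier below is a stand-in chosen for brevity. The question ("does the policy share data with third parties?"), the label names, and the eight policy snippets are all hypothetical toy data, not the authors' corpus, and the accuracy the sketch prints says nothing about the figures reported in the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy snippets for ONE question; not data from the paper.
TEXTS = [
    "We may share your personal information with third party advertisers.",
    "We never share your personal information with anyone.",
    "Your data is shared with our marketing partners and affiliates.",
    "We do not sell or disclose your data to third parties.",
    "We sell personal information to third party advertisers.",
    "Your information is never sold or shared with outside companies.",
    "We disclose your information to partners for advertising.",
    "We do not disclose personal data to any outside companies.",
]
LABELS = ["shares", "does_not_share"] * 4  # labels alternate with TEXTS


def tokenize(text):
    """Lowercase a snippet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(texts, labels, k):
    """Mean accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    pairs = list(zip(texts, labels))
    accuracies = []
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        held_out = [p for i, p in enumerate(pairs) if i % k == fold]
        model = NaiveBayes().fit(*zip(*train))
        correct = sum(model.predict(t) == y for t, y in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


accuracy = cross_validate(TEXTS, LABELS, k=4)
print(f"4-fold cross-validated accuracy on toy data: {accuracy:.2f}")
```

In a setup like this, one such classifier would be trained per question, and a summary would be the set of predicted answers for a given policy text. Cross-validation matters here because accuracy measured on the training snippets themselves would overstate how well the model handles an unseen policy.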
Index Terms
- PrivacyCheck: Automatic Summarization of Privacy Policies Using Data Mining