ABSTRACT
The boom in social networks has given rise to a large volume of user-generated content (UGC), most of which is free and publicly available. Many personal aspects of users can be extracted from UGC to facilitate personalized applications, as validated by many previous studies. Despite its value, UGC can expose users to high privacy risks, a problem that has so far remained largely unexplored. Privacy is defined as the individual's ability to control what information is disclosed, to whom, when, and under what circumstances. As people and information both play significant roles, privacy has been elaborated as a boundary regulation process, in which individuals regulate their interaction with others by altering how open they are to others. In this paper, we aim to reduce users' privacy risks on social networks by answering the question of Who Can See What. Towards this goal, we present a novel scheme comprising descriptive, predictive, and prescriptive components. In particular, we first collect a set of posts and extract a group of privacy-oriented features to describe the posts. We then propose a novel taxonomy-guided multi-task learning model to predict which personal aspects are revealed by the posts. Lastly, we construct standard guidelines through a user study with 400 users to regulate users' actions and prevent privacy leakage. Extensive experiments on a real-world dataset verify the effectiveness of our scheme.
Index Terms
- A Personal Privacy Preserving Framework: I Let You Know Who Can See What