PRISM: Profession Identification in Social Media

Authors:
Cunchao Tu (ORCID: 0000-0002-4538-9860), Tsinghua University, Beijing, China
Zhiyuan Liu (ORCID: 0000-0002-7709-2543), Tsinghua University, Beijing, China
Huanbo Luan, Tsinghua University, Beijing, China
Maosong Sun, Tsinghua University, Beijing, China

ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 6, Article No. 81, pp. 1–16, https://doi.org/10.1145/3070665

Published: 18 August 2017

Editorial Notes

"PRISM: Profession Identification in Social Media" by C. Tu, Z. Liu, H. Luan, and M. Sun, ACM TIST, Vol 8, Issue 6, Nov 2017, https://doi.org/10.1145/3070665, is an extension of "PRISM: Profession Identification in Social Media with Personal Information and Community Structure" by C. Tu, Z. Liu, and M. Sun, Social Media Processing Proceedings, SMP 2015, Communications in Computer and Information Science, 4th National Conference, Nov 2015, Springer, DOI: 10.1007/978-981-10-0080-5_2.

Abstract

Profession is an important social attribute of people. It plays a crucial role in commercial services such as personalized recommendation and targeted advertising. In practice, profession information is usually unavailable for privacy and other reasons. In this article, we explore the task of identifying users' professions from their behavior in social media. Three challenges make the task non-trivial: how to incorporate heterogeneous information about user behavior, how to use both labeled and unlabeled data effectively, and how to exploit community structure. To address these challenges, we present a framework called Profession Identification in Social Media (PRISM). It takes advantage of both the personal information and the community structure of users in three ways: (1) We present a cascaded two-level classifier with heterogeneous personal features to measure the confidence that a user belongs to each profession. (2) We present a multi-training process that takes advantage of both labeled and unlabeled data to enhance classification performance. (3) We design a profession identification method that jointly considers the confidences from personal features and community structure. We collect a real-world dataset for our experiments, and the results demonstrate that our method is significantly more effective than the baseline methods. By applying the predictor to a large set of users, we also analyze the characteristics of microblog users and find significant differences among users of different professions in demographics, social network structure, and linguistic style.
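Point (3) of the abstract, combining a confidence from personal features with one from community structure, can be illustrated with a small sketch. The abstract does not say how the two confidences are combined, so everything below is an assumption made for illustration: the linear mixing weight `alpha`, the neighbor-averaging rule, and the function names `normalize`, `community_confidence`, `combine`, and `predict` are hypothetical and are not taken from the article.

```python
from typing import Dict, List

# Maps a profession label to a confidence score.
Confidence = Dict[str, float]


def normalize(scores: Confidence) -> Confidence:
    """Scale scores so they sum to 1 (returned unchanged if the total is 0)."""
    total = sum(scores.values())
    if total == 0:
        return dict(scores)
    return {label: value / total for label, value in scores.items()}


def community_confidence(neighbors: List[Confidence], labels: List[str]) -> Confidence:
    """Assumed rule: average the confidences of a user's community neighbors.

    Falls back to a uniform distribution when the user has no neighbors.
    """
    if not neighbors:
        return {label: 1.0 / len(labels) for label in labels}
    return {
        label: sum(n.get(label, 0.0) for n in neighbors) / len(neighbors)
        for label in labels
    }


def combine(personal: Confidence, neighbors: List[Confidence], alpha: float = 0.5) -> Confidence:
    """Hypothetical linear mix: alpha * personal + (1 - alpha) * community."""
    labels = sorted(personal)
    personal_n = normalize(personal)
    community_n = normalize(community_confidence(neighbors, labels))
    return {
        label: alpha * personal_n[label] + (1 - alpha) * community_n[label]
        for label in labels
    }


def predict(personal: Confidence, neighbors: List[Confidence], alpha: float = 0.5) -> str:
    """Return the profession with the highest combined confidence."""
    scores = combine(personal, neighbors, alpha)
    return max(scores, key=scores.get)
```

With `alpha = 1.0` the sketch relies on personal features alone, and with `alpha = 0.0` it relies on the community alone. In this toy setting, a user whose personal confidence is nearly balanced between two professions is assigned the profession that dominates among their neighbors.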

Index Terms

1. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining
2. Social and professional topics
  1. User characteristics

Index terms have been assigned to the content through auto-classification.

Published in
ACM Transactions on Intelligent Systems and Technology Volume 8, Issue 6
Survey Paper, Regular Papers and Special Issue: Social Media Processing
November 2017
265 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3127339
Editor: Yu Zheng, Microsoft Research, China
Copyright © 2017 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 August 2017
- Accepted: 1 March 2017
- Revised: 1 December 2016
- Received: 1 November 2015
Author Tags
Profession identification
community detection
heterogeneous information
social media
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 6
- Total Downloads: 709
- Downloads (Last 12 months): 64
- Downloads (Last 6 weeks): 18