research-article
Open Access

PRISM: Profession Identification in Social Media

Published: 18 August 2017

Editorial Notes

"PRISM: Profession Identification in Social Media" by C. Tu, Z. Liu, H. Luan, and M. Sun, ACM TIST, Vol 8, Issue 6, Nov 2017, https://doi.org/10.1145/3070665, is an extension of "PRISM: Profession Identification in Social Media with Personal Information and Community Structure" by C. Tu, Z. Liu, and M. Sun, Social Media Processing Proceedings, SMP 2015, Communications in Computer and Information Science, 4th National Conference, Nov 2015, Springer, DOI: 10.1007/978-981-10-0080-5_2.

Abstract

Profession is an important social attribute of people. It plays a crucial role in commercial services such as personalized recommendation and targeted advertising. In practice, profession information is usually unavailable due to privacy and other reasons. In this article, we explore the task of identifying user professions according to their behaviors in social media. The task poses the following challenges that make it non-trivial: how to incorporate heterogeneous information about user behaviors, how to effectively utilize both labeled and unlabeled data, and how to exploit community structure. To address these challenges, we present a framework called Profession Identification in Social Media (PRISM). It takes advantage of both the personal information and the community structure of users in the following aspects: (1) We present a cascaded two-level classifier with heterogeneous personal features to measure the confidence that a user belongs to each profession. (2) We present a multi-training process that takes advantage of both labeled and unlabeled data to enhance classification performance. (3) We design a profession identification method that jointly considers the confidences from personal features and community structure. We collect a real-world dataset to conduct experiments, and the experimental results demonstrate that our method is significantly more effective than the baseline methods. By applying prediction to a large number of users, we also analyze the characteristics of microblog users, finding significant differences among users of different professions in demographics, social network structures, and linguistic styles.

Published in

ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 6
Survey Paper, Regular Papers and Special Issue: Social Media Processing
November 2017
265 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3127339
Editor: Yu Zheng

Copyright © 2017 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 18 August 2017
• Accepted: 1 March 2017
• Revised: 1 December 2016
• Received: 1 November 2015

Qualifiers

• research-article
• Research
• Refereed
