ABSTRACT
Social media outlets such as Twitter have become an important forum for peer interaction. Thus, the ability to classify latent user attributes, including gender, age, regional origin, and political orientation, solely from Twitter user language or similar highly informal content has important applications in advertising, personalization, and recommendation. This paper includes a novel investigation of stacked-SVM-based classification algorithms over a rich set of original features, applied to classifying these four user attributes. It also includes extensive analysis of which features and approaches are and are not effective in classifying user attributes in Twitter-style informal written genres, as distinct from the other, primarily spoken, genres previously studied in the user-property classification literature. Our models, singly and in ensemble, significantly outperform baseline models in all cases. A detailed analysis of model components and features provides often entertaining insight into distinctive language-usage variation across gender, age, regional origin, and political orientation in modern informal communication.
Index Terms
- Classifying latent user attributes in twitter