research-article

Classifying latent user attributes in twitter

Authors:
Delip Rao, David Yarowsky, Abhishek Shreevats, Manaswi Gupta

Johns Hopkins University, Baltimore, USA

SMUC '10: Proceedings of the 2nd international workshop on Search and mining user-generated contents
October 2010
Pages 37–44
https://doi.org/10.1145/1871985.1871993

Published: 30 October 2010

ABSTRACT

Social media outlets such as Twitter have become an important forum for peer interaction. The ability to classify latent user attributes, including gender, age, regional origin, and political orientation, solely from Twitter user language or similarly informal content therefore has important applications in advertising, personalization, and recommendation. This paper presents a novel investigation of stacked-SVM-based classification algorithms over a rich set of original features, applied to classifying these four user attributes. It also includes an extensive analysis of which features and approaches are and are not effective in classifying user attributes in Twitter-style informal written genres, as distinct from the primarily spoken genres previously studied in the user-property classification literature. Our models, singly and in ensemble, significantly outperform baseline models in all cases. A detailed analysis of model components and features provides an often entertaining insight into distinctive language-usage variation across gender, age, regional origin, and political orientation in modern informal communication.
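
The abstract names stacked SVM classification but gives no implementation detail, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes two base linear SVMs, one over word-unigram features and one over a few hand-picked stylistic cues, whose scores feed a third, meta-level SVM. The tiny training set, the feature choices, the Pegasos-style trainer, and every function name here are hypothetical and chosen for illustration only.

```python
import random

# Toy, invented examples (NOT data from the paper); labels are +1 / -1.
TRAIN = [
    ("omg sooo excited for tonight !!! :)", 1),
    ("lol cant wait :) :) !!!", 1),
    ("yayyy best day ever !! <3", 1),
    ("omg this is amazing :D !!", 1),
    ("quarterly report released today", -1),
    ("meeting moved to monday morning", -1),
    ("new report on market trends", -1),
    ("reading the morning news today", -1),
]
VOCAB = sorted({w for text, _ in TRAIN for w in text.split()})


def ngram_features(text):
    """Binary unigram indicators over the training vocabulary, plus a bias."""
    words = set(text.split())
    return [1.0 if v in words else 0.0 for v in VOCAB] + [1.0]


def style_features(text):
    """Hypothetical stylistic cues: emoticons, '!' count, stretched words, bias."""
    tokens = text.split()
    emoticons = sum(tok in {":)", ":D", "<3"} for tok in tokens)
    exclaims = text.count("!")
    stretched = sum(
        t.isalpha() and any(a == b == c for a, b, c in zip(t, t[1:], t[2:]))
        for t in tokens
    )
    return [float(emoticons), float(exclaims), float(stretched), 1.0]


def score(w, x):
    """Signed distance-like score of x under linear weights w."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, epochs=200, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on hinge loss with L2 regularization."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * score(w, X[i])
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:  # example violates the margin: take a hinge step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def train_stacked(train):
    """Train base SVMs on one half, then a meta SVM on their held-out scores."""
    base_half, meta_half = train[0::2], train[1::2]
    y_base = [y for _, y in base_half]
    w_ngram = train_linear_svm([ngram_features(t) for t, _ in base_half], y_base)
    w_style = train_linear_svm([style_features(t) for t, _ in base_half], y_base)
    meta_X = [
        [score(w_ngram, ngram_features(t)), score(w_style, style_features(t)), 1.0]
        for t, _ in meta_half
    ]
    w_meta = train_linear_svm(meta_X, [y for _, y in meta_half])
    return w_ngram, w_style, w_meta


def predict(model, text):
    """Return +1 or -1 from the meta SVM applied to the two base scores."""
    w_ngram, w_style, w_meta = model
    z = [score(w_ngram, ngram_features(text)), score(w_style, style_features(text)), 1.0]
    return 1 if score(w_meta, z) >= 0 else -1
```

Calling `model = train_stacked(TRAIN)` and then `predict(model, some_text)` returns a label of +1 or -1. The meta-level SVM is trained on scores for examples the base models did not see, which is a common way to keep the stacked layer from simply trusting overfit base models; with data this small the split is purely illustrative.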

Index Terms

Computing methodologies > Artificial intelligence > Natural language processing

Published in
SMUC '10: Proceedings of the 2nd international workshop on Search and mining user-generated contents
October 2010
136 pages
ISBN: 9781450303866
DOI: 10.1145/1871985
General Chairs:
Jose Carlos Cortizo (BrainSins, Spain), Francisco M. Carrero (BrainSins, Spain), Ivan Cantador (Autonomous University of Madrid, Spain), Jose Antonio Troyano (University of Seville, Spain), Paolo Rosso (Technical University of Valencia, Spain)
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
attribute learning
latent attribute classification
social media
Acceptance Rates
SMUC '10 paper acceptance rate: 15 of 25 submissions (60%)
Overall acceptance rate: 15 of 25 submissions (60%)
Article Metrics
- Total Citations: 353
- Total Downloads: 3,123
- Downloads (Last 12 months): 106
- Downloads (Last 6 weeks): 11