Short message communications: users, topics, and in-language processing

Authors:
Robert Munro, Stanford University, Stanford, CA
Christopher D. Manning, Stanford University, Stanford, CA
ACM DEV '12: Proceedings of the 2nd ACM Symposium on Computing for Development, March 2012, Article No. 4, pages 1–10. https://doi.org/10.1145/2160601.2160607

Published: 11 March 2012

ABSTRACT

This paper investigates three dimensions of cross-domain analysis for humanitarian information processing: citizen reporting vs. organizational reporting; Twitter vs. SMS; and English vs. non-English communications. Short messages sent during the response to the recent earthquake in Haiti and the floods in Pakistan are analyzed. SMS and Twitter were clearly used very differently at the time, and by different groups of people: SMS was used primarily by individuals on the ground, while Twitter was used primarily by the international community. For semi-automated strategies that employ natural language processing, strategies that are optimal for English are found not to carry over to Urdu or Kreyol, especially with regard to subword variation. For machine-learning models that combine Twitter and SMS, cross-domain prediction accuracy is found to be very poor, but some of the loss in accuracy can be recovered by learning prior distributions over the sources. It is concluded that there is only limited utility in treating SMS and Twitter as equivalent information sources, perhaps much less than the relatively large number of recent Twitter-focused papers would indicate.
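Two technical ideas in the abstract, subword features for non-English text and a per-source prior when Twitter and SMS are pooled, can be illustrated with a small sketch. This is not the authors' model. It assumes a multinomial naive Bayes classifier over character trigrams, where feature likelihoods are shared across sources and only the class prior P(label | source) is estimated separately for each source. The names `subword_features` and `SourcePriorNB`, the smoothing value, the `request`/`other` labels, and the toy training messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def subword_features(text, n=3):
    """Character n-grams of each whitespace token, padded with '#' boundary marks."""
    feats = []
    for token in text.lower().split():
        padded = f"#{token}#"
        if len(padded) <= n:
            feats.append(padded)  # very short token: keep it whole
        else:
            feats.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return feats


class SourcePriorNB:
    """Hypothetical multinomial naive Bayes with shared likelihoods P(f | label)
    and a separate class prior P(label | source) for each message source."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # add-alpha smoothing (assumed)
        self.feat_counts = defaultdict(Counter)   # label -> feature counts
        self.label_totals = Counter()             # label -> total feature tokens
        self.prior_counts = defaultdict(Counter)  # source -> label counts
        self.vocab = set()
        self.labels = set()

    def fit(self, examples):
        """examples: iterable of (text, source, label) triples."""
        for text, source, label in examples:
            feats = subword_features(text)
            self.feat_counts[label].update(feats)
            self.label_totals[label] += len(feats)
            self.prior_counts[source][label] += 1
            self.vocab.update(feats)
            self.labels.add(label)
        return self

    def log_prior(self, label, source):
        # Smoothed per-source prior; an unseen source falls back to uniform.
        counts = self.prior_counts.get(source, Counter())
        total = sum(counts.values())
        return math.log((counts[label] + self.alpha)
                        / (total + self.alpha * len(self.labels)))

    def log_likelihood(self, feats, label):
        # Likelihoods are pooled across sources; only the prior is source-specific.
        denom = self.label_totals[label] + self.alpha * len(self.vocab)
        return sum(math.log((self.feat_counts[label][f] + self.alpha) / denom)
                   for f in feats)

    def predict(self, text, source):
        feats = subword_features(text)
        return max(sorted(self.labels),
                   key=lambda lab: self.log_prior(lab, source)
                   + self.log_likelihood(feats, lab))


# Hypothetical toy data, not from the paper.
train = [
    ("need water food", "sms", "request"),
    ("need medical help", "sms", "request"),
    ("donate to relief fund", "twitter", "other"),
    ("thoughts with everyone", "twitter", "other"),
    ("need tents", "twitter", "request"),
]
model = SourcePriorNB().fit(train)
print(model.predict("need watr", "sms"))  # request
print(model.predict("xyz", "sms"), model.predict("xyz", "twitter"))  # request other
```

On this toy data the misspelling "watr" still shares the trigrams "#wa" and "wat" with "water", which a whole-word feature would miss. The same uninformative text ("xyz") is labeled differently depending on whether it arrived by SMS or Twitter, because the per-source prior is the only thing that distinguishes the two cases.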


Published in
ACM DEV '12: Proceedings of the 2nd ACM Symposium on Computing for Development, March 2012, 154 pages
ISBN: 9781450312622
DOI: 10.1145/2160601
General Chairs: Ed Cutrell (Microsoft Research India), Ellen W. Zegura (Georgia Institute of Technology)
Program Chairs: Gaetano Borriello (University of Washington), Bill Thies (Microsoft Research India)
Publisher: Association for Computing Machinery, New York, NY, United States

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Acceptance Rates
ACM DEV '12 paper acceptance rate: 14 of 35 submissions (40%)
Overall acceptance rate: 52 of 164 submissions (32%)