ABSTRACT
This paper investigates three dimensions of cross-domain analysis for humanitarian information processing: citizen reporting vs organizational reporting; Twitter vs SMS; and English vs non-English communications. Short messages sent during the response to the recent earthquake in Haiti and floods in Pakistan are analyzed. It is clear that SMS and Twitter were used very differently at the time, and by different groups of people: SMS was primarily used by individuals on the ground, while Twitter was primarily used by the international community. Turning to semi-automated strategies that employ natural language processing, it is found that English-optimal strategies do not carry over to Urdu or Kreyol, especially with regard to subword variation. Looking at machine-learning models that attempt to combine Twitter and SMS, it is found that cross-domain prediction accuracy is very poor, but some of the loss in accuracy can be overcome by learning prior distributions over the sources. It is concluded that there is only limited utility in treating SMS and Twitter as equivalent information sources -- perhaps much less than the relatively large number of recent Twitter-focused papers would indicate.
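As an illustration of what "learning prior distributions over the sources" could look like, the following is a minimal sketch and not the paper's model. It assumes a multinomial naive Bayes classifier with add-one smoothing, in which word likelihoods are shared across SMS and Twitter but each source gets its own class prior. The function names, the labels "actionable" and "other", and the toy messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented toy messages as (source, label, text); not data from the paper.
EXAMPLES = [
    ("sms", "actionable", "need water"),
    ("sms", "actionable", "need food"),
    ("sms", "actionable", "help us"),
    ("twitter", "actionable", "need help"),
    ("sms", "other", "thank you"),
    ("twitter", "other", "please help haiti"),
    ("twitter", "other", "help haiti now"),
]


def train(examples):
    """Count words per label (shared by all sources) and labels per source."""
    word_counts = defaultdict(Counter)   # label -> token counts
    prior_counts = defaultdict(Counter)  # source -> label counts
    vocab = set()
    for source, label, text in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        prior_counts[source][label] += 1
        vocab.update(tokens)
    return word_counts, prior_counts, vocab


def predict(model, source, text):
    """Return the label with the highest log score for a message from `source`."""
    word_counts, prior_counts, vocab = model
    labels = sorted(word_counts)
    # An unseen source falls back to an empty Counter, i.e. a uniform prior.
    source_counts = prior_counts.get(source, Counter())
    total = sum(source_counts.values())
    best_label, best_score = None, -math.inf
    for label in labels:
        # Source-specific class prior with add-one smoothing.
        score = math.log((source_counts[label] + 1) / (total + len(labels)))
        counts = word_counts[label]
        n_tokens = sum(counts.values())
        for token in text.lower().split():
            # Word likelihood shared across sources, add-one smoothed.
            score += math.log((counts[token] + 1) / (n_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(EXAMPLES)
print(predict(model, "sms", "help"))      # actionable
print(predict(model, "twitter", "help"))  # other
```

In this toy data the word "help" is equally likely under both labels, so the prediction is decided entirely by the source prior: the same text is labeled "actionable" when it arrives by SMS and "other" when it arrives from Twitter.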
- Short message communications: users, topics, and in-language processing