DOI: 10.1145/2160601.2160607
research-article

Short message communications: users, topics, and in-language processing

Published: 11 March 2012

ABSTRACT

This paper investigates three dimensions of cross-domain analysis for humanitarian information processing: citizen reporting vs. organizational reporting; Twitter vs. SMS; and English vs. non-English communications. Short messages sent during the response to the recent earthquake in Haiti and floods in Pakistan are analyzed. SMS and Twitter were clearly used very differently at the time, and by different groups of people: SMS was primarily used by individuals on the ground, while Twitter was primarily used by the international community. Turning to semi-automated strategies that employ natural language processing, it is found that English-optimal strategies do not carry over to Urdu or Kreyol, especially with regard to subword variation. For machine-learning models that attempt to combine Twitter and SMS, cross-domain prediction accuracy is very poor, but some of the loss in accuracy can be recovered by learning prior distributions over the sources. It is concluded that there is only limited utility in treating SMS and Twitter as equivalent information sources, perhaps much less than the relatively large number of recent Twitter-focused papers would indicate.

    • Published in

      ACM DEV '12: Proceedings of the 2nd ACM Symposium on Computing for Development
      March 2012
      154 pages
      ISBN: 9781450312622
      DOI: 10.1145/2160601

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      ACM DEV '12 paper acceptance rate: 14 of 35 submissions, 40%
      Overall acceptance rate: 52 of 164 submissions, 32%
