DOI: 10.1145/1964858.1964874
research-article

Towards detecting influenza epidemics by analyzing Twitter messages

Published: 25 July 2010

ABSTRACT

Rapid response to a health epidemic is critical to reduce loss of life. Existing methods mostly rely on expensive surveys of hospitals across the country, typically with lag times of one to two weeks for influenza reporting, and even longer for less common diseases. In response, there have been several recently proposed solutions to estimate a population's health from Internet activity, most notably Google's Flu Trends service, which correlates search term frequency with influenza statistics reported by the Centers for Disease Control and Prevention (CDC). In this paper, we analyze messages posted on the micro-blogging site Twitter.com to determine if a similar correlation can be uncovered. We propose several methods to identify influenza-related messages and compare a number of regression models to correlate these messages with CDC statistics. Using over 500,000 messages spanning 10 weeks, we find that our best model achieves a correlation of .78 with CDC statistics by leveraging a document classifier to identify relevant messages.
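The regression step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a simple linear model on logit-transformed proportions (one plausible choice among the "number of regression models" mentioned), and the weekly values, variable names, and helper functions are all hypothetical and made up for the example.

```python
import math


def logit(p):
    """Log-odds of a proportion p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))


def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical weekly data, invented for illustration only (NOT from the paper):
# fraction of messages flagged as influenza-related, and the CDC-reported
# influenza-like-illness rate for the same ten weeks.
flu_message_fraction = [0.010, 0.012, 0.015, 0.020, 0.018,
                        0.014, 0.011, 0.009, 0.008, 0.007]
cdc_ili_rate = [0.020, 0.023, 0.027, 0.035, 0.033,
                0.026, 0.022, 0.019, 0.018, 0.016]

# Assumed model form: logit(cdc_rate) = slope * logit(message_fraction) + intercept
x = [logit(q) for q in flu_message_fraction]
y = [logit(p) for p in cdc_ili_rate]
slope, intercept = fit_simple_linear(x, y)

# Compare the model's estimates against the CDC values on the same weeks.
predicted = [slope * xi + intercept for xi in x]
r = pearson_r(predicted, y)
print(f"slope={slope:.3f} intercept={intercept:.3f} correlation={r:.3f}")
```

Because this sketch fits and scores the model on the same weeks, the correlation it prints is optimistic. A fairer comparison would hold out some weeks for evaluation.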


Published in

SOMA '10: Proceedings of the First Workshop on Social Media Analytics
July 2010
145 pages
ISBN: 9781450302173
DOI: 10.1145/1964858

Copyright © 2010 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
