DOI: 10.5555/2145432.2145600
research-article

Twitter catches the flu: detecting influenza epidemics using Twitter

Published: 27 July 2011

ABSTRACT

With the recent rise in the popularity and scale of social media, there is a growing need for systems that can extract useful information from huge amounts of data. We address the problem of detecting influenza epidemics. First, the proposed system extracts influenza-related tweets using the Twitter API. Then, a support vector machine (SVM) based classifier retains only the tweets that mention actual influenza patients. The experimental results demonstrate the feasibility of the proposed approach (0.89 correlation with the gold standard). At the outbreak and early spread (the early epidemic stage) in particular, the proposed method shows high correlation (0.97), outperforming the state-of-the-art methods. This paper shows that Twitter texts reflect the real world, and that NLP techniques can be applied to extract only the tweets that contain useful information.
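The evaluation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the reported correlation is a Pearson coefficient computed between weekly counts of classifier-positive tweets and a gold-standard weekly case count. The function names `weekly_counts` and `pearson`, and all numbers, are hypothetical toy values chosen only to make the example runnable.

```python
from math import sqrt


def weekly_counts(labeled_tweets, n_weeks):
    """Count, per week, the tweets a classifier marked as positive.

    labeled_tweets: iterable of (week_index, is_positive) pairs.
    The labels stand in for the output of an SVM classifier (assumption).
    """
    counts = [0] * n_weeks
    for week, is_positive in labeled_tweets:
        if is_positive:
            counts[week] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Toy data only: (week, classifier label) pairs, not taken from the paper.
labeled = [
    (0, True), (0, False),
    (1, True), (1, True),
    (2, True), (2, True), (2, True),
    (3, True), (3, False),
]
tweet_series = weekly_counts(labeled, 4)

# Hypothetical gold-standard weekly case counts for the same four weeks.
gold_series = [10, 20, 30, 10]

r = pearson(tweet_series, gold_series)
print(f"correlation with gold standard: {r:.2f}")
```

In this toy setup the two series are exactly proportional, so the coefficient is 1 up to floating-point error. Real tweet counts would need to be aligned to the same reporting periods as the gold standard before comparison.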

Published in

EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing
July 2011
1647 pages
ISBN: 9781937284114

Publisher: Association for Computational Linguistics, United States


Overall Acceptance Rate: 73 of 234 submissions, 31%
