ABSTRACT
With the recent rise in the popularity and scale of social media, there is a growing need for systems that can extract useful information from huge amounts of data. We address the problem of detecting influenza epidemics. First, the proposed system collects influenza-related tweets using the Twitter API. Then, a support vector machine (SVM) based classifier retains only the tweets that mention actual influenza patients. The experimental results demonstrate the feasibility of the proposed approach (0.89 correlation with the gold standard). In particular, at the outbreak and early spread (the early epidemic stage), the proposed method shows a high correlation (0.97), which outperforms the state-of-the-art methods. This paper shows that Twitter texts reflect the real world, and that NLP techniques can be applied to extract only the tweets that contain useful information.
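The evaluation step described in the abstract can be sketched as follows: count, per week, the tweets a classifier accepts as mentioning an actual patient, then correlate that series with a gold-standard series. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say which correlation coefficient is used, so Pearson correlation is assumed here. The function names, the weekly granularity, and `toy_classifier` (a keyword rule standing in for the trained SVM) are all hypothetical, and the sample tweets are made up for illustration.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Iterable, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def weekly_positive_counts(
    tweets: Iterable[Tuple[int, str]],
    is_patient_tweet: Callable[[str], bool],
    n_weeks: int,
) -> List[int]:
    """Count tweets per week index that the classifier accepts."""
    counts = Counter(week for week, text in tweets if is_patient_tweet(text))
    return [counts.get(week, 0) for week in range(n_weeks)]


def toy_classifier(text: str) -> bool:
    # Hypothetical stand-in for the SVM: accept tweets that mention "flu"
    # unless they look like a question or a news item.
    lowered = text.lower()
    return "flu" in lowered and "?" not in lowered and "news" not in lowered


if __name__ == "__main__":
    # Made-up (week index, text) pairs, for illustration only.
    sample = [
        (0, "I caught the flu"),
        (0, "flu news: vaccine campaign starts"),
        (1, "down with the flu today"),
        (1, "my sister has the flu"),
        (1, "is it flu season?"),
        (2, "flu again, staying home"),
        (2, "flu got me"),
        (2, "whole family has flu"),
    ]
    estimated = weekly_positive_counts(sample, toy_classifier, n_weeks=3)
    gold = [10, 20, 30]  # made-up gold-standard patient counts per week
    print(estimated, pearson(estimated, gold))
```

Passing the classifier in as a function keeps the counting and correlation code independent of how tweets are filtered, so a trained model could replace `toy_classifier` without touching the rest.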
- Twitter catches the flu: detecting influenza epidemics using Twitter