ABSTRACT
Rapid response to a health epidemic is critical to reducing loss of life. Existing surveillance methods rely mostly on expensive surveys of hospitals across the country, typically with lag times of one to two weeks for influenza reporting and even longer for less common diseases. In response, several recently proposed approaches estimate a population's health from Internet activity, most notably Google's Flu Trends service, which correlates search term frequency with influenza statistics reported by the Centers for Disease Control and Prevention (CDC). In this paper, we analyze messages posted on the micro-blogging site Twitter.com to determine whether a similar correlation can be uncovered. We propose several methods to identify influenza-related messages and compare a number of regression models to correlate these messages with CDC statistics. Using over 500,000 messages spanning 10 weeks, we find that our best model, which leverages a document classifier to identify relevant messages, achieves a correlation of 0.78 with CDC statistics.
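The general pipeline described above (flag influenza-related messages, aggregate them by week, and compare the weekly signal with CDC statistics) can be sketched as follows. This is a minimal illustration only, not the method evaluated in the paper: the keyword list, the function names, and the use of a plain Pearson correlation in place of the regression models and document classifier are all assumptions made for the example.

```python
import math

# Hypothetical keyword list; the abstract does not state which terms were used.
FLU_KEYWORDS = ("flu", "influenza", "fever", "cough")


def is_flu_related(message):
    """Crude keyword match, standing in for a trained document classifier."""
    words = message.lower().split()
    return any(word.strip(".,!?#") in FLU_KEYWORDS for word in words)


def weekly_fraction(messages_by_week):
    """Return, for each week, the fraction of messages flagged as flu-related."""
    return [
        sum(is_flu_related(m) for m in msgs) / len(msgs) if msgs else 0.0
        for msgs in messages_by_week
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Given one list of messages per week and a matching list of weekly CDC rates (neither is included here), calling `pearson(weekly_fraction(messages_by_week), cdc_rates)` would yield a correlation figure of the kind the abstract reports.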