DOI: 10.1145/2998181.2998183 (ACM Conferences, CSCW Conference Proceedings)
research-article
Open Access

Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda

Published: 25 February 2017

ABSTRACT

Effective disease monitoring provides a foundation for effective public health systems. This has historically been accomplished with patient contact and bureaucratic aggregation, which tends to be slow and expensive. Recent internet-based approaches promise to be real-time and cheap, with few parameters. However, the question of when and how these approaches work remains open. We addressed this question using Wikipedia access logs and category links. Our experiments, replicable and extensible using our open source code and data, test the effect of semantic article filtering, amount of training data, forecast horizon, and model staleness by comparing across 6 diseases and 4 countries using thousands of individual models. We found that our minimal-configuration, language-agnostic article selection process based on semantic relatedness is effective for improving predictions, and that our approach is relatively insensitive to the amount and age of training data. We also found, in contrast to prior work, very little forecasting value, and we argue that this is consistent with theoretical considerations about the nature of forecasting. These mixed results lead us to propose that the currently observational field of internet-based disease surveillance must pivot to include theoretical models of information flow as well as controlled experiments based on simulations of disease.

References

  1. Harshavardhan Achrekar et al. 2011. Predicting flu trends using Twitter data. In Computer Communications Workshops (INFOCOM Workshops)).Google ScholarGoogle Scholar
  2. Harshavardhan Achrekar et al. 2012. Twitter improves seasonal influenza prediction. In Health Informatics (HEALTHINF). http://www.cs.uml.edu/~bliu/pub/healthinf_2012.pdfGoogle ScholarGoogle Scholar
  3. Byung Gyu Ahn, Benjamin Van Durme, and Chris Callison-Burch. 2011. WikiTopics: What is popular on Wikipedia and why. In Workshop on Automatic Summarization for Different Genres, Media, and Languages (WASDGML). http://dl.acm.org/citation.cfm?id=2018987.2018992Google ScholarGoogle Scholar
  4. Murray Aitken, Thomas Altmann, and Daniel Rosen. 2014. Engaging patients through social media. Tech report. IMS Institute for Healthcare Informatics.Google ScholarGoogle Scholar
  5. Cristiano Alicino et al. Assessing Ebola-related web search behaviour: Insights and implications from an analytical study of Google Trends-based query volumes. Infectious Diseases of Poverty 4 (2015).Google ScholarGoogle Scholar
  6. Tim Althoff et al. 2013. Analysis and forecasting of trending topics in online media streams. In Multimedia. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Benjamin M. Althouse, Yih Yng Ng, and Derek A. T. Cummings. Prediction of dengue incidence using 15http://colorbrewer2.org search query surveillance. PLOS Neglected Tropical Diseases 5, 8 (Aug. 2011).Google ScholarGoogle ScholarCross RefCross Ref
  8. Eiji Aramaki, Sachiko Maskawa, and Mizuki Morita. 2011. Twitter catches the flu: Detecting influenza epidemics using Twitter. In Empirical Methods in Natural Language Processing (EMNLP). http://dl. acm.org/citation.cfm?id=2145432.2145600 Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Ozgur M. Araz, Dan Bentley, and Robert L. Muelleman. Using Google flu Trends data in forecasting influenza-like-illness related ED visits in Omaha, Nebraska. The American Journal of Emergency Medicine 32, 9 (Sept. 2014).Google ScholarGoogle ScholarCross RefCross Ref
  10. Anoshé A. Aslam et al. The reliability of tweets as a supplementary method of seasonal influenza surveillance. Journal of Medical Internet Research 16, (Nov. 2014).Google ScholarGoogle ScholarCross RefCross Ref
  11. John W. Ayers et al. Seasonality in seeking mental health information on Google. American Journal of Preventive Medicine 44, 5 (May 2013).Google ScholarGoogle ScholarCross RefCross Ref
  12. Gyung Jin Bahk, Yong Soo Kim, and Myoung Su Park. Use of internet search queries to enhance surveillance of foodborne illness. Emerging Infectious Diseases 21, 11 (Nov. 2015).Google ScholarGoogle ScholarCross RefCross Ref
  13. Batuhan Bardak and Mehmet Tan. 2015. Prediction of influenza outbreaks by integrating Wikipedia article access logs and Google flu Trend data. In IEEE Bioinformatics and Bioengineering (BIBE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Michał Bogdziewicz and Jakub Szymkowiak. Oak acorn crop and Google search volume predict Lyme disease risk in temperate Europe. Basic and Applied Ecology (Jan. 2016).Google ScholarGoogle Scholar
  15. Stephanie M. Borchardt, Kathleen A. Ritger, and Mark S. Dworkin. Categorization, prioritization, and surveillance of potential bioterrorism agents. Infectious Disease Clinics of North America 20, 2 (June 2006).Google ScholarGoogle ScholarCross RefCross Ref
  16. Dena M. Bravata et al. Systematic review: Surveillance systems for early detection of bioterrorism-related diseases. Annals of Internal Medicine 140, 11 (June 2004).Google ScholarGoogle ScholarCross RefCross Ref
  17. Benjamin N. Breyer et al. Use of Google Insights for Search to track seasonal and geographic kidney stone incidence in the USA. Urology 78, 2 (Aug. 2011).Google ScholarGoogle ScholarCross RefCross Ref
  18. Francesco Brigo and Roberto Erro. Why do people Google movement disorders? An infodemiological study of information seeking behaviors. Neurological Sciences (Feb. 2016).Google ScholarGoogle Scholar
  19. David Andre Broniatowski et al. Using social media to perform local influenza surveillance in an inner-city hospital: A retrospective observational study. JMIR Public Health and Surveillance 1, 1 (2015).Google ScholarGoogle ScholarCross RefCross Ref
  20. David A. Broniatowski, Michael J. Paul, and Mark Dredze. National and local influenza surveillance through Twitter: An analysis of the 2012-2013 influenza epidemic. PLOS ONE 8, 12 (Dec. 2013).Google ScholarGoogle Scholar
  21. Logan C. Brooks et al. flexible modeling of epidemics with an empirical bayes framework. PLOS Computational Biology 11, 8 (Aug. 2015).Google ScholarGoogle ScholarCross RefCross Ref
  22. Matt Brooks. Was the NBA draft lottery rigged for the New Orleans Hornets to win? Washington Post (May 2012). https://www.washingtonpost.com/blogs/early-lead/post/was-the-nba-draft-lotteryrigged-for-the-new-orleans-hornets-towin/2012/05/31/gJQAmL5V4U_blog.htmlGoogle ScholarGoogle Scholar
  23. Jan Burdziej and Piotr Gawrysiak. 2012. Using web mining for discovering spatial patterns and hot spots for spatial generalization. In Foundations of Intelligent Systems, Li Chen et al. (Eds.). Number 7661. http://link.springer.com/chapter/10.1007/ 978--3--642--34624--8_21 Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Declan Butler. When Google got flu wrong. Nature 494, 7436 (Feb. 2013).Google ScholarGoogle ScholarCross RefCross Ref
  25. Herman Anthony Carneiro and Eleftherios Mylonakis. Google Trends: A web-based tool for real-time surveillance of disease outbreaks. Clinical Infectious Diseases 49, 10 (Nov. 2009).Google ScholarGoogle ScholarCross RefCross Ref
  26. Rachael Cayce, Kathleen Hesterman, and Paul Bergstresser. Google technology in the surveillance of hand foot mouth disease in Asia. International Journal of Integrative Pediatrics and Environmental Medicine 1 (2014). http://www.ijipem.com/index.php/ijipem/article/view/6Google ScholarGoogle Scholar
  27. Centers for Disease Control and Prevention (CDC). MMWR morbidity tables. (2015). http://wonder.cdc.gov/mmwr/mmwrmorb.aspGoogle ScholarGoogle Scholar
  28. 2016. Overview of influenza surveillance in the USA. Technical Report. Centers for Disease Control and Prevention (CDC). http://www.cdc.gov/flu/pdf/weekly/overview.pdfGoogle ScholarGoogle Scholar
  29. Boris Cergol and Matjaz Omladić. What can Wikipedia and Google tell us about stock prices under diferent market regimes? Ars Mathematica Contemporanea 9, 2 (June 2015). http://amcjournal.eu/index.php/amc/article/view/561Google ScholarGoogle ScholarCross RefCross Ref
  30. Prithwish Chakraborty et al. 2014. Forecasting a moving target: Ensemble models for ILI case count predictions. In SIAM Data Mining.Google ScholarGoogle Scholar
  31. Emily H. Chan et al. Using web search query data to monitor dengue epidemics: A new model for neglected tropical disease surveillance. PLOS Neglected Tropical Diseases 5, 5 (May 2011).Google ScholarGoogle ScholarCross RefCross Ref
  32. Jedsada Chartree. 2014. Monitoring dengue outbreaks using online data. Ph.D. University of North Texas. http://digital.library.unt.edu/ark:/67531/ metadc500167/m2/1/high_res_d/dissertation. pdfGoogle ScholarGoogle Scholar
  33. Sungjin Cho et al. Correlation between national influenza surveillance data and Google Trends in South Korea. PLOS ONE 8, 12 (Dec. 2013).Google ScholarGoogle Scholar
  34. Rumi Chunara et al. Online reporting for malaria surveillance using micro-monetary incentives, in urban India 2010-2011. Malaria Journal 11, 1 (Feb. 2012).Google ScholarGoogle Scholar
  35. Rumi Chunara et al. flu Near You: An online self-reported influenza surveillance system in the USA. Online Journal of Public Health Informatics 5, 1 (March 2013).Google ScholarGoogle Scholar
  36. Rumi Chunara, Jason R Andrews, and John S Brownstein. Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak. American Journal of Tropical Medicine and Hygiene 86, 1 (Jan. 2012).Google ScholarGoogle Scholar
  37. Marek Ciglan and Kjetil Nørvåg. 2010. WikiPop: Personalized event detection system based on Wikipedia page view statistics. In Information and Knowledge Management (CIKM). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Nigel Collier et al. BioCaster: Detecting public health rumors with a Web-based text mining system. Bioinformatics 24, 24 (Dec. 2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Crystale Purvis Cooper et al. Cancer internet search activity on a major search engine, USA 2001-2003. Journal of Medical Internet Research 7, 3 (July 2005).Google ScholarGoogle Scholar
  40. Aron Culotta. 2010. Towards detecting influenza epidemics by analyzing Twitter messages. In Workshop on Social Media Analytics (SOMA). Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Aron Culotta. Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages. Language Resources and Evaluation 47, 1 (March 2013). Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Aron Culotta. 2014. Estimating county health statistics with Twitter. In Human Factors in Computing Systems (CHI). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Michael W. Davidson, Dotan A. Haim, and Jennifer M. Radin. Using networks to combine "big data" and traditional surveillance to improve influenza predictions. Scientific Reports 5 (Jan. 2015).Google ScholarGoogle Scholar
  44. Brian de Silva and Ryan Compton. Prediction of foreign box office revenues based on Wikipedia page activity. arXiv:1405.5924 {cs.SI} (May 2014). http://arxiv.org/abs/1405.5924Google ScholarGoogle Scholar
  45. Rishi Desai et al. Norovirus disease surveillance using Google internet query share data. Clinical Infectious Diseases 55, 8 (Oct. 2012).Google ScholarGoogle ScholarCross RefCross Ref
  46. Son Doan, Lucila Ohno-Machado, and Nigel Collier. 2012. Enhancing Twitter data analysis with simple semantic filtering: Example in tracking influenza-like illnesses. In Healthcare Informatics, Imaging and Systems Biology (HISB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Timothy J. Doyle, M. Kathleen Glynn, and Samuel L. Groseclose. Completeness of notifiable infectious disease reporting in the USA: An analytical literature review. American Journal of Epidemiology 155, 9 (Jan. 2002).Google ScholarGoogle ScholarCross RefCross Ref
  48. Andrea Freyer Dugas et al. Influenza forecasting with Google flu Trends. PLOS ONE 8, 2 (Feb. 2013).Google ScholarGoogle ScholarCross RefCross Ref
  49. Vanja M. Dukic, Michael Z. David, and Diane S. Lauderdale. Internet queries and methicillin-resistant Staphylococcus aureus surveillance. Emerging Infectious Diseases 17, 6 (June 2011).Google ScholarGoogle ScholarCross RefCross Ref
  50. Michael Edelstein et al. Detecting the norovirus season in Sweden using search engine data -- Meeting the needs of hospital infection control teams. PLOS ONE 9, 6 (June 2014).Google ScholarGoogle Scholar
  51. Johannes C. Eichstaedt et al. Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science 26, 2 (Feb. 2015).Google ScholarGoogle ScholarCross RefCross Ref
  52. Andreas Ekström et al. Forecasting emergency department visits using internet data. Annals of Emergency Medicine 65, 4 (April 2015).Google ScholarGoogle Scholar
  53. Gunther Eysenbach. Infodemiology: Tracking flu-related searches on the web for syndromic surveillance. AMIA Annual Symposium 2006 (2006). http://www.ncbi.nlm.nih.gov/pmc/articles/ PMC1839505/Google ScholarGoogle Scholar
  54. Geoffrey Fairchild et al. 2015. Eliciting disease data from Wikipedia articles. In Weblogs and Social Media (ICWSM) Workshops. http://www.aaai.org/ocs/ index.php/ICWSM/ICWSM15/paper/view/10630Google ScholarGoogle Scholar
  55. Clark C. Freifeld et al. HealthMap: Global infectious disease monitoring through automated classification and visualization of internet media reports. Journal of the American Medical Informatics Association 15, 2 (Jan. 2008).Google ScholarGoogle ScholarCross RefCross Ref
  56. Thomas R. Frieden. A framework for public health action: The health impact pyramid. American Journal of Public Health 100, 4 (April 2010).Google ScholarGoogle ScholarCross RefCross Ref
  57. Nicholas Generous et al. Global disease monitoring and forecasting with Wikipedia. PLOS Computational Biology 10, 11 (Nov. 2014).Google ScholarGoogle Scholar
  58. Francesco Gesualdo et al. Can Twitter be a source of information on allergy? Correlation of pollen counts with tweets reporting symptoms of allergic rhinoconjunctivitis and names of antihistamine drugs. PLOS ONE 10, 7 (July 2015).Google ScholarGoogle Scholar
  59. Jeremy Ginsberg et al. Detecting influenza epidemics using search engine query data. Nature 457, 7232 (Nov. 2008).Google ScholarGoogle Scholar
  60. Steven Gittelman et al. A new source of data for public health surveillance: Facebook likes. Journal of Medical Internet Research 17, 4 (April 2015).Google ScholarGoogle ScholarCross RefCross Ref
  61. Sharad Goel et al. Predicting consumer behavior with Web search. PNAS 107, 41 (Oct. 2010).Google ScholarGoogle Scholar
  62. Janaína Gomide et al. 2011. Dengue surveillance based on a computational model of spatio-temporal locality of Twitter. In Web Science Conference (WebSci). http://www.websci11.org/fileadmin/websci/ Papers/92_paper.pdf Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Yuzhou Gu et al. Early detection of an epidemic erythromelalgia outbreak using Baidu search data. Scientific Reports 5 (July 2015).Google ScholarGoogle Scholar
  64. Akihito Hagihara, Shogo Miyazaki, and Takeru Abe. Internet suicide searches and the incidence of suicide in young people in Japan. European Archives of Psychiatry and Clinical Neuroscience 262, 1 (Feb. 2012).Google ScholarGoogle ScholarCross RefCross Ref
  65. Francis H. Harlow and Jacob E. Fromm. Computer experiments in fluid dynamics. Scientific American 212, 3 (March 1965).Google ScholarGoogle ScholarCross RefCross Ref
  66. Miguel Helft. Google uses web searches to track fluids spread. The New York Times (Nov. 2008). http://www.nytimes.com/2008/11/12/ technology/internet/12flu.htmlGoogle ScholarGoogle Scholar
  67. Kyle S. Hickmann et al. Forecasting the 2013-2014 influenza season using Wikipedia. PLOS Computational Biology 11, 5 (May 2015).Google ScholarGoogle Scholar
  68. Hideo Hirose and Liangliang Wang. 2012. Prediction of infectious disease spread using Twitter: A case of influenza. In Parallel Architectures, Algorithms and Programming (PAAP). Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 1 (Feb. 1970).Google ScholarGoogle ScholarCross RefCross Ref
  70. Martin Rudi Holaker and Eirik Emanuelsen. 2013. Event detection using Wikipedia. Master's thesis. Institutt for Datateknikk og Informasjonsvitenskap. http://www.diva-portal.org/smash/record.jsf?pid=diva2:655606Google ScholarGoogle Scholar
  71. Anette Hulth et al. Eye-opening approach to norovirus surveillance. Emerging Infectious Diseases 16, 8 (Aug. 2010).Google ScholarGoogle Scholar
  72. Anette Hulth and Gustaf Rydevik. Web query-based surveillance in Sweden during the influenza A(H1N1)2009 pandemic, April 2009 to February 2010. Euro Surveillance 16, 18 (2011).Google ScholarGoogle Scholar
  73. Anette Hulth, Gustaf Rydevik, and Annika Linde. Web queries as a source for syndromic surveillance. PLOS ONE 4, 2 (Feb. 2009).Google ScholarGoogle ScholarCross RefCross Ref
  74. Robert Koch Institute. SurvStat@RKI 2.0. (2016). https://survstat.rki.de/Content/Query/ Create.aspxGoogle ScholarGoogle Scholar
  75. Instituto Nacional de Salud. Boletín epidemiológico. (2015). http://www.ins.gov.co/boletinepidemiologico/Paginas/default.aspxGoogle ScholarGoogle Scholar
  76. Molly E. Ireland et al. Action tweets linked to reduced county-level HIV prevalence in the USA: Online messages and structural determinants. AIDS and Behavior (Dec. 2015).Google ScholarGoogle Scholar
  77. Bao Jia-xing et al. 2013. Gonorrhea incidence forecasting research based on Baidu search data. In Management Science and Engineering (ICMSE).Google ScholarGoogle Scholar
  78. Amy K. Johnson and Supriya D. Mehta. A comparison of internet search trends and sexually transmitted infection rates using Google Trends. Sexually Transmitted Diseases 41, 1 (Jan. 2014).Google ScholarGoogle ScholarCross RefCross Ref
  79. Heather A Johnson et al. Analysis of Web access logs for surveillance of influenza. Studies in Health Technology and Informatics 107, 2 (2004). http:// www.ncbi.nlm.nih.gov/pubmed/15361003Google ScholarGoogle Scholar
  80. Mirko Kämpf et al. The detection of emerging trends using Wikipedia traffic data and context networks. PLOS ONE 10, 12 (Dec. 2015).Google ScholarGoogle Scholar
  81. Min Kang et al. Using Google Trends for influenza surveillance in South China. PLOS ONE 8, 1 (Jan. 2013).Google ScholarGoogle Scholar
  82. M.-G. Kang et al. Google unveils a glimpse of allergic rhinitis in the real world. Allergy 70, 1 (Jan. 2015).Google ScholarGoogle ScholarCross RefCross Ref
  83. Asad Ullah Rafiq Khan, Mohammad Badruddin Khan, and Khalid Mahmood. 2015. Cloud service for assessment of news? popularity in internet based on Google and Wikipedia indicators. In National Symposium on Information Technology: Towards New Smart World (NSITNSW).Google ScholarGoogle ScholarCross RefCross Ref
  84. Eui-Ki Kim et al. Use of Hangeul Twitter to track and predict human influenza infection. PLOS ONE 8, 7 (July 2013).Google ScholarGoogle ScholarCross RefCross Ref
  85. Kwang Deok Kim and Liaquat Hossain. 2014. Towards early detection of influenza epidemics by using social media analytics. In DSS 2.0 -- Supporting Decision Making with New Technologies. Vol. 261.Google ScholarGoogle Scholar
  86. Nicholas E. Kman and Daniel J. Bachmann. Biosurveillance: a review and update. Advances in Preventive Medicine 2012 (Jan. 2012).Google ScholarGoogle Scholar
  87. Volker König and Ralph Mösges. A model for the determination of pollen count using Google search queries for patients suffering from allergic rhinitis. Journal of Allergy 2014 (June 2014).Google ScholarGoogle ScholarCross RefCross Ref
  88. Natalie Kupferberg and Bridget McCrate Protus. Accuracy and completeness of drug information in Wikipedia: An assessment. Journal of the Medical Library Association 99, 4 (Oct. 2011).Google ScholarGoogle ScholarCross RefCross Ref
  89. Alex Lamb, Michael J. Paul, and Mark Dredze. 2013. Separating fact from fear: Tracking flu infections on Twitter. In Human Language Technologies (NAACL-HLT). http://www.aclweb.org/anthology/N/N13/N131097.pdfGoogle ScholarGoogle Scholar
  90. Vasileios Lampos et al. Advances in nowcasting influenza-like illness rates using search query logs. Scientific Reports 5 (Aug. 2015).Google ScholarGoogle Scholar
  91. Vasileios Lampos et al. Assessing the impact of a health intervention via user-generated Internet content. Data Mining and Knowledge Discovery 29, 5 (July 2015). Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Vasileios Lampos and Nello Cristianini. 2010. Tracking the flu pandemic by monitoring the social web. In Cognitive Information Processing (CIP).Google ScholarGoogle Scholar
  93. Vasileios Lampos and Nello Cristianini. Nowcasting events from the social web with statistical learning. Transactions on Intelligent Systems and Technology 3, 4 (Sept. 2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  94. Michaël R. Laurent and Tim J. Vickers. Seeking Health Information Online: Does Wikipedia Matter? Journal of the American Medical Informatics Association 16, 4 (July 2009).Google ScholarGoogle ScholarCross RefCross Ref
  95. David Lazer et al. The parable of Google flu: Traps in big data analysis. Science 343, 14 March (2014).Google ScholarGoogle Scholar
  96. Andreas Leithner et al. Wikipedia and osteosarcoma: A trustworthy patients' information? Journal of the American Medical Informatics Association 17, 4 (Jan. 2010).Google ScholarGoogle Scholar
  97. Shengli Li and Xichuan Zhou. Research of the correlation between the H1N1 morbidity data and Google Trends in Egypt. arXiv:1511.05300 {cs.SI} (Nov. 2015). http://arxiv.org/abs/1511.05300Google ScholarGoogle Scholar
  98. Johan Lindh et al. Head lice surveillance on a deregulated OTC-sales market: A study using web query data. PLOS ONE 7, 11 (Nov. 2012).Google ScholarGoogle Scholar
  99. Ruoqian Liu et al. 2014. Enhancing financial decision-making using social behavior modeling. In Social Network Mining and Analysis (SNAKDD). Google ScholarGoogle ScholarDigital LibraryDigital Library
  100. Kevin Lutsky, Joseph Bernstein, and Pedro Beredjiklian. Quality of information on the internet about carpal tunnel syndrome: An update. Orthopedics 36, 8 (2013). http://www.healio.com/orthopedics/ journals/ortho/%7Bf97c8407--7483--4d26--9aac2b860b0e6d2c%7D/quality-of-information-onthe-internet-about-carpal-tunnel-syndromean-updateGoogle ScholarGoogle ScholarCross RefCross Ref
  101. T. Ma et al. Syndromic surveillance of influenza activity in Sweden: an evaluation of three tools. Epidemiology & Infection 143, 11 (Aug. 2015).Google ScholarGoogle Scholar
  102. Douglas Martin. Jack Twyman, N.B.A. star, dies at 78. The New York Times (May 2012). http://www.nytimes.com/2012/06/01/sports/ basketball/jack-twyman-nba-star-dies-at78.htmlGoogle ScholarGoogle Scholar
  103. Leah J. Martin, B. E. Lee, and Yutaka Yasui. Google flu Trends in Canada: A comparison of digital disease surveillance data with physician consultations and respiratory virus surveillance data, 2010-2014. Epidemiology & Infection 144, 02 (Jan. 2016).Google ScholarGoogle Scholar
  104. Leah J. Martin, Biying Xu, and Yutaka Yasui. Improving Google flu Trends estimates for the USA through transformation. PLOS ONE 9, 12 (Dec. 2014).Google ScholarGoogle ScholarCross RefCross Ref
  105. David J. McIver and John S. Brownstein. Wikipedia usage estimates prevalence of influenza-like illness in the USA in near real-time. PLOS Computational Biology 10, 4 (April 2014).Google ScholarGoogle ScholarCross RefCross Ref
  106. Wes McKinney. 2010. Data structures for statistical computing in Python. In Python in Science (SCIPY), Vol. 445. http://conference.scipy.org/ proceedings/scipy2010/pdfs/mckinney.pdfGoogle ScholarGoogle ScholarCross RefCross Ref
  107. Anthony J. McMichael. Globalization, climate change, and human health. New England Journal of Medicine 368, 14 (April 2013).Google ScholarGoogle ScholarCross RefCross Ref
  108. Márton Mestyán, Taha Yasseri, and János Kertész. Early prediction of movie box office success based on Wikipedia activity big data. PLOS ONE 8, 8 (Aug. 2013).Google ScholarGoogle ScholarCross RefCross Ref
  109. Gabriel J. Milinovich et al. Using internet search queries for infectious disease surveillance: Screening diseases for suitability. BMC Infectious Diseases 14 (2014).Google ScholarGoogle Scholar
  110. David Milne and Ian H. Witten. An open-source toolkit for mining Wikipedia. Artificial Intelligence 194 (Jan. 2013). Google ScholarGoogle ScholarDigital LibraryDigital Library
  111. Ministry of Health Israel. Weekly epidemiological reports. (2015). http://www.health.gov.il/ UnitsOffice/HD/PH/epidemiology/Pages/ epidemiology_report.aspxGoogle ScholarGoogle Scholar
  112. Susan M. Mniszewski et al. 2014. Understanding the impact of face mask usage through epidemic simulation of large social networks. In Theories and Simulations of Complex Social Systems, Vahid Dabbaghian and Vijay Kumar Mago (Eds.). Number 52. http://link.springer.com/chapter/10.1007/ 978--3--642--39149--1_8Google ScholarGoogle Scholar
  113. Helen Susannah Moat et al. Quantifying Wikipedia usage patterns before stock market moves. Scientific Reports 3 (May 2013).Google ScholarGoogle Scholar
  114. Helen Susannah Moat et al. 2014. Anticipating stock market movements with Google and Wikipedia. In Nonlinear Phenomena in Complex Systems: From Nano to Macro Scale, Davron Matrasulov and H. Eugene Stanley (Eds.).Google ScholarGoogle Scholar
  115. Ruchit Nagar et al. A case study of the New York City 2012-2013 influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives. Journal of Medical Internet Research 16, 10 (Oct. 2014).Google ScholarGoogle Scholar
  116. Anna C. Nagel et al. The complex relationship of realspace events and messages in cyberspace: Case study of influenza and pertussis using tweets. Journal of Medical Internet Research 15, 10 (Oct. 2013).Google ScholarGoogle ScholarCross RefCross Ref
  117. N.J.D. Nagelkerke. A note on a general definition of the coefficient of determination. Biometrika 78, 3 (1991).Google ScholarGoogle ScholarCross RefCross Ref
  118. Kok W. Ng. 2014. The use of Twitter to predict the level of influenza activity in the USA. M.S. Naval Postgraduate School. http://oai.dtic.mil/oai/ oai?verb=getRecord&metadataPrefix=html& identifier=ADA620696Google ScholarGoogle Scholar
  119. Alex J. Ocampo, Rumi Chunara, and John S. Brownstein. Using search queries for malaria surveillance, Thailand. Malaria Journal 12, 1 (Nov. 2013).Google ScholarGoogle ScholarCross RefCross Ref
  120. Donald R. Olson et al. Reassessing Google flu Trends data for detection of seasonal and pandemic influenza: A comparative epidemiological study at three geographic scales. PLOS Computational Biology 9, 10 (Oct. 2013).Google ScholarGoogle ScholarCross RefCross Ref
  121. Miles Osborne et al. 2012. Bieber no more: First story detection using Twitter and Wikipedia. In SIGIR Workshop on Time-aware Information Access (TAIA). http://www.dcs.gla.ac.uk/~craigm/ publications/osborneTAIA2012.pdfGoogle ScholarGoogle Scholar
  122. John Paparrizos, Ryen W. White, and Eric Horvitz. Screening for pancreatic adenocarcinoma using signals from web search logs: Feasibility study and results. Journal of Oncology Practice (June 2016).Google ScholarGoogle Scholar
  123. Michael J. Paul and Mark Dredze. 2011. You are what you tweet: Analyzing Twitter for public health. In Weblogs and Social Media (ICWSM).Google ScholarGoogle Scholar
  124. Michael J. Paul, Mark Dredze, and David Broniatowski. Twitter improves influenza forecasting. PLOS Currents (Oct. 2014).Google ScholarGoogle Scholar
  125. Fabian Pedregosa et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011). http://jmlr.org/papers/v12/pedregosa11a.html Google ScholarGoogle ScholarDigital LibraryDigital Library
  126. Camille Pelat et al. More diseases tracked by using Google Trends. Emerging Infectious Diseases 15, 8 (Aug. 2009).Google ScholarGoogle ScholarCross RefCross Ref
  127. Geng Peng and Jiyuan Wang. 2014. Detecting syphilis amount in China based on Baidu query data. In Soft Computing in Information Communication Technology (SCICT 2014).Google ScholarGoogle ScholarCross RefCross Ref
  128. Fernando Pérez and Brian E. Granger. IPython: A system for interactive scientific computing. Computing in Science & Engineering 9, 3 (2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  129. Lyle R. Petersen et al. Zika virus. New England Journal of Medicine 374, 16 (April 2016).Google ScholarGoogle ScholarCross RefCross Ref
  130. David T. Plante and David G. Ingram. Seasonal trends in tinnitus symptomatology: Evidence from Internet search engine query data. European Archives of Oto-Rhino-Laryngology 272, 10 (Sept. 2014).Google ScholarGoogle Scholar
  131. Philip M. Polgreen et al. Using internet searches for influenza surveillance. Clinical Infectious Diseases 47, 11 (Jan. 2008).Google ScholarGoogle ScholarCross RefCross Ref
  132. Tobias Preis and Helen Susannah Moat. Adaptive nowcasting of influenza outbreaks using Google searches. Royal Society Open Science 1, 2 (Oct. 2014).Google ScholarGoogle ScholarCross RefCross Ref
  133. Reid Priedhorsky et al. 2007. Creating, destroying, and restoring value in Wikipedia. In Supporting Group Work (GROUP). Google ScholarGoogle ScholarDigital LibraryDigital Library
  134. Reid Priedhorsky, Geoffrey Fairchild, and Sara Y. Del Valle. Research:Geo-aggregation of Wikipedia pageviews. (2015). https://meta.wikimedia.org/ wiki/Research:Geoaggregation_of_Wikipedia_pageviewsGoogle ScholarGoogle Scholar
  135. Malolan S. Rajagopalan et al. Patient-oriented cancer information on the internet: A comparison of Wikipedia and a professionally maintained database. Journal of Oncology Practice 7, 5 (Jan. 2011).Google ScholarGoogle ScholarCross RefCross Ref
  136. Sudha Ram et al. Predicting asthma-related emergency department visits using big data. IEEE Journal of Biomedical and Health Informatics 19, 4 (July 2015).Google ScholarGoogle ScholarCross RefCross Ref
  137. Ronald E. Rice. Influences, usage, and outcomes of Internet health information searching: Multivariate results from the Pew surveys. International Journal of Medical Informatics 75, 1 (Jan. 2006).Google ScholarGoogle ScholarCross RefCross Ref
  138. Joshua Ritterman, Miles Osborne, and Ewan Klein. 2009. Using prediction markets and Twitter to predict a swine flu pandemic. In Workshop on Mining Social Media. http://homepages.inf.ed.ac.uk/miles/ papers/swine09.pdfGoogle ScholarGoogle Scholar
  139. Caitlin M. Rivers et al. Modeling the impact of interventions on an epidemic of Ebola in Sierra Leone and Liberia. PLOS Currents (2014).Google ScholarGoogle Scholar
  140. Ankit Rohatgi. WebPlotDigitizer. (Oct. 2015). http://arohatgi.info/WebPlotDigitizerGoogle ScholarGoogle Scholar
  141. Mauricio Santillana et al. Using clinicians' search query data to monitor influenza epidemics. Clinical Infectious Diseases 59, 10 (Nov. 2014).Google ScholarGoogle ScholarCross RefCross Ref
  142. Mauricio Santillana et al. What can digital disease detection learn from (an external revision to) Google flu Trends? American Journal of Preventive Medicine 47, 3 (Sept. 2014).Google ScholarGoogle Scholar
  143. Mauricio Santillana et al. Combining search, social media, and traditional data sources to improve influenza surveillance. PLOS Computational Biology 11, 10 (Oct. 2015).
  144. Sercan Sarigul and Huaxia Rui. 2014. Nowcasting obesity in the U.S. using Google search volume data. In AAEA/EAAE/CAES Joint Symposium: Social Networks, Social Media and the Economics of Food. http://econpapers.repec.org/paper/agsaajs14/166113.htm
  145. Shilad Sen et al. 2014. WikiBrain: Democratizing computation on Wikipedia. In OpenSym.
  146. Dong-Woo Seo et al. Cumulative query method for influenza surveillance using search engine data. Journal of Medical Internet Research 16, 12 (Dec. 2014).
  147. Jeffrey Shaman and Alicia Karspeck. Forecasting seasonal outbreaks of influenza. Proceedings of the National Academy of Sciences 109, 50 (Nov. 2012).
  148. Alessio Signorini. 2014. Use of social media to monitor and predict outbreaks and public opinion on health topics. Ph.D. University of Iowa. http://ir.uiowa.edu/etd/1503/
  149. Alessio Signorini, Alberto Maria Segre, and Philip M. Polgreen. The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLOS ONE 6, 5 (May 2011).
  150. Amit Singhal. Introducing the Knowledge Graph: Things, not strings. (May 2012). https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-thingsnot.html
  151. Giovanni Stilo et al. 2014. Predicting flu epidemics using Twitter and historical data. In Brain Informatics and Health, Dominik Ślezak et al. (Eds.). Number 8609.
  152. Michael Strube and Simone Paolo Ponzetto. 2006. WikiRelate! Computing semantic relatedness using Wikipedia. In AAAI, Vol. 6. http://www.aaai.org/Papers/AAAI/2006/AAAI06--223.pdf
  153. Yla Tausczik et al. Public Anxiety and Information Seeking Following the H1N1 Outbreak: Blogs, Newspaper Articles, and Wikipedia Visits. Health Communication 27, 2 (2012).
  154. Flu Trends Team. The next chapter for Flu Trends. (Aug. 2015). http://googleresearch.blogspot.com/2015/08/the-next-chapter-for-flu-trends.html
  155. Marijn ten Thij et al. Modeling page-view dynamics on Wikipedia. arXiv:1212.5943 [physics] (Dec. 2012). http://arxiv.org/abs/1212.5943
  156. Garry R. Thomas et al. An evaluation of Wikipedia as a resource for patient education in nephrology. Seminars in Dialysis 26, 2 (2013).
  157. L. H. Thompson et al. Emergency department and 'Google Flu Trends' data as syndromic surveillance indicators for seasonal influenza. Epidemiology & Infection 142, 11 (Nov. 2014).
  158. Anna R. Thorner et al. Correlation between UpToDate searches and reported cases of Middle East respiratory syndrome during outbreaks in Saudi Arabia. Open Forum Infectious Diseases 3, 1 (Jan. 2016). http:/

              • Published in

                CSCW '17: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing
                February 2017
                2556 pages
                ISBN:9781450343350
                DOI:10.1145/2998181

                Copyright © 2017 Owner/Author

                Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 25 February 2017

                Qualifiers

                • research-article

                Acceptance Rates

                CSCW '17 Paper Acceptance Rate: 183 of 530 submissions, 35%
                Overall Acceptance Rate: 2,235 of 8,521 submissions, 26%