ABSTRACT
Real-time monitoring of and response to emerging public health threats rely on the availability of timely surveillance data. During the early stages of an epidemic, readily available line lists with detailed tabular information about laboratory-confirmed cases can help epidemiologists make reliable inferences and forecasts. Such inferences are crucial to understanding the epidemiology of a specific disease early enough to stop or control the outbreak. However, constructing such line lists requires considerable human supervision, and they are therefore difficult to generate in real time. In this paper, we motivate Guided Epidemiological Line List (GELL), the first tool for building automated line lists (in near real time) from open source reports of emerging disease outbreaks. Specifically, we focus on deriving epidemiological characteristics of an emerging disease and the affected population from reports of illness. GELL uses distributed vector representations (à la word2vec) to discover a set of indicators for each line list feature. This discovery of indicators is followed by dependency-parsing-based techniques for final extraction in tabular form. We evaluate the performance of GELL against a human-annotated line list provided by HealthMap corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that GELL extracts line list features with increased accuracy compared to a baseline method. We further show how these automatically extracted line list features can be used for making epidemiological inferences, such as inferring the demographics and symptoms-to-hospitalization period of affected individuals.
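The indicator-discovery step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that indicators for a line list feature are the words whose vectors lie closest (by cosine similarity) to a seed word for that feature; the abstract does not spell out GELL's actual procedure, so this is only one plausible reading. The vectors, the seed words, and the function name `discover_indicators` are all hypothetical, and the toy 3-dimensional embeddings stand in for vectors that would really be trained on outbreak reports.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# A real system would load vectors trained on a corpus of outbreak reports.
TOY_VECTORS = {
    "hospitalized": [0.9, 0.1, 0.0],
    "admitted": [0.85, 0.2, 0.05],
    "ward": [0.7, 0.3, 0.1],
    "fever": [0.1, 0.9, 0.1],
    "cough": [0.15, 0.85, 0.2],
    "camel": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def discover_indicators(seed, vectors, top_k=2):
    """Return the top_k words most similar to the seed word (hypothetical helper)."""
    seed_vec = vectors[seed]
    scored = [
        (word, cosine(seed_vec, vec))
        for word, vec in vectors.items()
        if word != seed
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_k]]


if __name__ == "__main__":
    # Candidate indicators for a "hospitalization" feature, given one seed word.
    print(discover_indicators("hospitalized", TOY_VECTORS))
    # Candidate indicators for a "symptoms" feature.
    print(discover_indicators("fever", TOY_VECTORS, top_k=1))
```

In a pipeline of the kind the abstract describes, the discovered indicators would then guide a dependency parser in locating the value of each feature within a sentence; that second step is not shown here.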
GELL: Automatic Extraction of Epidemiological Line Lists from Open Sources