DOI: 10.1145/1141277.1141330
Approaches to text mining for clinical medical records

Published: 23 April 2006

ABSTRACT

Clinical medical records contain a wealth of information, largely in free-text form. Extracting structured information from free-text records is an important research endeavor. In this paper, we describe a MEDical Information Extraction (MedIE) system that extracts and mines a variety of information about patients with breast complaints from free-text clinical records. MedIE is part of a medical text mining project being conducted at Drexel University. Three approaches are proposed to solve different IE tasks, and very good performance (precision and recall) was achieved. A graph-based approach that uses the output of a link grammar parser was devised for relation extraction; high accuracy was achieved. A simple but efficient ontology-based approach was adopted to extract medical terms of interest. Finally, an NLP-based feature extraction method coupled with an ID3-based decision tree was used to perform text classification.
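
The ID3 step named in the abstract can be illustrated with a minimal sketch. ID3 grows a decision tree by repeatedly splitting on the feature with the highest information gain. The code below shows only that selection step. The feature names (`mentions_smoke`, `has_negation`), the labels, and the toy records are hypothetical and are not taken from the paper, which does not give its feature set on this page.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 selection step: pick the feature with the largest gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy data: one dict of boolean features per clinical note.
rows = [
    {"mentions_smoke": True, "has_negation": False},
    {"mentions_smoke": True, "has_negation": True},
    {"mentions_smoke": False, "has_negation": False},
    {"mentions_smoke": True, "has_negation": False},
    {"mentions_smoke": False, "has_negation": False},
    {"mentions_smoke": False, "has_negation": True},
]
labels = ["smoker", "non-smoker", "non-smoker",
          "smoker", "non-smoker", "non-smoker"]

print(best_feature(rows, labels, ["mentions_smoke", "has_negation"]))
```

A full ID3 implementation would apply `best_feature` recursively to each partition until every leaf holds a single class or no features remain.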

        Reviews

        Andrew Brooks

        Text mining works. Precision and recall rates for classification (for example, smoking status), attribute extraction (for example, blood pressure), and medical term extraction (for example, past surgical history) for the Java-based MedIE system on 125 free-text clinical records were at least 86 percent and often better. MedIE comprises three main techniques. Classification tasks make use of the ID3 decision tree algorithm. Attribute extraction makes use of a graph-based approach from output produced by the link grammar parser. Medical term extraction makes use of the unified medical language system (UMLS) as the domain ontology. Although it is impossible to convey every detail of a complex system or experiment, the explanations of the three main techniques and experimental results are too brief. The reader is left with many questions unanswered: How much did using part-of-speech patterns reduce the computational complexity of medical term extraction? How were negating words or phrases dealt with? How often was a pattern approach taken when the link grammar parser failed to parse a sentence? How long did MedIE take compared to the human coder that was employed as the oracle? Why are precision and recall clearly distinguished in Table 3 but not in Tables 1 and 2? Surprisingly missing is the final step: that of data mining on the text-mined data itself. Despite this, and the brevity of the explanations, this paper represents a key milestone in text mining and as such is strongly recommended to anyone working in data or text mining.

        Online Computing Reviews Service
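
For readers unfamiliar with the two measures the review reports, the following is a minimal sketch of how precision and recall are commonly computed for an extraction task, comparing system output against a human-coded gold standard. The function name and the example terms are hypothetical. The paper's own evaluation procedure is not described on this page and may differ.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for extracted items versus gold items.

    precision = correct extractions / all extractions
    recall    = correct extractions / all gold-standard items
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: surgical-history terms for one record.
system_terms = {"appendectomy", "tonsillectomy", "biopsy"}
coder_terms = {"appendectomy", "tonsillectomy", "mastectomy", "lumpectomy"}

precision, recall = precision_recall(system_terms, coder_terms)
print(round(precision, 2), recall)  # 0.67 0.5
```

Here two of the three extracted terms are correct (precision about 0.67), and two of the four gold terms were found (recall 0.5).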

        • Published in

          SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
          April 2006
          1967 pages
          ISBN: 1595931082
          DOI: 10.1145/1141277

          Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%