Article

Approaches to text mining for clinical medical records

Authors:
Xiaohua Zhou, Drexel University, Philadelphia, PA
Hyoil Han, Drexel University, Philadelphia, PA
Isaac Chankai, Drexel University, Philadelphia, PA
Ann Prestrud, Drexel University, Philadelphia, PA
Ari Brooks, Drexel University, Philadelphia, PA

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing, April 2006, Pages 235–239, https://doi.org/10.1145/1141277.1141330

Published: 23 April 2006

ABSTRACT

Clinical medical records contain a wealth of information, largely in free-text form. Extracting structured information from free-text records is an important research endeavor. In this paper, we describe a MEDical Information Extraction (MedIE) system that extracts and mines a variety of information about patients with breast complaints from free-text clinical records. MedIE is part of a medical text mining project being conducted at Drexel University. Three approaches are proposed to solve different IE tasks, and very good performance (precision and recall) was achieved. A graph-based approach that uses the parsing results of a link-grammar parser was devised for relation extraction; high accuracy was achieved. A simple but efficient ontology-based approach was adopted to extract medical terms of interest. Finally, an NLP-based feature extraction method coupled with an ID3-based decision tree was used to perform text classification.
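
As a rough illustration of the ID3-based classification step mentioned in the abstract, the sketch below computes the information gain that ID3 uses to choose a split at each node. It is not MedIE's code: the feature names (`affirmed_smoking`, `mentions_alcohol`), the labels, and the four toy records are hypothetical stand-ins for whatever the NLP-based feature extraction actually produces.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the records on one feature."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 splits each node on the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy records: one dict of extracted features per clinical note.
rows = [
    {"affirmed_smoking": True, "mentions_alcohol": True},
    {"affirmed_smoking": False, "mentions_alcohol": True},
    {"affirmed_smoking": False, "mentions_alcohol": False},
    {"affirmed_smoking": True, "mentions_alcohol": False},
]
labels = ["smoker", "non-smoker", "non-smoker", "smoker"]

root = best_feature(rows, labels, ["affirmed_smoking", "mentions_alcohol"])
print(root)  # affirmed_smoking
```

In this toy set `affirmed_smoking` separates the classes perfectly (gain of 1 bit) while `mentions_alcohol` carries no information (gain of 0), so ID3 would place `affirmed_smoking` at the root. A full ID3 tree repeats this choice recursively on each partition.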

Index Terms

Approaches to text mining for clinical medical records
1. Applied computing
  1. Life and medical sciences
    1. Health care information systems
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning

Reviews

Reviewer: Andrew Brooks

Text mining works. Precision and recall rates for classification (for example, smoking status), attribute extraction (for example, blood pressure), and medical term extraction (for example, past surgical history) for the Java-based MedIE system on 125 free-text clinical records were at least 86 percent and often better. MedIE comprises three main techniques. Classification tasks make use of the ID3 decision tree algorithm. Attribute extraction makes use of a graph-based approach from output produced by the link grammar parser. Medical term extraction makes use of the Unified Medical Language System (UMLS) as the domain ontology. Although it is impossible to convey every detail of a complex system or experiment, the explanations of the three main techniques and experimental results are too brief. The reader is left with many questions unanswered: How much did using part-of-speech patterns reduce the computational complexity of medical term extraction? How were negating words or phrases dealt with? How often was a pattern approach taken when the link grammar parser failed to parse a sentence? How long did MedIE take compared to the human coder that was employed as the oracle? Why are precision and recall clearly distinguished in Table 3 but not in Tables 1 and 2? Surprisingly missing is the final step: that of data mining on the text-mined data itself. Despite this, and the brevity of the explanations, this paper represents a key milestone in text mining and as such is strongly recommended to anyone working in data or text mining.

Online Computing Reviews Service
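
The graph-based attribute extraction mentioned in the review can be pictured with a small sketch. Neither the abstract nor the review spells out the algorithm, so everything here is an assumption: the links are hand-written rather than produced by a link grammar parser, the token names `is_1` and `is_2` are made up to tell the two occurrences of "is" apart, and pairing each value with the nearest attribute word by hop count is only one plausible reading of "graph-based".

```python
from collections import deque


def link_distances(links, start):
    """Breadth-first search over an undirected word graph.

    Returns the number of links separating `start` from every reachable word.
    """
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def attach_values(links, attributes, values):
    """Pair each value with the attribute word closest to it in the graph."""
    result = {}
    for value in values:
        dist = link_distances(links, value)
        reachable = [a for a in attributes if a in dist]
        if reachable:
            result[value] = min(reachable, key=lambda a: dist[a])
    return result


# Hypothetical links for the sentence "pressure is 144/90 and pulse is 84".
links = [
    ("pressure", "is_1"), ("is_1", "144/90"),
    ("is_1", "and"), ("and", "is_2"),
    ("pulse", "is_2"), ("is_2", "84"),
]

pairs = attach_values(links, ["pressure", "pulse"], ["144/90", "84"])
print(pairs)  # {'144/90': 'pressure', '84': 'pulse'}
```

In the plain token sequence, "144/90" sits two words away from both "pressure" and "pulse", so linear distance cannot decide between them. In the link graph it is two hops from "pressure" and four from "pulse", which is the kind of disambiguation a parse-based approach can offer.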

Published in
SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
April 2006
1967 pages
ISBN: 1595931082
DOI: 10.1145/1141277
Conference Chair:
Hisham M. Haddad
Kennesaw State University, Kennesaw, Georgia
Copyright © 2006 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clinical records
information extraction
ontology
relation extraction
Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
Article Metrics
- Total Citations: 47
- Total Downloads: 1,939
- Downloads (Last 12 months): 49
- Downloads (Last 6 weeks): 9
