research-article

A shared task involving multi-label classification of clinical free text

Authors:
John P. Pestian, University of Cincinnati
Christopher Brew, Ohio State University
Paweł Matykiewicz, University of Cincinnati and Nicolaus Copernicus University, Toruń, Poland
D. J. Hovermale, Ohio State University
Neil Johnson, University of Cincinnati
K. Bretonnel Cohen, University of Colorado
Włodzisław Duch, Nicolaus Copernicus University, Toruń, Poland

BioNLP '07: Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, June 2007, pages 97–104

Published: 29 June 2007

ABSTRACT

This paper reports on a shared task involving the assignment of ICD-9-CM codes to radiology reports. Two features distinguished this task from previous shared tasks in the biomedical domain. First, it resulted in the first freely distributable corpus of fully anonymized clinical text. This resource is permanently available and will, we hope, facilitate future research. Second, the task required categorization with respect to a large and commercially significant set of labels. The number of participants was larger than in any previous biomedical challenge task. We describe the data production process and the evaluation measures, and give a preliminary analysis of the results. Many systems performed at levels approaching inter-coder agreement, suggesting that human-like performance on this task is within reach of currently available technologies.
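The abstract does not name the evaluation measures. As an illustration only, the sketch below computes micro-averaged F1, a common measure for multi-label classification: each report carries a set of codes, and true positives, false positives, and false negatives are pooled across all reports before scoring. The helper name `micro_f1` and the example ICD-9-CM codes are hypothetical and are not taken from the task's corpus or its scoring software.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label classification (hypothetical helper).

    Each element of `gold` and `predicted` is the set of codes assigned to one
    document. Counts are pooled over all documents before scoring.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, predicted):
        tp += len(gold_codes & pred_codes)   # codes assigned by both
        fp += len(pred_codes - gold_codes)   # predicted but not in gold
        fn += len(gold_codes - pred_codes)   # in gold but missed
    denominator = 2 * tp + fp + fn
    # Equivalent to 2PR / (P + R); return 0.0 when no codes appear anywhere.
    return 2 * tp / denominator if denominator else 0.0


# Illustrative reports with ICD-9-CM codes (made-up examples, not corpus data).
gold = [{"786.2"}, {"599.0", "780.6"}, {"486"}]
predicted = [{"786.2"}, {"599.0"}, {"486", "786.2"}]

score = micro_f1(gold, predicted)
print(f"micro-F1 = {score:.2f}")  # TP=3, FP=1, FN=1 -> 0.75
```

Because F1 written as 2TP / (2TP + FP + FN) is symmetric in its two arguments, the same function could also score agreement between two human coders by treating either one as the reference; whether the task measured inter-coder agreement this way is not stated in the abstract.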

Published in
BioNLP '07: Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
June 2007
241 pages
Conference Chairs:
K. Bretonnel Cohen, University of Colorado School of Medicine
Dina Demner-Fushman, Lister Hill National Center for Biomedical Communications
Carol Friedman, Columbia University
Lynette Hirschman, MITRE
John Pestian, Computational Medicine Center, University of Cincinnati, Cincinnati Children's Hospital Medical Center
Publisher
Association for Computational Linguistics
United States
Acceptance Rates
Overall acceptance rate: 33 of 92 submissions (36%)