Automatic approaches for gene-drug interaction extraction from biomedical text: corpus and comparative evaluation

Authors:
Nate Sutton, Laura Wojtulewicz, Neel Mehta, Graciela Gonzalez

Arizona State University, Tempe, Arizona

BioNLP '12: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, June 2012, Pages 214–222

Published: 8 June 2012

ABSTRACT

Publications that report genotype-drug interaction findings, as well as manually curated databases such as DrugBank and PharmGKB, are essential to advancing pharmacogenomics, a relatively new area merging pharmacology and genomic research. Natural language processing (NLP) methods can automatically extract knowledge such as gene-drug interactions, offering researchers immediate access to published findings and giving curators a shortcut for their work.

We present a corpus of gene-drug interactions for training and evaluating systems that extract those interactions. The corpus comprises 551 sentences, each mentioning a drug and a gene, drawn from about 600 journals found to be relevant to pharmacogenomics through an analysis of gene-drug relationships in the PharmGKB knowledgebase.

We evaluated basic approaches to automatic extraction: gene and drug co-occurrence, co-occurrence plus interaction terms, and a linguistic pattern-based method. The linguistic pattern method had the highest precision (96.61%) but the lowest recall (7.30%), for an F-score of 13.57%. Basic co-occurrence yielded 68.99% precision; adding an interaction term increased precision only slightly (to 69.60%), less than might be expected. Co-occurrence is a reasonable baseline method, and the pattern-based method is a promising approach if enough patterns can be generated to address recall. The corpus is available at http://diego.asu.edu/index.php/projects
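As a rough illustration of the baselines named above, the following minimal sketch shows a sentence-level co-occurrence check, with an optional interaction-term requirement, and the balanced F-score computed from the reported precision and recall. The lexicons, the interaction terms, and the function names are hypothetical placeholders chosen for this example; they are not the resources or implementation used by the authors, who may have relied on named entity recognition rather than fixed word lists.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only); a real
# system would use named entity recognition or curated dictionaries.
GENES = {"cyp2c9", "vkorc1", "apoe"}
DRUGS = {"warfarin", "clopidogrel"}
INTERACTION_TERMS = {"affects", "inhibits", "induces", "metabolizes"}


def tokens(sentence):
    """Lowercase a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def cooccurrence(sentence, require_interaction_term=False):
    """Predict an interaction if a gene and a drug appear in the sentence.

    With require_interaction_term=True, an interaction term must also be
    present (the "co-occurrence plus interaction terms" variant).
    """
    toks = tokens(sentence)
    has_pair = bool(toks & GENES) and bool(toks & DRUGS)
    if require_interaction_term:
        return has_pair and bool(toks & INTERACTION_TERMS)
    return has_pair


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reproduce the F-score reported for the linguistic pattern method.
print(f"{f_score(0.9661, 0.0730):.2%}")  # prints 13.57%
```

The harmonic mean is dominated by the smaller of its two inputs, which is why 96.61% precision paired with 7.30% recall still gives an F-score near 13.6%.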

Published in

BioNLP '12: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, June 2012, 257 pages
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 33 of 92 submissions, 36%