DOI: 10.5555/2391123.2391152

Automatic approaches for gene-drug interaction extraction from biomedical text: corpus and comparative evaluation

Published: 8 June 2012

ABSTRACT

Publications that report genotype-drug interaction findings, as well as manually curated databases such as DrugBank and PharmGKB, are essential to advancing pharmacogenomics, a relatively new area that merges pharmacology and genomic research. Natural language processing (NLP) methods can be very useful for automatically extracting knowledge such as gene-drug interactions, giving researchers immediate access to published findings and offering curators a shortcut in their work.

We present a corpus of gene-drug interactions for training and evaluating systems that extract such interactions. The corpus contains 551 sentences, each mentioning a drug and a gene, drawn from about 600 journals that an analysis of gene-drug relationships in the PharmGKB knowledgebase identified as relevant to pharmacogenomics.
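The sentence-selection criterion (a sentence qualifies when it mentions both a gene and a drug) can be sketched as a simple dictionary lookup. This is a minimal sketch under stated assumptions: the GENES and DRUGS lexicons and the helper names are hypothetical, and case-insensitive whole-word matching stands in for whatever entity recognition was actually used to build the corpus.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy lexicons; real corpus construction would rely on
# gene and drug named-entity recognition rather than a short word list.
GENES: Set[str] = {"APOE", "CYP2C9", "VKORC1"}
DRUGS: Set[str] = {"clopidogrel", "warfarin"}


def find_mentions(sentence: str, lexicon: Set[str]) -> List[str]:
    """Return lexicon terms occurring as whole words in the sentence, ignoring case."""
    return [
        term
        for term in sorted(lexicon)
        if re.search(r"\b" + re.escape(term) + r"\b", sentence, flags=re.IGNORECASE)
    ]


def select_candidates(sentences: List[str]) -> List[Dict[str, object]]:
    """Keep only sentences that mention at least one gene and at least one drug."""
    selected = []
    for sentence in sentences:
        genes = find_mentions(sentence, GENES)
        drugs = find_mentions(sentence, DRUGS)
        if genes and drugs:
            selected.append({"sentence": sentence, "genes": genes, "drugs": drugs})
    return selected
```

For example, `select_candidates(["VKORC1 variants and warfarin dose.", "Warfarin alone is mentioned here."])` keeps only the first sentence, because the second mentions a drug but no gene.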

We evaluated basic approaches to automatic extraction, including gene and drug co-occurrence, co-occurrence plus interaction terms, and a linguistic pattern-based method. The linguistic pattern method had the highest precision (96.61%) but the lowest recall (7.30%), for an F-score of 13.57%. Basic co-occurrence yields 68.99% precision; adding an interaction term increases precision only slightly (69.60%), less than might be expected. Co-occurrence is a reasonable baseline method, and the pattern-based method is promising if enough patterns can be generated to improve recall. The corpus is available at http://diego.asu.edu/index.php/projects
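The two co-occurrence baselines and the evaluation metrics can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the INTERACTION_TERMS list, the Candidate record, and the whitespace tokenization are assumptions, and the linguistic pattern-based method is omitted because it depends on parsing and a pattern set not described here. The f_score helper is the standard harmonic mean of precision and recall, F = 2PR/(P + R).

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical cue words; the paper's actual interaction-term list is not reproduced here.
INTERACTION_TERMS: Set[str] = {"associated", "inhibits", "induces", "metabolism", "metabolized", "response"}


@dataclass(frozen=True)
class Candidate:
    """One gene-drug pair mentioned in the same sentence (hypothetical record)."""
    sentence: str
    gene: str
    drug: str


def cooccurrence(c: Candidate) -> bool:
    """Baseline 1: every co-occurring gene-drug pair is predicted to interact."""
    return True


def cooccurrence_with_term(c: Candidate) -> bool:
    """Baseline 2: predict an interaction only if the sentence also has an interaction term."""
    tokens = {t.strip(".,;:()").lower() for t in c.sentence.split()}
    return not tokens.isdisjoint(INTERACTION_TERMS)


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(preds: List[bool], gold: List[bool]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for binary predictions against gold labels."""
    tp = sum(p and g for p, g in zip(preds, gold))
    fp = sum(p and not g for p, g in zip(preds, gold))
    fn = sum(g and not p for p, g in zip(preds, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


if __name__ == "__main__":
    # Pattern-method figures reported in the abstract: P = 96.61%, R = 7.30%.
    print(round(f_score(0.9661, 0.0730), 4))
```

Plugging in the reported precision and recall of the pattern method, `f_score(0.9661, 0.0730)` gives about 0.1357, consistent with the 13.57% F-score above. The sketch also shows why an interaction term can only trade recall for precision: it filters the co-occurrence predictions and never adds new ones.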

Published in: BioNLP '12: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing, June 2012, 257 pages
Publisher: Association for Computational Linguistics, United States


Overall acceptance rate: 33 of 92 submissions (36%)