ABSTRACT
Because there is no standard naming convention for genes and gene products, gene symbol disambiguation (GSD) has become a major challenge in mining the biomedical literature. Several GSD methods have been proposed that rely on MEDLINE references to genes. However, gene databases such as Entrez Gene now provide extensive information about genes, and many biomedical ontologies, such as the UMLS Metathesaurus and Semantic Network, have been developed. Since these knowledge sources can support disambiguation, in this paper we propose a method that draws on information about gene candidates from gene databases, the contexts of gene symbols, and biomedical ontologies. We implement the method and evaluate the performance of the implementation on the BioCreAtIvE II data sets.
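To make the general idea concrete, the sketch below shows one common profile-matching formulation of knowledge-based GSD. It is not the method proposed in this paper. Each candidate gene for an ambiguous symbol is represented by a text profile assembled from knowledge sources, and the candidate whose profile is most similar to the words surrounding the symbol is chosen. The function names (`tokenize`, `cosine`, `disambiguate`), the stopword list, the bag-of-words cosine scoring, and the toy profile texts are all illustrative assumptions. A real system would build profiles from sources such as Entrez Gene records and UMLS concepts.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustrative assumption).
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "was", "by", "with", "for"}


def tokenize(text: str) -> list[str]:
    """Return lower-cased alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(context: str, candidates: dict[str, str]) -> tuple[str, dict[str, float]]:
    """Pick the candidate whose profile text best matches the mention context.

    `candidates` maps a hypothetical gene identifier to profile text that is
    assumed to be assembled from knowledge sources (e.g. database summaries
    plus ontology terms). Returns the best identifier and all scores.
    """
    if not candidates:
        raise ValueError("at least one candidate gene is required")
    ctx = Counter(tokenize(context))
    scores = {gene_id: cosine(ctx, Counter(tokenize(profile)))
              for gene_id, profile in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy example: "CAT" may denote catalase or the chloramphenicol
    # acetyltransferase reporter gene. Profile texts are invented for illustration.
    toy_candidates = {
        "catalase": "catalase enzyme decomposes hydrogen peroxide oxidative stress peroxisome",
        "chloramphenicol_acetyltransferase": "bacterial enzyme confers chloramphenicol "
                                             "antibiotic resistance reporter gene assay",
    }
    best, scores = disambiguate(
        "CAT activity protects cells from hydrogen peroxide oxidative stress", toy_candidates)
    print(best)  # catalase
```

Raw term counts keep the sketch short. A practical system would likely weight terms (for example with tf-idf) and could also compare ontology-level features, such as UMLS semantic types, rather than surface words alone.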
Index Terms
- Knowledge-based gene symbol disambiguation