ABSTRACT
Representing the semantics of unstructured scientific publications will certainly facilitate access and search and hopefully lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata, even for scientific publications that potentially contain rich semantic metadata. In addition, how to search scientific literature described by linked semantic metadata is an open problem. We have developed a semantic digital library, oreChem ChemXSeer, that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from the chemistry paper repository ChemXSeer using "compound objects".
We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard (http://www.openarchives.org/ore/) to define a compound object that aggregates the metadata fields related to a digital object. Aggregated metadata can be managed and retrieved as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query.
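As a rough illustration of the compound-object idea, the sketch below builds an ORE-style resource map as a set of plain (subject, predicate, object) triples. The `ore:` terms (`ResourceMap`, `Aggregation`, `describes`, `aggregates`) come from the OAI-ORE vocabulary; the function names and every `example.org` URI are hypothetical, and this is not the serialization or storage scheme used by oreChem ChemXSeer.

```python
# Minimal sketch of an OAI-ORE compound object as a set of RDF-like triples.
# Assumption: all example.org URIs and helper names are illustrative only.

ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_resource_map(rem_uri, aggregation_uri, aggregated_uris):
    """Return triples for a resource map that describes one aggregation."""
    triples = {
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    }
    # Each metadata resource of the paper becomes one aggregated resource.
    for uri in aggregated_uris:
        triples.add((aggregation_uri, ORE + "aggregates", uri))
    return triples


def aggregated_resources(triples, aggregation_uri):
    """Retrieve everything the aggregation groups, as one unit."""
    return sorted(
        obj
        for subj, pred, obj in triples
        if subj == aggregation_uri and pred == ORE + "aggregates"
    )


# Hypothetical paper with three linked metadata resources.
paper = "http://example.org/paper/42"
rem = build_resource_map(
    rem_uri=paper + "/rem",
    aggregation_uri=paper + "/aggregation",
    aggregated_uris=[
        paper + "/title",
        paper + "/authors",
        paper + "/experiment-paragraphs",
    ],
)
linked = aggregated_resources(rem, paper + "/aggregation")
```

A single lookup on the aggregation URI returns all of the linked metadata resources together, which mirrors the "one query" retrieval described above.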
We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words that are typically used only to describe experiments, such as "apparatus" and "prepare". Using a dataset composed of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.
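The cue-word features can be sketched as follows. This is only a minimal illustration under stated assumptions: the cue list contains just the two words named in the abstract, matching is a crude prefix test, and the threshold rule in `looks_experimental` is a hypothetical stand-in for the trained classifier, whose type and parameters are not specified here.

```python
import re

# Assumption: only the two cue words quoted in the abstract are used;
# the full feature vocabulary is not given in this section.
CUE_WORDS = ("apparatus", "prepare")


def cue_features(paragraph, cue_words=CUE_WORDS):
    """Return one binary feature per cue word (1 if any token starts with it)."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    return [
        int(any(token.startswith(cue) for token in tokens))
        for cue in cue_words
    ]


def looks_experimental(paragraph, min_hits=1):
    """Hypothetical decision rule standing in for a trained classifier."""
    return sum(cue_features(paragraph)) >= min_hits


experimental = "The compound was prepared in a sealed glass apparatus."
narrative = "We review related work on semantic digital libraries."

experimental_features = cue_features(experimental)
narrative_features = cue_features(narrative)
```

In practice the binary vectors would be fed to a classifier trained on labeled paragraphs rather than compared against a fixed threshold.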
Index Terms
- oreChem ChemXSeer: a semantic digital library for chemistry