DOI: 10.1145/1816123.1816160
research-article

oreChem ChemXSeer: a semantic digital library for chemistry

Published: 21 June 2010

ABSTRACT

Representing the semantics of unstructured scientific publications will facilitate access and search and, ideally, lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata, even for scientific publications that potentially contain rich semantic metadata. In addition, how to search linked semantic metadata in the scientific literature remains an open problem. We have developed oreChem ChemXSeer, a semantic digital library that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from the chemistry paper repository ChemXSeer using "compound objects".

We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard (http://www.openarchives.org/ore/) to define a compound object that aggregates the metadata fields related to a digital object. Aggregated metadata can be managed and retrieved as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand, so a set of linked metadata can be searched with a single query.
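To make the compound-object idea concrete, here is a minimal Python sketch, assuming each extracted metadata field of a paper is exposed as its own web resource. The example.org URIs and the helper names (`build_resource_map`, `aggregated_resources`, `to_ntriples`) are hypothetical and are not taken from oreChem ChemXSeer; only the vocabulary terms (`ore:ResourceMap`, `ore:describes`, `ore:Aggregation`, `ore:isDescribedBy`, `ore:aggregates`, `rdf:type`) come from the OAI-ORE and RDF specifications. Triples are kept as plain tuples rather than using an RDF library so the example stays dependency-free.

```python
# Minimal OAI-ORE sketch. URIs and helper names are hypothetical;
# only the ORE/RDF vocabulary URIs come from the published specifications.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


def build_resource_map(rem_uri: str, agg_uri: str, parts: list[str]) -> list[Triple]:
    """Return triples for a resource map that describes one aggregation of `parts`."""
    triples: list[Triple] = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", agg_uri),
        (agg_uri, RDF_TYPE, ORE + "Aggregation"),
        (agg_uri, ORE + "isDescribedBy", rem_uri),
    ]
    # Each extracted metadata field becomes one aggregated resource.
    triples += [(agg_uri, ORE + "aggregates", part) for part in parts]
    return triples


def aggregated_resources(triples: list[Triple], agg_uri: str) -> list[str]:
    """Collect every metadata resource aggregated by one compound object."""
    return [o for s, p, o in triples if s == agg_uri and p == ORE + "aggregates"]


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    paper = "http://example.org/chemxseer/paper/42"  # hypothetical document URI
    fields = [paper + "/" + name for name in ("title", "authors", "chemical-entities", "experiments")]
    rem = build_resource_map(paper + "/rem", paper + "/aggregation", fields)
    print(to_ntriples(rem))
    print(aggregated_resources(rem, paper + "/aggregation"))
```

Because every field hangs off a single aggregation URI, retrieving all of a paper's linked metadata reduces to one lookup on `ore:aggregates`, which is the sense in which a set of linked metadata can be fetched with one query.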

We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information about experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted, tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words typically used only to describe experiments, such as "apparatus" and "prepare". Using a dataset of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.
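The following Python sketch illustrates the feature design: a paragraph is mapped to binary features over a small list of experiment-indicative words, and a linear classifier is trained on them. The abstract does not name the classifier or give the full word list, so apart from "apparatus" and "prepare", the vocabulary, the choice of a perceptron, and the four toy training paragraphs are assumptions made for illustration; the ontology-tagging step is not shown.

```python
import re

# Hypothetical indicator vocabulary. Only "apparatus" and "prepare" come from
# the abstract; the other words are illustrative assumptions.
EXPERIMENT_WORDS = ["apparatus", "prepare", "prepared", "stirred", "heated",
                    "filtered", "dried", "yield", "solution", "added"]

# Toy labelled paragraphs (+1 = experiment, -1 = other), invented for illustration.
TOY_DATA = [
    ("The product was prepared by adding the solution dropwise and stirred for 2 h.", 1),
    ("The apparatus was heated to 80 C and the precipitate filtered and dried.", 1),
    ("These results suggest that the mechanism involves a radical intermediate.", -1),
    ("Previous studies have reported similar trends in related compounds.", -1),
]


def features(paragraph: str) -> list[int]:
    """Binary bag-of-words vector: 1 if an indicator word occurs (exact token, no stemming)."""
    tokens = set(re.findall(r"[a-z]+", paragraph.lower()))
    return [1 if word in tokens else 0 for word in EXPERIMENT_WORDS]


def train_perceptron(data: list[tuple[str, int]], epochs: int = 20) -> tuple[list[float], float]:
    """Train a perceptron (a stand-in linear classifier) on (paragraph, label) pairs."""
    w = [0.0] * len(EXPERIMENT_WORDS)
    b = 0.0
    for _ in range(epochs):
        for text, y in data:
            x = features(text)
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: move toward the label
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def is_experiment(paragraph: str, w: list[float], b: float) -> bool:
    """Classify a paragraph as experiment-related if its linear score is positive."""
    x = features(paragraph)
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


if __name__ == "__main__":
    w, b = train_perceptron(TOY_DATA)
    print(is_experiment("The mixture was stirred and the solid filtered.", w, b))  # True
```

A perceptron appears here only because it fits in a few lines of standard-library Python; logistic regression or a linear SVM over the same binary features would be a drop-in alternative. Because matching uses exact tokens without stemming, both "prepare" and "prepared" are listed in the vocabulary.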


          Published in: JCDL '10: Proceedings of the 10th annual joint conference on Digital libraries, June 2010, 424 pages
          ISBN: 9781450300858
          DOI: 10.1145/1816123

          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

          Publisher: Association for Computing Machinery, New York, NY, United States


          Overall Acceptance Rate: 415 of 1,482 submissions, 28%
