ABSTRACT
The structure of a document has an important influence on how its content is perceived. For scientific publications presented in the ordinary linear layout, a well-organized publication that follows a clear "red thread" will always be better understood and analyzed than one with a poor or chaotic structure, even if the content of the latter is not poor. Reading a publication linearly, from the first page to the last, forces the reader to process a lot of unnecessary information. Looking at a publication from another perspective, by accessing its key points or argumentative structure directly, can give better insight into the author's thoughts, and for certain tasks (e.g., getting a first impression of an article) a representation of the document reduced to its core could be more important than its linear structure. In this paper, we show how one can build different representations of the same document by exploiting the semantics captured in the text. The focus is on scientific publications, and as a foundation we use the SALT (Semantically Annotated LaTeX) annotation framework for creating Semantic PDF Documents.
Index Terms
- SALT: a semantic approach for generating document representations