Stand-off TEI annotation: the case of the National Corpus of Polish

Authors:
Piotr Bański, University of Warsaw, Warszawa, Poland
Adam Przepiórkowski, Polish Academy of Sciences, Warszawa, Poland

ACL-IJCNLP '09: Proceedings of the Third Linguistic Annotation Workshop, August 2009, Pages 64–67

Published: 6 August 2009

ABSTRACT

We present the annotation architecture of the National Corpus of Polish and discuss problems identified in the TEI stand-off annotation system, which, in its current version, is still very much unfinished and untested, due to both technical reasons (lack of tools implementing the TEI-defined XPointer schemes) and certain problems concerning data representation. We concentrate on two features that a stand-off system should possess and that are conspicuously missing in the current TEI Guidelines.

References

Ide, N. and L. Romary. (2007). Towards International Standards for Language Resources. In Dybkjaer, L., Hemsen, H., Minker, W. (eds.), Evaluation of Text and Speech Systems, Springer, 263–84.
Przepiórkowski, A., R. L. Górski, B. Lewandowska-Tomaszczyk and M. Łaziński. (2008). Towards the National Corpus of Polish. In Proceedings of the 6th Language Resources and Evaluation Conference (LREC 2008), Marrakesh, Morocco.
TEI Consortium (eds.). (2007). TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 1.2.0. Last updated on February 1st 2009. TEI Consortium.

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Language translation
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
ACL-IJCNLP '09: Proceedings of the Third Linguistic Annotation Workshop
August 2009, 203 pages
ISBN: 9781932432527
Program Chairs: Manfred Stede (Universität Potsdam), Chu-Ren Huang (Hong Kong Polytechnic University/Academia Sinica)
Publisher: Association for Computational Linguistics, United States