TnT: a statistical part-of-speech tagger

Author: Thorsten Brants
Saarland University, Saarbrücken, Germany

ANLC '00: Proceedings of the sixth conference on Applied natural language processing, April 2000, pages 224–231. https://doi.org/10.3115/974147.974178

Published: 29 April 2000

ABSTRACT

Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better on the tested corpora. We describe the basic model of TnT and the techniques it uses for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.
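To make "a tagger based on Markov models" concrete, the following is a minimal sketch of a second-order (trigram) hidden Markov model tagger with Viterbi decoding. It assumes linear interpolation of unigram, bigram and trigram tag estimates with hand-fixed weights, and it gives unseen words a flat placeholder emission score; it does not reproduce TnT's own smoothing or unknown-word techniques. The names `TrigramTagger`, `START` and `TOY_CORPUS` are hypothetical and only serve the illustration.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-boundary tag used to pad the tag history

# Hypothetical toy training data: sentences as lists of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]


class TrigramTagger:
    """Sketch of a second-order HMM tagger; not TnT's implementation."""

    def __init__(self, lambdas=(0.1, 0.3, 0.6)):
        # Weights for unigram, bigram and trigram estimates.
        # Fixed by hand here (an assumption); they should sum to 1.
        self.l1, self.l2, self.l3 = lambdas
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2 = Counter(), Counter()  # history (context) counts
        self.emit = defaultdict(Counter)  # tag -> Counter of words
        self.tag_count = Counter()
        self.vocab = set()
        self.total = 0

    def train(self, sentences):
        for sent in sentences:
            for word, tag in sent:
                self.emit[tag][word] += 1
                self.tag_count[tag] += 1
                self.vocab.add(word)
            tags = [START, START] + [tag for _, tag in sent]
            for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
                self.uni[t3] += 1
                self.bi[(t2, t3)] += 1
                self.tri[(t1, t2, t3)] += 1
                self.ctx1[t2] += 1
                self.ctx2[(t1, t2)] += 1
                self.total += 1

    def transition(self, t1, t2, t3):
        """Interpolated P(t3 | t1, t2); unseen histories contribute 0."""
        p1 = self.uni[t3] / self.total
        p2 = self.bi[(t2, t3)] / self.ctx1[t2] if self.ctx1[t2] else 0.0
        p3 = (self.tri[(t1, t2, t3)] / self.ctx2[(t1, t2)]
              if self.ctx2[(t1, t2)] else 0.0)
        return self.l1 * p1 + self.l2 * p2 + self.l3 * p3

    def emission(self, word, tag):
        """P(word | tag); unknown words get a flat placeholder score."""
        if word in self.vocab:
            return self.emit[tag][word] / self.tag_count[tag]
        return 1.0 / len(self.tag_count)  # placeholder, not TnT's method

    def tag(self, words):
        """Viterbi decoding over states (previous tag, current tag)."""
        tags = list(self.tag_count)
        beams = {(START, START): (0.0, [])}  # state -> (log prob, best path)
        for word in words:
            best = {}
            for (t1, t2), (logp, path) in beams.items():
                for t3 in tags:
                    p = self.transition(t1, t2, t3) * self.emission(word, t3)
                    if p == 0.0:
                        continue  # this tag never emitted this known word
                    score = logp + math.log(p)
                    if (t2, t3) not in best or score > best[(t2, t3)][0]:
                        best[(t2, t3)] = (score, path + [t3])
            beams = best
        # No end-of-sentence transition is modelled in this sketch.
        return max(beams.values(), key=lambda v: v[0])[1]
```

For example, a `TrigramTagger()` trained on `TOY_CORPUS` tags `["the", "fish", "barks"]` as `["DT", "NN", "VBZ"]`: "fish" is unseen, so the interpolated estimate of P(NN | <s>, DT) outweighs the alternatives and decides its tag. Keeping the unigram term in the interpolation guarantees a nonzero transition score for every known tag, which is one reason smoothing matters for sparse trigram counts.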

Published in: ANLC '00: Proceedings of the sixth conference on Applied natural language processing, April 2000, 344 pages
Program Chair: Sergei Nirenburg
Publisher: Association for Computational Linguistics, United States
Publication History: Published 29 April 2000
Qualifiers: Article, Conference

Article metrics: 281 total citations, 3,193 total downloads.