DOI: 10.3115/974147.974178

TnT: a statistical part-of-speech tagger

Published: 29 April 2000

ABSTRACT

Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better for the tested corpora. We describe the basic model of TnT and the techniques used for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.

Published in: ANLC '00: Proceedings of the sixth conference on Applied natural language processing, April 2000, 344 pages

Publisher: Association for Computational Linguistics, United States
