DOI: 10.3115/1073336.1073359

Applying co-training methods to statistical parsing

Published: 2 June 2001

ABSTRACT

We propose a novel co-training method for statistical parsing. The algorithm takes as input a small corpus (9,695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures for each word in the training set, and a large pool of unlabeled text. It then iteratively labels the entire data set with parse trees. Using empirical results from parsing the Wall Street Journal corpus, we show that training a statistical parser on the combined labeled and unlabeled data substantially outperforms training on the labeled data alone.
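The abstract describes the iterative labeling loop only at a high level. As a rough illustration, the sketch below shows generic two-view co-training in the style of Blum and Mitchell (1998), not the paper's parser-specific algorithm. The toy per-view classifiers, the confidence score (relative frequency of the majority label), and the names `ViewClassifier` and `co_train` are all hypothetical. In the paper's setting the labels would be parse trees rather than strings.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical toy setup: each example is (view-0 feature, view-1 feature).
Example = Tuple[Hashable, Hashable]
Labeled = Tuple[Example, str]


class ViewClassifier:
    """Toy classifier that looks at one view and predicts the majority label seen with it."""

    def __init__(self, view: int) -> None:
        self.view = view
        self.counts: Dict[Hashable, Counter] = defaultdict(Counter)

    def fit(self, data: List[Labeled]) -> None:
        # Retrain from scratch on the current labeled pool.
        self.counts = defaultdict(Counter)
        for x, y in data:
            self.counts[x[self.view]][y] += 1

    def predict(self, x: Example) -> Tuple[Optional[str], float]:
        c = self.counts.get(x[self.view])
        if not c:
            return None, 0.0  # feature never seen in this view: no opinion
        label, n = c.most_common(1)[0]
        return label, n / sum(c.values())  # confidence = majority-label frequency


def co_train(labeled: List[Labeled], unlabeled: List[Example],
             rounds: int = 5, per_round: int = 1):
    """Each round, every view labels its most confident unlabeled examples,
    which are added to the shared labeled pool used to retrain both views."""
    labeled = list(labeled)
    pool = list(unlabeled)
    learners = [ViewClassifier(0), ViewClassifier(1)]
    for _ in range(rounds):
        if not pool:
            break
        for learner in learners:
            learner.fit(labeled)
        for learner in learners:
            scored = []
            for x in pool:
                label, score = learner.predict(x)
                if label is not None:
                    scored.append((score, x, label))
            # Stable sort on score only, so ties keep pool order.
            scored.sort(key=lambda t: t[0], reverse=True)
            for _, x, label in scored[:per_round]:
                labeled.append((x, label))
                pool.remove(x)
    for learner in learners:
        learner.fit(labeled)  # final models trained on the enlarged pool
    return learners, labeled
```

The point of the second view shows up with a tiny example: if view 0 confidently labels `("a1", "b3")` as `"X"`, view 1 learns that `"b3"` means `"X"` and can then label `("a4", "b3")`, an example view 0 alone could not label.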


Published in
NAACL '01: Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
June 2001
293 pages

Publisher
Association for Computational Linguistics
United States


Acceptance Rates

Overall acceptance rate: 21 of 29 submissions, 72%
