ABSTRACT
We propose a novel co-training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures for each word in the training set, and a large pool of unlabeled text. The algorithm iteratively labels the entire data set with parse trees. Using empirical results based on parsing the Wall Street Journal corpus, we show that training a statistical parser on the combined labeled and unlabeled data strongly outperforms training only on the labeled data.
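The iterative labeling loop the abstract describes follows the general co-training scheme of Blum and Mitchell (1998): two models, each seeing a different "view" of the data, are trained on a small labeled seed and then take turns labeling unlabeled examples they are confident about, growing the shared training set. The sketch below illustrates only that loop; the toy frequency-count learners and the feature/confidence representation are stand-ins for illustration, not the lexicalized statistical parsers used in the paper.

```python
# Minimal co-training sketch (Blum & Mitchell 1998 style). The "learners" are
# toy per-view frequency models, NOT the paper's statistical parsers.

def train(view, labeled):
    """Count label frequencies per feature value in one view of the data."""
    counts = {}
    for features, label in labeled:
        dist = counts.setdefault(features[view], {})
        dist[label] = dist.get(label, 0) + 1
    return counts

def predict(model, view, features):
    """Return (label, confidence) for one example, or (None, 0.0) if unseen."""
    dist = model.get(features[view])
    if not dist:
        return None, 0.0
    label = max(dist, key=dist.get)
    return label, dist[label] / sum(dist.values())

def co_train(labeled, unlabeled, rounds=5, threshold=0.6):
    """Each round, each view's model labels pool examples it is confident
    about; those newly labeled examples join the shared labeled set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            model = train(view, labeled)
            kept = []
            for features in pool:
                label, conf = predict(model, view, features)
                if label is not None and conf >= threshold:
                    labeled.append((features, label))  # self-labeled example
                else:
                    kept.append(features)  # still unlabeled, retry next round
            pool = kept
        if not pool:
            break
    return labeled
```

For example, seeding with `[(("a", "x"), 1), (("b", "y"), 0)]` and a pool `[("a", "y"), ("b", "x")]`, the view-0 model confidently labels both pool items in the first round, so the labeled set grows from two examples to four. The confidence threshold plays the role of the paper's selection of "most confident" parses for retraining.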
- E. Black, S. Abney, D. Flickinger, C. Gdaniec, R. Grishman, P. Harrison, D. Hindle, R. Ingria, F. Jelinek, J. Klavans, M. Liberman, M. Marcus, S. Roukos, B. Santorini, and T. Strzalkowski. 1991. A procedure for quantitatively comparing the syntactic coverage of English grammars. In Proc. DARPA Speech and Natural Language Workshop, pages 306--311. Morgan Kaufmann.
- A. Blum and T. Mitchell. 1998. Combining Labeled and Unlabeled Data with Co-Training. In Proc. of 11th Annual Conf. on Comp. Learning Theory (COLT), pages 92--100.
- E. Brill. 1997. Unsupervised learning of disambiguation rules for part of speech tagging. In Natural Language Processing Using Very Large Corpora. Kluwer Academic Press.
- G. Carroll and M. Rooth. 1998. Valence Induction with a Head-Lexicalized PCFG. http://xxx.lanl.gov/abs/cmp-lg/9805001, May.
- C. Chelba and F. Jelinek. 1998. Exploiting syntactic structure for language modeling. In Proc. of COLING-ACL '98, pages 225--231, Montreal.
- M. Collins and Y. Singer. 1999. Unsupervised Models for Named Entity Classification. In Proc. of WVLC/EMNLP-99, pages 100--110.
- D. Cutting, J. Kupiec, J. Pedersen, and P. Sibun. 1992. A practical part-of-speech tagger. In Proc. of 3rd ANLP Conf., Trento, Italy. ACL.
- D. Elworthy. 1994. Does Baum-Welch re-estimation help taggers? In Proc. of 4th ANLP Conf., pages 53--58, Stuttgart, October 13-15.
- E. W. Fong and D. Wu. 1996. Learning restricted probabilistic link grammars. In S. Wermter, E. Riloff, and G. Scheler, editors, Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, pages 173--187. Springer-Verlag.
- S. Goldman and Y. Zhou. 2000. Enhancing supervised learning with unlabeled data. In Proc. of ICML-2000, Stanford University, June 29--July 2.
- R. Hwa. 2000. Sample selection for statistical grammar induction. In Proc. of EMNLP/VLC-2000, pages 45--52.
- A. K. Joshi and Y. Schabes. 1992. Tree-adjoining grammar and lexicalized grammars. In M. Nivat and A. Podelski, editors, Tree Automata and Languages, pages 409--431. Elsevier Science.
- A. K. Joshi, L. Levy, and M. Takahashi. 1975. Tree Adjunct Grammars. Journal of Computer and System Sciences.
- A. K. Joshi. 1985. Tree Adjoining Grammars: How much context sensitivity is required to provide a reasonable structural description. In D. Dowty, L. Karttunen, and A. Zwicky, editors, Natural Language Parsing, pages 206--250. Cambridge University Press, Cambridge, U.K.
- J. Lafferty, D. Sleator, and D. Temperley. 1992. Grammatical trigrams: A probabilistic model of link grammar. In Proc. of the AAAI Conf. on Probabilistic Approaches to Natural Language.
- K. Lari and S. J. Young. 1990. The estimation of stochastic context-free grammars using the Inside-Outside algorithm. Computer Speech and Language, 4:35--56.
- C. de Marcken. 1995. Lexical heads, phrase structure and the induction of grammar. In D. Yarowsky and K. Church, editors, Proc. of 3rd WVLC, pages 14--26, MIT, Cambridge, MA.
- M. Marcus, B. Santorini, and M. Marcinkiewiecz. 1993. Building a large annotated corpus of English. Computational Linguistics, 19(2):313--330.
- B. Merialdo. 1994. Tagging English text with a probabilistic model. Computational Linguistics, 20(2):155--172.
- K. Nigam and R. Ghani. 2000. Analyzing the effectiveness and applicability of co-training. In Proc. of 9th International Conference on Information and Knowledge Management (CIKM-2000).
- K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. 1999. Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, 1(34).
- S. Della Pietra, V. Della Pietra, J. Gillett, J. Lafferty, H. Printz, and L. Ureš. 1994. Inference and estimation of a long-range trigram model. In R. Carrasco and J. Oncina, editors, Proc. of ICGI-94. Springer-Verlag.
- A. Ratnaparkhi. 1996. A Maximum Entropy Part-Of-Speech Tagger. In Proc. of EMNLP-96, University of Pennsylvania.
- P. Resnik. 1992. Probabilistic tree-adjoining grammars as a framework for statistical natural language processing. In Proc. of COLING '92, volume 2, pages 418--424, Nantes, France.
- Y. Schabes. 1992. Stochastic lexicalized tree-adjoining grammars. In Proc. of COLING '92, volume 2, pages 426--432, Nantes, France.
- B. Srinivas. 1997. Complexity of Lexical Descriptions and its Relevance to Partial Parsing. Ph.D. thesis, Department of Computer and Information Sciences, University of Pennsylvania.
- F. Xia, M. Palmer, and A. Joshi. 2000. A Uniform Method of Grammar Extraction and its Applications. In Proc. of EMNLP/VLC-2000.
- D. Yarowsky. 1995. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proc. of 33rd Meeting of the ACL, pages 189--196, Cambridge, MA.
Applying co-training methods to statistical parsing