ABSTRACT
This paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. We show how the algorithms can be efficiently applied to exponentially sized representations of parse trees, such as the "all subtrees" (DOP) representation described by Bod (1998), or a representation tracking all sub-fragments of a tagged sentence. We give experimental results showing significant improvements on two tasks: parsing Wall Street Journal text, and named-entity extraction from web data.
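The algorithms in question belong to the dual (kernelized) perceptron family: rather than maintaining an explicit weight vector over an exponentially large feature space, the learner stores the examples on which it made mistakes and scores new inputs through a kernel expansion. The sketch below is illustrative only, not the paper's implementation: it uses a toy bigram-count kernel on strings as a stand-in for the all-subtrees tree kernel (Collins & Duffy 2001), which the paper computes efficiently by dynamic programming over node pairs.

```python
# Minimal sketch of a dual (kernelized) perceptron for binary
# classification. The kernel K is a placeholder: it counts shared
# character bigrams, a simple convolution kernel standing in for the
# "all subtrees" tree kernel used in the paper.

def kernel_perceptron(examples, K, epochs=10):
    """examples: list of (x, y) pairs with y in {-1, +1}.
    Returns dual coefficients alpha, one per example."""
    alpha = [0.0] * len(examples)
    for _ in range(epochs):
        mistakes = 0
        for j, (xj, yj) in enumerate(examples):
            # Implicit score f(x) = sum_i alpha_i * y_i * K(x_i, x):
            # the weight vector is never formed explicitly.
            score = sum(a * yi * K(xi, xj)
                        for a, (xi, yi) in zip(alpha, examples) if a)
            if yj * score <= 0:       # mistake: add this example
                alpha[j] += 1.0
                mistakes += 1
        if mistakes == 0:             # converged on the training set
            break
    return alpha

def predict(x, alpha, examples, K):
    s = sum(a * yi * K(xi, x) for a, (xi, yi) in zip(alpha, examples) if a)
    return 1 if s > 0 else -1

def K(x, y):
    """Toy kernel: inner product of character-bigram count vectors."""
    bx = [x[i:i + 2] for i in range(len(x) - 1)]
    by = [y[i:i + 2] for i in range(len(y) - 1)]
    return float(sum(bx.count(b) * by.count(b) for b in set(bx) & set(by)))
```

Because the learner only ever touches examples through `K`, swapping in a dynamic-programming tree kernel gives training and prediction over all subtrees at polynomial cost, which is the efficiency result the abstract refers to.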
REFERENCES
- Aizerman, M., Braverman, E., & Rozonoer, L. (1964). Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning. Automation and Remote Control, 25, 821--837.
- Bod, R. (1998). Beyond Grammar: An Experience-Based Theory of Language. CSLI Publications/Cambridge University Press.
- Bod, R. (2001). What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy? In Proceedings of ACL 2001.
- Borthwick, A., Sterling, J., Agichtein, E., & Grishman, R. (1998). Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. In Proceedings of the Sixth Workshop on Very Large Corpora.
- Charniak, E. (2000). A Maximum-Entropy-Inspired Parser. In Proceedings of NAACL 2000.
- Collins, M. (1999). Head-Driven Statistical Models for Natural Language Parsing. PhD dissertation, University of Pennsylvania.
- Collins, M. (2000). Discriminative Reranking for Natural Language Parsing. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000).
- Collins, M., & Duffy, N. (2001). Convolution Kernels for Natural Language. In Proceedings of Neural Information Processing Systems (NIPS 14).
- Collins, M. (2002a). Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron. In Proceedings of ACL 2002.
- Collins, M. (2002b). Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with the Perceptron Algorithm. In Proceedings of EMNLP 2002.
- Cristianini, N., & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press.
- Freund, Y., & Schapire, R. (1999). Large Margin Classification Using the Perceptron Algorithm. Machine Learning, 37(3), 277--296.
- Freund, Y., Iyer, R., Schapire, R. E., & Singer, Y. (1998). An Efficient Boosting Algorithm for Combining Preferences. In Machine Learning: Proceedings of the Fifteenth International Conference. San Francisco: Morgan Kaufmann.
- Goodman, J. (1996). Efficient Algorithms for Parsing the DOP Model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 143--152.
- Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical report, University of California, Santa Cruz.
- Johnson, M., Geman, S., Canon, S., Chi, Z., & Riezler, S. (1999). Estimators for Stochastic "Unification-Based" Grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics.
- Johnson, M. (2002). The DOP Estimation Method is Biased and Inconsistent. Computational Linguistics, 28, 71--76.
- Lodhi, H., Cristianini, N., Shawe-Taylor, J., & Watkins, C. (2001). Text Classification Using String Kernels. In Advances in Neural Information Processing Systems 13. MIT Press.
- Marcus, M., Santorini, B., & Marcinkiewicz, M. (1993). Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19, 313--330.
- Ratnaparkhi, A. (1996). A Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
- Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65, 386--408. (Reprinted in Neurocomputing, MIT Press, 1998.)
New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron