ABSTRACT
This paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. We show how the algorithms can be efficiently applied to exponentially sized representations of parse trees, such as the "all subtrees" (DOP) representation described by Bod (1998), or a representation tracking all sub-fragments of a tagged sentence. We give experimental results showing significant improvements on two tasks: parsing Wall Street Journal text, and named-entity extraction from web data.
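The algorithms in question belong to the dual (kernelized) perceptron family: rather than maintaining an explicit weight vector over an exponentially large feature space, the learner stores the examples on which it made mistakes and scores new inputs through a kernel expansion. The sketch below is illustrative only, not the paper's implementation: it uses a toy bigram-count kernel on strings as a stand-in for the all-subtrees tree kernel (Collins & Duffy 2001), which the paper computes efficiently by dynamic programming over node pairs.

```python
# Minimal sketch of a dual (kernelized) perceptron for binary
# classification. The kernel K is a placeholder: it counts shared
# character bigrams, a simple convolution kernel standing in for the
# "all subtrees" tree kernel used in the paper.

def kernel_perceptron(examples, K, epochs=10):
    """examples: list of (x, y) pairs with y in {-1, +1}.
    Returns dual coefficients alpha, one per example."""
    alpha = [0.0] * len(examples)
    for _ in range(epochs):
        mistakes = 0
        for j, (xj, yj) in enumerate(examples):
            # Implicit score f(x) = sum_i alpha_i * y_i * K(x_i, x):
            # the weight vector is never formed explicitly.
            score = sum(a * yi * K(xi, xj)
                        for a, (xi, yi) in zip(alpha, examples) if a)
            if yj * score <= 0:       # mistake: add this example
                alpha[j] += 1.0
                mistakes += 1
        if mistakes == 0:             # converged on the training set
            break
    return alpha

def predict(x, alpha, examples, K):
    s = sum(a * yi * K(xi, x) for a, (xi, yi) in zip(alpha, examples) if a)
    return 1 if s > 0 else -1

def K(x, y):
    """Toy kernel: inner product of character-bigram count vectors."""
    bx = [x[i:i + 2] for i in range(len(x) - 1)]
    by = [y[i:i + 2] for i in range(len(y) - 1)]
    return float(sum(bx.count(b) * by.count(b) for b in set(bx) & set(by)))
```

Because the learner only ever touches examples through `K`, swapping in a dynamic-programming tree kernel gives training and prediction over all subtrees at polynomial cost, which is the efficiency result the abstract refers to.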
REFERENCES
- Aizerman, M., Braverman, E., & Rozonoer, L. (1964). Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning. Automation and Remote Control, 25, 821--837.
- Bod, R. (1998). Beyond Grammar: An Experience-Based Theory of Language. CSLI Publications/Cambridge University Press.
- Bod, R. (2001). What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy? In Proceedings of ACL 2001.
- Borthwick, A., Sterling, J., Agichtein, E., & Grishman, R. (1998). Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. In Proceedings of the Sixth Workshop on Very Large Corpora.
- Charniak, E. (2000). A Maximum-Entropy-Inspired Parser. In Proceedings of NAACL 2000.
- Collins, M. (1999). Head-Driven Statistical Models for Natural Language Parsing. PhD dissertation, University of Pennsylvania.
- Collins, M. (2000). Discriminative Reranking for Natural Language Parsing. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000).
- Collins, M., & Duffy, N. (2001). Convolution Kernels for Natural Language. In Proceedings of Neural Information Processing Systems (NIPS 14).
- Collins, M. (2002a). Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron. In Proceedings of ACL 2002.
- Collins, M. (2002b). Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with the Perceptron Algorithm. In Proceedings of EMNLP 2002.
- Cristianini, N., & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press.
- Freund, Y., & Schapire, R. (1999). Large Margin Classification Using the Perceptron Algorithm. Machine Learning, 37(3), 277--296.
- Freund, Y., Iyer, R., Schapire, R. E., & Singer, Y. (1998). An Efficient Boosting Algorithm for Combining Preferences. In Machine Learning: Proceedings of the Fifteenth International Conference. San Francisco: Morgan Kaufmann.
- Goodman, J. (1996). Efficient Algorithms for Parsing the DOP Model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 143--152.
- Haussler, D. (1999). Convolution Kernels on Discrete Structures. Technical report, University of California, Santa Cruz.
- Johnson, M., Geman, S., Canon, S., Chi, Z., & Riezler, S. (1999). Estimators for Stochastic "Unification-Based" Grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics.
- Johnson, M. (2002). The DOP Estimation Method is Biased and Inconsistent. Computational Linguistics, 28, 71--76.
- Lodhi, H., Cristianini, N., Shawe-Taylor, J., & Watkins, C. (2001). Text Classification Using String Kernels. In Advances in Neural Information Processing Systems 13. MIT Press.
- Marcus, M., Santorini, B., & Marcinkiewicz, M. (1993). Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19, 313--330.
- Ratnaparkhi, A. (1996). A Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
- Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65, 386--408. (Reprinted in Neurocomputing, MIT Press, 1998.)
New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron