ABSTRACT
The demand for deep linguistic analysis for huge volumes of data means that it is increasingly important that the time taken to parse such data is minimized. In the XLE parsing model which is a hand-crafted, unification-based parsing system, most of the time is spent on unification, searching for valid f-structures (dependency attribute-value matrices) within the space of the many valid c-structures (phrase structure trees). We carried out an experiment to determine whether pruning the search space at an earlier stage of the parsing process results in an improvement in the overall time taken to parse, while maintaining the quality of the f-structures produced. We retrained a state-of-the-art probabilistic parser and used it to pre-bracket input to the XLE, constraining the valid c-structure space for each sentence. We evaluated against the PARC 700 Dependency Bank and show that it is possible to decrease the time taken to parse by ~18% while maintaining accuracy.
- Srinivas Bangalore and Aravind K. Joshi. 1999. Supertagging: An approach to alsmost parsing. Computational Linguistics, 25(2):237--265. Google ScholarDigital Library
- Dan Bikel. Design of a Multi-lingual, Parallel-processing Statistical Parsing Engine. In Proceedings of HLT, YEAR = 2002, pages = 24--27, address = San Diego, CA,. Google ScholarDigital Library
- Miriam Butt, Helge Dyvik, Tracy Holloway King, Hiroshi Masuichi, and Christian Rohrer. 2002. The Parallel Grammar Project. In Proceedings of Workshop on Grammar Engineering and Evaluation, pages 1--7, Taiwan. Google ScholarDigital Library
- Aoife Cahill, Martin Forst, Michael Burke, Mairead McCarthy, Ruth O'Donovan, Christian Rohrer, Josef van Genabith, and Andy Way. 2005. Treebank-based acquisition of multilingual unification grammar resources. Journal of Research on Language and Computation, pages 247--279.Google ScholarCross Ref
- Eugene Charniak and Mark Johnson. 2005. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking. In Proceedings of ACL, pages 173--180, Ann Arbor, Michigan. Google ScholarDigital Library
- Eugene Charniak. 2000. A maximum entropy inspired parser. In Proceedings of NAACL, pages 132--139, Seattle, WA. Google ScholarDigital Library
- Stephen Clark and James R. Curran. 2004. The Importance of Supertagging for Wide-Coverage CCG Parsing. In Proceedings of COLING, pages 282--288, Geneva, Switzerland, Aug 23--Aug 27. COLING. Google ScholarDigital Library
- Richard Crouch, Ron Kaplan, Tracy Holloway King, and Stefan Riezler. 2002. A comparison of evaluation metrics for a broad coverage parser. In Proceedings of the LREC Workshop: Beyond PARSEVAL, pages 67--74, Las Palmas, Canary Islands, Spain.Google Scholar
- Ron Kaplan and Joan Bresnan. 1982. Lexical Functional Grammar, a Formal System for Grammatical Representation. In Joan Bresnan, editor, The Mental Representation of Grammatical Relations, pages 173--281. MIT Press, Cambridge, MA.Google Scholar
- Ron Kaplan, Stefan Riezler, Tracy Holloway King, John T. Maxwell, Alexander Vasserman, and Richard Crouch. 2004. Speed and Accuracy in Shallow and Deep Stochastic Parsing. In Proceedings of HLT-NAACL, pages 97--104, Boston, MA.Google Scholar
- Tracy Holloway King, Richard Crouch, Stefan Riezler, Mary Dalrymple, and Ron Kaplan. 2003. The PARC 700 dependency bank. In Proceedings of LINC, pages 1--8, Budapest, Hungary.Google Scholar
- Alexandra Kinyon. 2000. Hypertags. In Proceedings of COLING, pages 446--452, Saarbrücken. Google ScholarDigital Library
- Takuya Matsuzaki, Yusuke Miyao, and Jun'ichi Tsujii. 2007. Efficient HPSG Parsing with Supertagging and CFG-filtering. In Proceedings of IJCAI, pages 1671--1676, India. Google ScholarDigital Library
- John T. Maxwell and Ronald M. Kaplan. 1993. The interface between phrasal and functional constraints. Computational Linguistics, 19(4): 571--590. Google ScholarDigital Library
- Takashi Ninomiya, Takuya Matsuzaki, Yoshimasa Tsuruoka, Yusuke Miyao, and Jun'ichi Tsujii. 2006. Extremely Lexicalized Models for Accurate and Fast HPSG Parsing. In Proceedings of EMNLP, pages 155--163, Australia. Google ScholarDigital Library
- Eric W. Noreen. 1989. Computer Intensive Methods for Testing Hypotheses: An Introduction. Wiley, New York.Google Scholar
- Adwait Ratnaparkhi. 1996. A Maximum Entropy Part-Of-Speech Tagger. In Proceedings of EMNLP, pages 133--142, Philadelphia, PA.Google Scholar
- Stefan Riezler, Tracy King, Ronald Kaplan, Richard Crouch, John T. Maxwell, and Mark Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques. In Proceedings of ACL, pages 271--278, Philadelphia, PA. Google ScholarDigital Library
- Pruning the search space of a hand-crafted parsing system with a probabilistic parser
Recommendations
A hybrid Japanese parser with hand-crafted grammar and statistics
COLING '00: Proceedings of the 18th conference on Computational linguistics - Volume 1This paper describes a hybrid parsing method for Japanese which uses both a hand-crafted grammar and a statistical technique. The key feature of our system is that in order to estimate likelihood for a parse tree, the system uses information taken from ...
Parsing Expression Grammar as a Primitive Recursive-Descent Parser with Backtracking
Special Issue on Concurrency Specification and Programming (CS&P)Two recent developments in the field of formal languages are Parsing Expression Grammar (PEG) and packrat parsing. The PEG formalism is similar to BNF, but defines syntax in terms of recognizing strings, rather than constructing them. It is, in fact, ...
Parsing Expression Grammar as a Primitive Recursive-Descent Parser with Backtracking
Special Issue on Concurrency Specification and Programming (CS&P)Two recent developments in the field of formal languages are Parsing Expression Grammar (PEG) and packrat parsing. The PEG formalism is similar to BNF, but defines syntax in terms of recognizing strings, rather than constructing them. It is, in fact, ...
Comments