ABSTRACT
Japanese case markers, which indicate the grammatical relation of the complement NP to the predicate, often pose challenges to the generation of Japanese text, whether by a foreign language learner or by a machine translation (MT) system. In this paper, we describe the task of predicting Japanese case markers and propose machine learning methods for solving it in two settings: (i) monolingual, given information only from the Japanese sentence; and (ii) bilingual, given additional information from a corresponding English source sentence in an MT context. We model the task on the well-studied task of English semantic role labelling and explore features drawn from the syntactic dependency structure of the sentence. For the monolingual task, we evaluated our models on the Kyoto Corpus and achieved over 84% accuracy in assigning the correct case marker to each phrase. For the bilingual task, we achieved an accuracy of 92% per phrase using a bilingual dataset from a technical domain. We show that in both settings, features that exploit dependency information, whether derived from gold-standard annotations or automatically assigned, contribute significantly to the prediction of case markers.
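The monolingual setting can be read as a per-phrase classification problem: given a phrase and the predicate it depends on, choose a case marker. The following is a minimal illustrative sketch only, not the models evaluated in the paper. The romanized toy data, the feature names, and the count-based backoff classifier are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (noun head, governing predicate, gold case marker).
# The predicate is assumed to come from a dependency link to the phrase.
TRAIN = [
    ("hon", "yomu", "wo"),
    ("zasshi", "yomu", "wo"),
    ("gakusei", "yomu", "ga"),
    ("gakkou", "iku", "ni"),
    ("eki", "iku", "ni"),
    ("gakusei", "iku", "ga"),
]


def features(noun, predicate):
    """Return feature keys ordered from most to least specific."""
    return [
        ("noun+pred", noun, predicate),  # uses the dependency relation
        ("noun", noun),
        ("pred", predicate),
        ("bias",),
    ]


def train(examples):
    """Count how often each case marker co-occurs with each feature key."""
    counts = defaultdict(Counter)
    for noun, predicate, marker in examples:
        for key in features(noun, predicate):
            counts[key][marker] += 1
    return counts


def predict(counts, noun, predicate):
    """Back off from specific to general features; return the most frequent marker."""
    for key in features(noun, predicate):
        if key in counts:
            return counts[key].most_common(1)[0][0]
    raise ValueError("model has no training data")


def accuracy(counts, examples):
    """Per-phrase accuracy: fraction of phrases whose marker is predicted correctly."""
    correct = sum(predict(counts, n, p) == m for n, p, m in examples)
    return correct / len(examples)


model = train(TRAIN)
print(predict(model, "shinbun", "yomu"))  # unseen noun: backs off to the predicate
print(accuracy(model, TRAIN))
```

The backoff order stands in for the idea that dependency-based features (here, the noun and predicate pair) are the most informative signal when available. A bilingual variant would add features from the aligned English source sentence to the same key list.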
- Learning to predict case markers in Japanese