Abstract
Although relation extraction has been studied extensively over the last decade, statistical systems based on supervised learning remain limited because they require large amounts of training data to achieve high performance. In this article, we propose cross-lingual annotation projection methods that leverage parallel corpora to build a relation extraction system for a resource-poor language without significant annotation effort. To make our method more reliable, we introduce two types of projection approaches with noise reduction strategies. We demonstrate the merit of our method using a Korean relation extraction system trained on projected examples from an English-Korean parallel corpus. Experiments show the feasibility of our approaches through comparison with other systems based on monolingual resources.
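The general idea of annotation projection can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that word alignments are already available as (source index, target index) pairs, that relation arguments are given as token spans, and that a projected instance is simply discarded when an argument has no aligned target token or when the two projected arguments overlap. The function names, the `born_in` label, the toy sentence pair, and both filters are hypothetical and stand in for the projection approaches and noise reduction strategies that the abstract mentions without detail.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]        # token span: inclusive start, exclusive end
Alignment = List[Tuple[int, int]]  # (source token index, target token index)


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Map a source span to the smallest target span covering its aligned tokens."""
    targets = [t for s, t in alignment if span[0] <= s < span[1]]
    if not targets:
        return None  # no alignment link, so the span cannot be projected
    return (min(targets), max(targets) + 1)


def project_relation(
    arg1: Span, arg2: Span, label: str, alignment: Alignment
) -> Optional[Tuple[Span, Span, str]]:
    """Project a labeled relation instance from the source to the target sentence.

    Returns None when the projection looks unreliable (hypothetical filters).
    """
    p1 = project_span(arg1, alignment)
    p2 = project_span(arg2, alignment)
    if p1 is None or p2 is None:
        return None  # filter 1: an argument has no counterpart in the target
    if p1[0] < p2[1] and p2[0] < p1[1]:
        return None  # filter 2: projected arguments overlap, likely alignment noise
    return (p1, p2, label)


# Toy English-Korean sentence pair (illustrative data, not from the article).
english = ["Barack", "Obama", "was", "born", "in", "Hawaii"]
korean = ["버락", "오바마는", "하와이에서", "태어났다"]
alignment = [(0, 0), (1, 1), (2, 3), (3, 3), (4, 2), (5, 2)]

# English annotation: arg1 = "Barack Obama", arg2 = "Hawaii".
projected = project_relation((0, 2), (5, 6), "born_in", alignment)
print(projected)
```

On this toy pair, the first argument projects to the first two Korean tokens and the second argument to the third token, and the relation label carries over unchanged. Projected instances of this kind would then serve as training examples for a target-language classifier.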
Index Terms
- Cross-Lingual Annotation Projection for Weakly-Supervised Relation Extraction