Cross-Lingual Annotation Projection for Weakly-Supervised Relation Extraction

Authors:
Seokhwan Kim (Institute for InfoComm Research)
Minwoo Jeong (Microsoft Bing)
Jonghoon Lee (Pohang University of Science and Technology)
Gary Geunbae Lee (Pohang University of Science and Technology)

ACM Transactions on Asian Language Information Processing, Volume 13, Issue 1, Article No. 3, pp. 1–26. https://doi.org/10.1145/2529994

Published: 01 February 2014

Abstract

Although relation extraction has been studied extensively over the last decade, statistical systems based on supervised learning remain limited because they require large amounts of training data to achieve high performance. In this article, we propose cross-lingual annotation projection methods that leverage parallel corpora to build a relation extraction system for a resource-poor language without significant annotation effort. To make our method more reliable, we introduce two types of projection approaches with noise reduction strategies. We demonstrate the merit of our method using a Korean relation extraction system trained on examples projected from an English-Korean parallel corpus. Experiments show the feasibility of our approaches through comparison with other systems based on monolingual resources.
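
The general idea of annotation projection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the two projection approaches or the noise reduction strategies, so the function names (`project_span`, `project_relation`), the alignment format, the `born_in` label, and the simple filter that discards unaligned or overlapping arguments are all assumptions made for illustration. The sketch takes a relation annotated on an English sentence and maps it onto the Korean side of a sentence pair through word alignments.

```python
from typing import Dict, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive token index range (start, end)


def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Map a source span to the smallest target span covering its aligned tokens.

    `alignment` maps a source token index to the set of target token indices
    it is aligned with (hypothetical format, e.g. produced by a word aligner).
    """
    start, end = span
    targets = sorted(
        t for s in range(start, end + 1) for t in alignment.get(s, ())
    )
    if not targets:
        return None  # no aligned target token: the span cannot be projected
    return targets[0], targets[-1]


def project_relation(
    arg1: Span, arg2: Span, label: str, alignment: Dict[int, Set[int]]
) -> Optional[Tuple[Span, Span, str]]:
    """Project a labeled relation instance onto the target sentence.

    Illustrative noise filter (an assumption, not from the article): drop the
    instance if either argument is unaligned or the projected spans overlap.
    """
    t1 = project_span(arg1, alignment)
    t2 = project_span(arg2, alignment)
    if t1 is None or t2 is None:
        return None
    if t1[0] <= t2[1] and t2[0] <= t1[1]:
        return None  # both arguments collapsed onto overlapping target tokens
    return t1, t2, label


# English: 0 Obama, 1 was, 2 born, 3 in, 4 Hawaii
# Korean:  0 오바마는, 1 하와이에서, 2 태어났다
alignment = {0: {0}, 1: {2}, 2: {2}, 3: {1}, 4: {1}}
projected = project_relation((0, 0), (4, 4), "born_in", alignment)
print(projected)
```

Projected instances of this kind could then serve as training examples for a target-language classifier; a real system would need stronger filtering, since alignment errors propagate directly into the projected labels.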

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Published in
ACM Transactions on Asian Language Information Processing Volume 13, Issue 1
February 2014
93 pages
ISSN:1530-0226
EISSN:1558-3430
DOI:10.1145/2590408
Editor:
Richard Sproat
Google, Inc., USA
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 February 2014
- Accepted: 1 September 2013
- Revised: 1 August 2013
- Received: 1 November 2012
Author Tags
Relation extraction
cross-lingual annotation projection
weakly-supervised learning
Qualifiers
- research-article
- Research
- Refereed