research-article

Cross-Lingual Annotation Projection for Weakly-Supervised Relation Extraction

Published: 01 February 2014

Abstract

Although researchers have studied relation extraction extensively over the last decade, statistical systems based on supervised learning remain limited because they require large amounts of training data to achieve a high level of performance. In this article, we propose cross-lingual annotation projection methods that leverage parallel corpora to build a relation extraction system for a resource-poor language without significant annotation effort. To make our method more reliable, we introduce two types of projection approaches with noise reduction strategies. We demonstrate the merit of our method using a Korean relation extraction system trained on examples projected from an English-Korean parallel corpus. Experiments show the feasibility of our approaches through comparison with other systems based on monolingual resources.
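
As a rough illustration of the general idea of annotation projection (not the specific projection approaches or noise reduction strategies of this article, which the abstract does not detail), the sketch below maps a relation annotated on an English sentence onto its Korean translation through word alignments. The function names, the alignment format, the Korean tokenization, and the simple filtering rule are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]               # inclusive token indices (start, end)
Relation = Tuple[str, Span, Span]    # (label, first argument, second argument)


def project_span(span: Span, alignment: Dict[int, List[int]]) -> Optional[Span]:
    """Map a source span to the smallest target span covering its aligned tokens."""
    targets = [t for s in range(span[0], span[1] + 1) for t in alignment.get(s, [])]
    if not targets:
        return None  # entity has no aligned tokens: drop it rather than guess
    return (min(targets), max(targets))


def project_relation(relation: Relation,
                     alignment: Dict[int, List[int]]) -> Optional[Relation]:
    """Project a source-side relation instance onto the target sentence.

    Hypothetical noise filter: discard the instance if either argument is
    unaligned or if the two projected arguments overlap.
    """
    label, arg1, arg2 = relation
    p1 = project_span(arg1, alignment)
    p2 = project_span(arg2, alignment)
    if p1 is None or p2 is None:
        return None
    if not (p1[1] < p2[0] or p2[1] < p1[0]):
        return None
    return (label, p1, p2)


# Toy sentence pair (tokenization and alignment are invented for illustration).
english = ["Barack", "Obama", "was", "born", "in", "Hawaii"]
korean = ["버락", "오바마", "는", "하와이", "에서", "태어났다"]
alignment = {0: [0], 1: [1], 3: [5], 4: [4], 5: [3]}  # source index -> target indices

source_relation = ("born_in", (0, 1), (5, 5))
projected = project_relation(source_relation, alignment)
print(projected)
```

Projected instances of this kind would then serve as training examples for a target-language extractor; in practice, automatic word alignments are noisy, which is why some form of filtering is usually needed.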

    • Published in

      ACM Transactions on Asian Language Information Processing  Volume 13, Issue 1
      February 2014
      93 pages
      ISSN:1530-0226
      EISSN:1558-3430
      DOI:10.1145/2590408

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 February 2014
      • Accepted: 1 September 2013
      • Revised: 1 August 2013
      • Received: 1 November 2012

      Qualifiers

      • research-article
      • Research
      • Refereed
