research-article

Chinese Open Relation Extraction and Knowledge Base Establishment

Published: 14 February 2018

Abstract

Named entity relation extraction is an important subject in information extraction. Although many English extractors achieve reasonable performance, an effective system for Chinese relation extraction has yet to be developed, owing to the lack of annotated Chinese corpora and the particularities of Chinese linguistics. In this article, we summarize three unique but common phenomena in Chinese linguistics, investigate unsupervised, linguistics-based Chinese open relation extraction (ORE), which can automatically discover arbitrary relations without any manually labeled datasets, and study the construction of a large-scale corpus. By mapping entity relations onto dependency trees and taking the unique characteristics of Chinese into account, we propose a novel unsupervised Chinese ORE model based on Dependency Semantic Normal Forms (DSNFs). The model imposes no restrictions on the relative positions of entities and relations, and it achieves a high yield by extracting relations mediated by verbs or nouns and by processing parallel clauses. Empirical results demonstrate the effectiveness of the method: it performs stably on four heterogeneous datasets and achieves better precision and recall than several Chinese ORE systems. Furthermore, by applying our method to web text, we establish and publish a large-scale knowledge base of entities and relations, called COER, which helps address the shortage of Chinese corpora.
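
To make the idea of reading relations off a dependency tree concrete, the following is a minimal sketch of a single verb-mediated pattern with a simple treatment of parallel (coordinated) clauses. It is not the DSNF rule set, which the abstract does not spell out. The `Token` class, the `extract_svo` function, the hand-written parse, and the LTP-style labels `SBV` (subject), `VOB` (object), and `COO` (coordination) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    idx: int    # 1-based position in the sentence
    word: str
    pos: str    # coarse part-of-speech tag ("v" marks a verb here)
    head: int   # index of the head token, 0 for the root
    dep: str    # dependency label linking this token to its head


def children(tokens: List[Token], head_idx: int, dep: str) -> List[Token]:
    """Return the tokens attached to head_idx with the given label."""
    return [t for t in tokens if t.head == head_idx and t.dep == dep]


def extract_svo(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Collect (subject, verb, object) triples mediated by verbs."""
    triples = []
    for verb in tokens:
        if verb.pos != "v":
            continue
        subjects = children(tokens, verb.idx, "SBV")
        # Parallel clause: a coordinated verb with no subject of its own
        # borrows the subject of the verb it is coordinated with.
        if not subjects and verb.dep == "COO":
            subjects = children(tokens, verb.head, "SBV")
        objects = children(tokens, verb.idx, "VOB")
        for s in subjects:
            for o in objects:
                triples.append((s.word, verb.word, o.word))
    return triples


# Hand-built parse (an assumption, not parser output) of
# "李明 创办 公司 并 担任 董事长"
# ("Li Ming founded a company and serves as chairman").
sentence = [
    Token(1, "李明", "nh", 2, "SBV"),
    Token(2, "创办", "v", 0, "HED"),
    Token(3, "公司", "n", 2, "VOB"),
    Token(4, "并", "c", 5, "LAD"),
    Token(5, "担任", "v", 2, "COO"),
    Token(6, "董事长", "n", 5, "VOB"),
]

triples = extract_svo(sentence)
for triple in triples:
    print(triple)
```

Because the pattern is defined over tree edges rather than word order, the entities may sit on either side of the relation word, which is the property the abstract describes as imposing no restrictions on relative positions. Noun-mediated relations would need further patterns that this sketch does not attempt.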

          • Published in

            ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 3
            September 2018
            196 pages
            ISSN: 2375-4699
            EISSN: 2375-4702
            DOI: 10.1145/3184403

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 14 February 2018
            • Accepted: 1 November 2017
            • Revised: 1 July 2017
            • Received: 1 April 2017
            Published in TALLIP Volume 17, Issue 3

            Qualifiers

            • research-article
            • Research
            • Refereed
