Abstract
Named entity relation extraction is an important task in information extraction. Although many English extractors achieve reasonable performance, no effective system for Chinese relation extraction has yet been developed, owing to the lack of annotated Chinese corpora and the particular characteristics of Chinese linguistics. In this article, we summarize three kinds of linguistic phenomena that are unique to Chinese yet common in it. We then investigate unsupervised, linguistics-based Chinese open relation extraction (ORE), which can automatically discover arbitrary relations without any manually labeled datasets, and we study the construction of a large-scale corpus. By mapping entity relations onto dependency trees and taking these Chinese linguistic characteristics into account, we propose a novel unsupervised Chinese ORE model based on Dependency Semantic Normal Forms (DSNFs). The model imposes no restrictions on the relative positions of entities and relations, and it achieves high yield by extracting relations mediated by either verbs or nouns and by processing parallel clauses. Empirical results demonstrate the effectiveness of the method: it performs stably on four heterogeneous datasets and achieves better precision and recall than several existing Chinese ORE systems. Furthermore, by applying our method to web text, we construct and publish a large-scale entity-relation knowledge base, called COER, which helps alleviate the shortage of Chinese corpora.
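The abstract only outlines the approach; the DSNF patterns themselves are defined in the body of the article. As a rough illustration of the general idea of reading relation triples off a dependency tree, here is a minimal sketch that hand-codes three simplified patterns: a verb-mediated pattern (subject, verb, object), a noun-mediated pattern (entity, relational noun, entity), and propagation across coordinated conjuncts for parallel clauses. It assumes an already parsed, NER-tagged sentence with LTP-style labels (SBV subject, VOB object, ATT attribute, COO coordination). The `Token` class, the helper functions, the pattern set and the example parse are all hypothetical; this is not the paper's DSNF rule set.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Token:
    word: str
    pos: str                 # coarse POS tag; "v" = verb, "n" = common noun (assumed tag set)
    head: int                # 1-based index of the head token, 0 for the root
    rel: str                 # dependency label, assumed LTP-style: SBV, VOB, ATT, COO, ...
    is_entity: bool = False  # assumed to be set by an upstream NER step


def children(tokens: List[Token], idx: int, rel: str) -> List[int]:
    """1-based indices of the tokens attached to token `idx` with label `rel`."""
    return [i for i, t in enumerate(tokens, start=1) if t.head == idx and t.rel == rel]


def with_coordinates(tokens: List[Token], idx: int) -> List[int]:
    """Token `idx` plus every token coordinated with it via COO (e.g. 北京 和 上海)."""
    out = [idx]
    for c in children(tokens, idx, "COO"):
        out.extend(with_coordinates(tokens, c))
    return out


def subjects_of(tokens: List[Token], idx: int) -> List[int]:
    """Subjects of verb `idx`; a coordinated verb without its own subject
    inherits the subject of the verb it is coordinated with (parallel clauses)."""
    while True:
        subs = children(tokens, idx, "SBV")
        tok = tokens[idx - 1]
        if subs or tok.rel != "COO" or tok.head == 0:
            return subs
        idx = tok.head


def verb_triples(tokens: List[Token]) -> List[Triple]:
    """Verb-mediated pattern: E1 <-SBV- V -VOB-> E2."""
    triples = []
    for i, t in enumerate(tokens, start=1):
        if t.pos != "v":
            continue
        for s in subjects_of(tokens, i):
            for o in children(tokens, i, "VOB"):
                for s2 in with_coordinates(tokens, s):
                    for o2 in with_coordinates(tokens, o):
                        e1, e2 = tokens[s2 - 1], tokens[o2 - 1]
                        if e1.is_entity and e2.is_entity:
                            triples.append((e1.word, t.word, e2.word))
    return triples


def noun_triples(tokens: List[Token]) -> List[Triple]:
    """Noun-mediated pattern: E1 -ATT-> N -ATT-> E2, e.g. 美国 总统 奥巴马."""
    triples = []
    for i, n in enumerate(tokens, start=1):
        if n.pos != "n" or n.is_entity or n.rel != "ATT" or n.head == 0:
            continue
        e2 = tokens[n.head - 1]
        if not e2.is_entity:
            continue
        for j in children(tokens, i, "ATT"):
            if tokens[j - 1].is_entity:
                triples.append((tokens[j - 1].word, n.word, e2.word))
    return triples


def extract(tokens: List[Token]) -> List[Triple]:
    return verb_triples(tokens) + noun_triples(tokens)


# Hand-written parse of 美国总统奥巴马访问了北京和上海
# ("US President Obama visited Beijing and Shanghai"); illustrative only.
EXAMPLE = [
    Token("美国", "ns", 2, "ATT", True),
    Token("总统", "n", 3, "ATT"),
    Token("奥巴马", "nh", 4, "SBV", True),
    Token("访问", "v", 0, "HED"),
    Token("了", "u", 4, "RAD"),
    Token("北京", "ns", 4, "VOB", True),
    Token("和", "c", 8, "LAD"),
    Token("上海", "ns", 6, "COO", True),
]

if __name__ == "__main__":
    for triple in extract(EXAMPLE):
        print(triple)
```

On the hand-written example, this yields (奥巴马, 访问, 北京), (奥巴马, 访问, 上海) and (美国, 总统, 奥巴马). Because the patterns match tree edges rather than word positions, they do not depend on where the entities and the relation word sit in the sentence, which is one way to read the abstract's point about imposing no positional restrictions.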
Supplemental Material
Supplemental movie, appendix, image and software files for Chinese Open Relation Extraction and Knowledge Base Establishment.