ABSTRACT
The work presented in this paper is motivated by the practical need for content extraction and by the data sources and evaluation benchmark made available by the ACE program. The Chinese Entity Detection and Recognition (EDR) task is of particular interest to us. This task presents several language-independent and language-dependent challenges, arising for example from the complexity of the extraction targets and from the problem of word segmentation. In this paper, we propose a novel solution to alleviate the problems specific to the task. Mention detection takes advantage of machine learning approaches and character-based models. It handles the different types of entities being mentioned and the different constituent units (i.e. extents and heads) separately. Mentions referring to the same entity are linked together by integrating most-specific-first and closest-first rule-based pairwise clustering algorithms. Types of mentions and entities are determined by head-driven classification approaches. The implemented system achieves an ACE value of 66.1 when evaluated on the EDR 2005 Chinese corpus, which places it among the top-tier results. Alternative approaches to mention detection and clustering are also discussed and analyzed.
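As an illustration of the closest-first pairwise clustering mentioned above, the following is a minimal sketch and not the system's actual implementation. It assumes mentions are plain strings in document order. The names `closest_first_clusters` and `toy_match` are hypothetical, and the substring test merely stands in for whatever pairwise decision rule or classifier is actually used. The most-specific-first strategy is not shown.

```python
from typing import Callable, List, Sequence


def closest_first_clusters(
    mentions: Sequence[str],
    is_coreferent: Callable[[str, str], bool],
) -> List[List[int]]:
    """Group mention indices into entities with closest-first linking."""
    entity_of: List[int] = []  # entity_of[i] is the entity id of mention i
    next_entity = 0
    for i, mention in enumerate(mentions):
        assigned = None
        # Scan earlier mentions from the nearest to the farthest and
        # link to the first one judged coreferent.
        for j in range(i - 1, -1, -1):
            if is_coreferent(mentions[j], mention):
                assigned = entity_of[j]
                break
        if assigned is None:
            # No compatible antecedent: the mention starts a new entity.
            assigned = next_entity
            next_entity += 1
        entity_of.append(assigned)
    clusters: List[List[int]] = [[] for _ in range(next_entity)]
    for i, entity in enumerate(entity_of):
        clusters[entity].append(i)
    return clusters


def toy_match(antecedent: str, mention: str) -> bool:
    # Hypothetical stand-in for a pairwise coreference decision:
    # two mentions match when one string contains the other.
    return antecedent in mention or mention in antecedent


if __name__ == "__main__":
    print(closest_first_clusters(["北京大学", "清华大学", "大学"], toy_match))
```

In this toy input the last mention is compatible with both earlier mentions, and closest-first attaches it to the nearer one. That is the behavior a most-specific-first rule would be expected to complement.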