
Detecting, categorizing and clustering entity mentions in Chinese text

Authors:
Wenjie Li (The Hong Kong Polytechnic University, Hong Kong)
Donglei Qian (Tsinghua University, Beijing, China)
Qin Lu (The Hong Kong Polytechnic University, Hong Kong)
Chunfa Yuan (Tsinghua University, Beijing, China)

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 2007, Pages 647–654
https://doi.org/10.1145/1277741.1277852

Published: 23 July 2007

ABSTRACT

The work presented in this paper is motivated by the practical need for content extraction and by the data sources and evaluation benchmark made available by the ACE program. The Chinese Entity Detection and Recognition (EDR) task is of particular interest to us. It poses several language-independent and language-dependent challenges, arising, for example, from the complexity of the extraction targets and from the problem of word segmentation. In this paper, we propose a novel solution that alleviates the problems specific to the task. Mention detection takes advantage of machine learning approaches and character-based models. It handles the different types of entities being mentioned and the different constituent units (i.e. extents and heads) separately. Mentions referring to the same entity are linked together by integrating most-specific-first and closest-first rule-based pairwise clustering algorithms. Types of mentions and entities are determined by head-driven classification approaches. The implemented system achieves an ACE value of 66.1 when evaluated on the EDR 2005 Chinese corpus, which places it among the top-tier results. Alternative approaches to mention detection and clustering are also discussed and analyzed.
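
The abstract names most-specific-first and closest-first pairwise clustering but does not spell out the rules. The following is a minimal Python sketch of one way the two orderings could be combined, not the authors' implementation. Everything in it is an assumption made for illustration: the `Mention` fields, the `SPECIFICITY` ordering (names before nominals before pronouns), the toy `compatible` rule based on type agreement and head-string containment, and the example mentions.

```python
from dataclasses import dataclass

# Assumed specificity order: names are most specific, pronouns least.
SPECIFICITY = {"NAM": 0, "NOM": 1, "PRO": 2}


@dataclass(frozen=True)
class Mention:
    index: int        # order of appearance in the document
    head: str         # head string of the mention
    entity_type: str  # e.g. "PER", "ORG"
    level: str        # "NAM" (name), "NOM" (nominal) or "PRO" (pronoun)


def compatible(a, b):
    """Toy pairwise rule (hypothetical, not the paper's rule set)."""
    if a.entity_type != b.entity_type:
        return False
    if "PRO" in (a.level, b.level):
        return True  # pronouns: rely on type agreement only
    return a.head in b.head or b.head in a.head


def cluster(mentions):
    """Group mention indices into entities with union-find."""
    parent = {m.index: m.index for m in mentions}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    by_index = sorted(mentions, key=lambda m: m.index)
    # Most-specific-first: resolve names, then nominals, then pronouns.
    order = sorted(mentions, key=lambda m: (SPECIFICITY[m.level], m.index))
    for m in order:
        # Candidate antecedents: earlier mentions that are at least as specific.
        candidates = [
            c for c in by_index
            if c.index < m.index
            and SPECIFICITY[c.level] <= SPECIFICITY[m.level]
        ]
        # Closest-first: scan from the nearest candidate backwards.
        for cand in reversed(candidates):
            if compatible(cand, m):
                parent[find(m.index)] = find(cand.index)
                break

    groups = {}
    for m in by_index:
        groups.setdefault(find(m.index), []).append(m.index)
    return sorted(groups.values())


# Invented example mentions, in document order.
demo = [
    Mention(0, "清华大学", "ORG", "NAM"),
    Mention(1, "李明", "PER", "NAM"),
    Mention(2, "清华", "ORG", "NAM"),
    Mention(3, "他", "PER", "PRO"),
    Mention(4, "大学", "ORG", "NOM"),
]
clusters = cluster(demo)
print(clusters)
```

In this sketch the specificity ordering decides which mentions are resolved first and which antecedents they may attach to, while the backwards scan picks the nearest compatible antecedent. A real system would replace `compatible` with its own rule set or a learned pairwise classifier.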

Index Terms

1. Applied computing
  1. Document management and text processing

Published in
SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007, 946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741
General Chairs: Wessel Kraaij (TNO, The Netherlands), Arjen P. de Vries (CWI, The Netherlands)
Program Chairs: Charles L. A. Clarke (University of Waterloo, Canada), Norbert Fuhr (University of Duisburg-Essen, Germany), Noriko Kando (National Institute of Informatics, Japan)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entity mentions in Chinese
mention categorization and mention clustering
mention detection
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
