Chinese term extraction using minimal resources

Authors:
Yuhang Yang

Harbin Institute of Technology, Harbin, China

Qin Lu

The Hong Kong Polytechnic University, Hong Kong, China

Tiejun Zhao

Harbin Institute of Technology, Harbin, China


COLING '08: Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, August 2008, Pages 1033–1040

Published: 18 August 2008

ABSTRACT

This paper presents a new approach to term extraction that uses minimal resources. A term candidate extraction algorithm is proposed that identifies features of the relatively stable and domain-independent term delimiters rather than those of the terms themselves. For term verification, a link-analysis-based method is proposed to calculate the relevance between term candidates and the sentences in the domain-specific corpus from which the candidates are extracted. The proposed approach requires no prior domain knowledge, no general corpora, no full segmentation, and only minimal adaptation for new domains. Consequently, the method can be applied to any domain corpus and is especially useful for resource-limited domains. Evaluations on two different domains for Chinese term extraction show significant improvements over existing techniques and also verify the efficiency and relatively domain-independent nature of the approach. Experiments on new term extraction also indicate that the approach is effective at identifying new terms in a domain, making it useful for updating domain knowledge.
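
To make the two steps in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the delimiter features or the relevance formula, so the hand-picked `DELIMITERS` list, the two-character length filter, and the HITS-style mutual-reinforcement scoring are all assumptions chosen for illustration. The function names `extract_candidates` and `rank_candidates` are hypothetical.

```python
import re
from collections import defaultdict


def extract_candidates(sentence, delimiters):
    """Split a sentence at delimiter strings; the remaining pieces are candidates."""
    if not delimiters:
        return [sentence]
    # Try longer delimiters first so "一种" is matched before any single character.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    # Keep pieces of at least two characters (an illustrative filter, not from the paper).
    return [piece for piece in re.split(pattern, sentence) if len(piece) > 1]


def rank_candidates(sentences, delimiters, iterations=20):
    """Score candidates with a HITS-style iteration over a candidate-sentence graph."""
    # One link per (candidate, sentence index) pair in which the candidate occurs.
    links = []
    for s_idx, sentence in enumerate(sentences):
        for cand in set(extract_candidates(sentence, delimiters)):
            links.append((cand, s_idx))

    sent_score = {s_idx: 1.0 for _, s_idx in links}
    cand_score = {cand: 1.0 for cand, _ in links}
    for _ in range(iterations):
        # A candidate is scored by the sentences it appears in ...
        new_cand = defaultdict(float)
        for cand, s_idx in links:
            new_cand[cand] += sent_score[s_idx]
        # ... and a sentence is scored by the candidates it contains.
        new_sent = defaultdict(float)
        for cand, s_idx in links:
            new_sent[s_idx] += new_cand[cand]
        # L2-normalise both score vectors so the values stay bounded.
        cand_norm = sum(v * v for v in new_cand.values()) ** 0.5 or 1.0
        sent_norm = sum(v * v for v in new_sent.values()) ** 0.5 or 1.0
        cand_score = {c: v / cand_norm for c, v in new_cand.items()}
        sent_score = {s: v / sent_norm for s, v in new_sent.items()}
    return sorted(cand_score.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical delimiter list and toy corpus, for illustration only.
DELIMITERS = ["是", "一种", "和", "的", "，", "。"]
SENTENCES = ["支持向量机是一种分类算法。", "支持向量机和决策树。"]
ranked = rank_candidates(SENTENCES, DELIMITERS)
```

In this toy corpus the candidate that occurs in both sentences ends up ranked first, which is the behaviour a link-based relevance score is meant to capture. A real system would presumably derive its delimiters from data rather than from a fixed list.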


Published in
COLING '08: Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
August 2008
1178 pages
ISBN: 9781905593446
Program Chairs: Donia Scott (Open University), Hans Uszkoreit (Universität des Saarlandes/DFKI)
Publisher: Association for Computational Linguistics, United States