research-article

Chinese term extraction using minimal resources

Published: 18 August 2008

ABSTRACT

This paper presents a new approach to term extraction that uses minimal resources. A term candidate extraction algorithm is proposed that identifies features of term delimiters, which are relatively stable and domain independent, rather than features of the terms themselves. For term verification, a link analysis based method is proposed to calculate the relevance between term candidates and the sentences in the domain specific corpus from which the candidates are extracted. The proposed approach requires no prior domain knowledge, no general corpora, and no full segmentation, and it needs only minimal adaptation for new domains. Consequently, the method can be applied to any domain corpus and is especially useful for resource-limited domains. Evaluations of Chinese term extraction on two different domains show significant improvements over existing techniques and also verify the efficiency and relatively domain-independent nature of the approach. Experiments on new term extraction also indicate that the approach is effective at identifying new terms in a domain, making it useful for updating domain knowledge.
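The two steps described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes a hand-picked delimiter set (the paper's delimiters are not listed here), splits sentences at those delimiters to obtain candidates, and then substitutes a generic HITS-style mutual reinforcement between candidates and sentences for the paper's link analysis method. The function names `extract_candidates` and `score_candidates`, the toy corpus, and the iteration count are all hypothetical.

```python
import math
import re


def extract_candidates(sentence, delimiters):
    """Split a sentence at delimiter strings; the remaining pieces are candidates."""
    if not delimiters:
        return [sentence] if sentence else []
    # Try longer delimiters first so a short one cannot pre-empt a longer match.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return [piece for piece in re.split(pattern, sentence) if piece]


def score_candidates(sentences, delimiters, iterations=20):
    """HITS-style scoring on a bipartite graph of candidates and sentences.

    Assumption: a candidate is linked to every sentence it was extracted from.
    """
    links = [set(extract_candidates(s, delimiters)) for s in sentences]
    cand_score = {c: 1.0 for cands in links for c in cands}
    sent_score = [1.0] * len(sentences)

    for _ in range(iterations):
        # A candidate gains relevance from the sentences that contain it.
        new_cand = {c: 0.0 for c in cand_score}
        for i, cands in enumerate(links):
            for c in cands:
                new_cand[c] += sent_score[i]
        norm = math.sqrt(sum(v * v for v in new_cand.values())) or 1.0
        cand_score = {c: v / norm for c, v in new_cand.items()}

        # A sentence gains relevance from the candidates it contains.
        new_sent = [sum(cand_score[c] for c in cands) for cands in links]
        norm = math.sqrt(sum(v * v for v in new_sent)) or 1.0
        sent_score = [v / norm for v in new_sent]

    return cand_score


if __name__ == "__main__":
    # Toy corpus and delimiter set, both invented for illustration.
    toy_delimiters = {"的", "和", "是"}
    toy_sentences = ["神经网络的训练", "神经网络和决策树", "决策树是模型"]
    scores = score_candidates(toy_sentences, toy_delimiters)
    for cand, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(cand, round(score, 3))
```

In this toy corpus, candidates that occur in more than one sentence end up with higher scores than candidates seen only once, which is the kind of corpus-internal evidence a link analysis step can exploit without a general corpus or full segmentation.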


Published in

COLING '08: Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
August 2008
1178 pages
ISBN: 9781905593446

Publisher: Association for Computational Linguistics, United States


Acceptance Rates

Overall Acceptance Rate: 1,537 of 1,537 submissions, 100%
