ABSTRACT
Information extraction systems incorporate multiple stages of linguistic analysis. Although errors are typically compounded from stage to stage, it is possible to reduce the errors in one stage by harnessing the results of the other stages. We demonstrate this by using the results of coreference analysis and relation extraction to reduce the errors produced by a Chinese name tagger. We use an N-best approach to generate multiple hypotheses and have them re-ranked by subsequent stages of processing. We thereby obtained a 24% reduction in spurious and incorrect name tags and a 14% reduction in missed tags.
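The N-best re-ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` class, the `rerank` function, the linear weighting, and the toy scorers are all hypothetical names and choices made for illustration. The sketch assumes only that the name tagger emits several scored candidate taggings and that later stages (here stand-ins for coreference and relation evidence) can each assign a support score to a candidate.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One candidate name tagging of a sentence (hypothetical structure)."""
    tags: Tuple[str, ...]   # one tag per token, e.g. ("PER", "O", "O")
    tagger_score: float     # score from the name tagger, higher is better


# A later-stage scorer maps a hypothesis to a support value (higher is better).
Scorer = Callable[[Hypothesis], float]


def rerank(
    n_best: Sequence[Hypothesis],
    scorers: Sequence[Scorer],
    weights: Sequence[float],
    tagger_weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-order N-best hypotheses by a weighted sum of tagger and later-stage scores.

    The linear combination is an assumption of this sketch, not a claim
    about how the paper combines evidence.
    """
    if len(scorers) != len(weights):
        raise ValueError("need exactly one weight per scorer")

    def combined(h: Hypothesis) -> float:
        later = sum(w * s(h) for w, s in zip(weights, scorers))
        return tagger_weight * h.tagger_score + later

    return sorted(n_best, key=combined, reverse=True)


# Toy usage: the tagger slightly prefers PER for the first token, but the
# stand-in coreference and relation scorers both support ORG.
n_best = [
    Hypothesis(tags=("PER", "O", "O"), tagger_score=-1.0),
    Hypothesis(tags=("ORG", "O", "O"), tagger_score=-1.2),
]


def coref_support(h: Hypothesis) -> float:
    return 1.0 if h.tags[0] == "ORG" else 0.0


def relation_support(h: Hypothesis) -> float:
    return 0.5 if h.tags[0] == "ORG" else 0.0


ranked = rerank(n_best, [coref_support, relation_support], [0.5, 0.5])
best = ranked[0]
```

In this toy run the second hypothesis overtakes the tagger's first choice because the later-stage evidence outweighs the small gap in tagger scores, which is the kind of correction the N-best approach is meant to allow.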
- Improving name tagging by reference resolution and relation detection