A maximum entropy approach to named entity recognition

January 1999

Author: Andrew Eliot Borthwick
Adviser: Ralph Grishman

Publisher: New York University, 202 Tisch Hall, Washington Square, New York, NY, United States

ISBN: 978-0-599-47232-7

Order Number: AAI9945252

Pages: 188

Abstract

This thesis describes a novel statistical named-entity (i.e. “proper name”) recognition system known as “MENE” (Maximum Entropy Named Entity). Named entity (N.E.) recognition is a form of information extraction in which we seek to classify every word in a document as being a person-name, organization, location, date, time, monetary value, percentage, or “none of the above”. The task has particular significance for Internet search engines, machine translation, the automatic indexing of documents, and as a foundation for work on more complex information extraction tasks.

Two of the most significant problems facing the constructor of a named entity system are the questions of portability and system performance. A practical N.E. system will need to be ported frequently to new bodies of text and even to new languages. The challenge is to build a system which can be ported with minimal expense (in particular minimal programming by a computational linguist) while maintaining a high degree of accuracy in the new domains or languages.

MENE attempts to address these issues through the use of maximum entropy probabilistic modeling. It uses a flexible object-based architecture that allows it to draw on a broad range of knowledge sources in making its tagging decisions. In the DARPA-sponsored MUC-7 named entity evaluation, the system displayed an accuracy rate well above the median, demonstrating that it can achieve the performance goal. In addition, we demonstrate that the system can be used as a post-processing tool to enhance the output of a hand-coded named entity recognizer, through experiments in which MENE improved on the performance of N.E. systems from three different sites. Furthermore, when all three external recognizers are combined under MENE, we achieve very strong results which, in some cases, appear to be competitive with human performance.
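As a rough illustration of how a maximum entropy tagger can combine several knowledge sources, the sketch below scores each candidate tag by summing the weights of the binary features that fire for a token, then normalizes the exponentiated scores into a probability distribution. This is only a minimal sketch under stated assumptions: the feature names, the hand-set weights, and the functions `features`, `tag_distribution`, and `best_tag` are hypothetical and are not taken from MENE, whose actual features and trained weights are not given in this abstract. The `external=` feature is one plausible way to feed another recognizer's output in as evidence.

```python
import math

# Tag set following the categories listed in the abstract, plus "other".
TAGS = ["person", "organization", "location", "date",
        "time", "money", "percent", "other"]

def features(token, prev_token="", external_tag=None):
    """Return the set of active binary features for one token.

    All feature names here are hypothetical illustrations.
    """
    active = {"bias"}  # always-on feature
    if token[:1].isupper():
        active.add("capitalized")
    if prev_token in {"Mr.", "Ms.", "Dr."}:
        active.add("prev_is_title")
    if token.endswith("%"):
        active.add("ends_with_percent")
    if external_tag is not None:
        # Output of some other recognizer, used as one more knowledge source.
        active.add("external=" + external_tag)
    return active

# Hand-set illustrative weights keyed by (feature, tag). A real system
# would estimate these from annotated training data.
WEIGHTS = {
    ("bias", "other"): 0.5,
    ("capitalized", "person"): 0.8,
    ("capitalized", "organization"): 0.7,
    ("capitalized", "location"): 0.6,
    ("prev_is_title", "person"): 2.0,
    ("ends_with_percent", "percent"): 3.0,
    ("external=person", "person"): 1.5,
}

def tag_distribution(active):
    """p(tag | features) = exp(sum of weights) / normalizer."""
    scores = {tag: sum(WEIGHTS.get((f, tag), 0.0) for f in active)
              for tag in TAGS}
    z = sum(math.exp(s) for s in scores.values())
    return {tag: math.exp(s) / z for tag, s in scores.items()}

def best_tag(token, prev_token="", external_tag=None):
    """Pick the most probable tag for a single token."""
    dist = tag_distribution(features(token, prev_token, external_tag))
    return max(dist, key=dist.get)
```

For example, `best_tag("Smith", "Mr.")` combines the capitalization and preceding-title evidence in favor of `person`, while a lowercase token with no other evidence falls back to `other` through the bias feature. Adding a new knowledge source only requires a new feature and its weights, which is the property that makes this model family convenient for porting.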

Finally, we demonstrate the trans-lingual portability of the system. We ported the system to two Japanese-language named entity tasks, one of which involved a new named entity category, “artifact”. Our results on these tasks were competitive with the best systems built by native Japanese speakers despite the fact that the author speaks no Japanese.

Contributors

Ralph Grishman
New York University
- Publication Years: 1970 - 2018
- Publication counts: 120
- Citation count: 1,515
- Available for Download: 102
- Downloads (cumulative): 31,027
- Downloads (12 months): 2,538
- Downloads (6 weeks): 485
- Average Downloads per Article: 304
- Average Citation per Article: 13

Andrew Eliot Borthwick
New York University
- Publication Years: 1999 - 2012
- Publication counts: 2
- Citation count: 94
- Available for Download: 1
- Downloads (cumulative): 306
- Downloads (12 months): 16
- Downloads (6 weeks): 0
- Average Downloads per Article: 306
- Average Citation per Article: 47

Recommendations

Named entity recognition with a maximum entropy approach
CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4

The named entity recognition (NER) task involves identifying noun phrases that are names, and assigning a class to each name. This task has its origin from the Message Understanding Conferences (MUC) in the 1990s, a series of conferences aimed at ...

Maximum entropy models for named entity recognition
CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4

In this paper, we describe a system that applies maximum entropy (ME) models to the task of named entity recognition (NER). Starting with an annotated corpus and a set of features which are easily obtainable for almost any language, we first build a ...

Two-stage approach to named entity recognition using Wikipedia and DBpedia
IMCOM '17: Proceedings of the 11th International Conference on Ubiquitous Information Management and Communication

In natural language understanding, extraction of named entity (NE) mentions in given text and classification of the mentions into pre-defined NE types are important processes. Most NE recognition (NER) relies on resources such as a training corpus or NE ...
