ABSTRACT
This paper shows that a simple two-stage approach to handling non-local dependencies in Named Entity Recognition (NER) can outperform existing approaches that model such dependencies directly, while being much more computationally efficient. NER systems typically use sequence models for tractable inference, but this leaves them unable to capture the long-distance structure present in text. We use a Conditional Random Field (CRF)-based NER system with local features to make predictions, and then train a second CRF that uses both local information and features extracted from the output of the first CRF. Using features capturing non-local dependencies from the same document, our approach yields a 12.6% relative error reduction in F1 score over state-of-the-art NER systems that use local information alone, compared to the 9.3% relative error reduction offered by the best systems that exploit non-local information. Our approach also makes it easy to incorporate non-local information from other documents in the test corpus, which gives us a 13.3% error reduction over NER systems using local information alone. Additionally, our inference time is just that of two sequential CRFs, which is much less than that of approaches that directly model the dependencies and perform approximate inference.
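The core of the second stage is turning the first-stage CRF's predictions into document-level aggregate features. The following is a minimal, hypothetical sketch of one such feature (the majority label assigned to a token elsewhere in the document); the function name, feature key, and toy data are illustrative and not taken from the paper, and a real system would feed these features into a trained CRF rather than use them directly.

```python
from collections import Counter

def aggregate_features(tokens, first_stage_labels):
    """For each token, compute the majority label assigned to occurrences
    of that token anywhere in the document by the first-stage CRF, and
    emit it as a feature for the second-stage CRF."""
    # Tally first-stage label votes per token type across the document.
    votes = {}
    for tok, lab in zip(tokens, first_stage_labels):
        votes.setdefault(tok, Counter())[lab] += 1
    # Emit one feature dict per token position.
    feats = []
    for tok in tokens:
        majority = votes[tok].most_common(1)[0][0]
        feats.append({"token": tok, "doc_majority_label": majority})
    return feats

# Toy document: "China" is labeled I-ORG once but B-LOC twice elsewhere,
# so its document-majority feature is B-LOC at every occurrence; the
# second-stage CRF can weigh this against the local evidence.
tokens = ["Bank", "of", "China", "said", "China", "and", "China"]
labels = ["B-ORG", "I-ORG", "I-ORG", "O", "B-LOC", "O", "B-LOC"]
feats = aggregate_features(tokens, labels)
```

Because the second stage is itself an ordinary sequential CRF, inference remains two Viterbi passes rather than approximate inference over a densely connected model.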
An effective two-stage model for exploiting non-local dependencies in named entity recognition