A maximum entropy approach to named entity recognition
Publisher:
  • New York University
  • 202 Tisch Hall Washington Square New York, NY
  • United States
ISBN: 978-0-599-47232-7
Order Number: AAI9945252
Pages: 188
Abstract

This thesis describes a novel statistical named-entity (i.e., “proper name”) recognition system known as “MENE” (Maximum Entropy Named Entity). Named entity (N.E.) recognition is a form of information extraction in which we seek to classify every word in a document as being a person-name, organization, location, date, time, monetary value, percentage, or “none of the above”. The task is particularly significant for Internet search engines, machine translation, and the automatic indexing of documents, and it serves as a foundation for work on more complex information extraction tasks.

Two of the most significant problems facing the constructor of a named entity system are the questions of portability and system performance. A practical N.E. system will need to be ported frequently to new bodies of text and even to new languages. The challenge is to build a system which can be ported with minimal expense (in particular minimal programming by a computational linguist) while maintaining a high degree of accuracy in the new domains or languages.

MENE attempts to address these issues through the use of maximum entropy probabilistic modeling. It utilizes a very flexible object-based architecture which allows it to make use of a broad range of knowledge sources in making its tagging decisions. In the DARPA-sponsored MUC-7 named entity evaluation, the system displayed an accuracy rate which was well above the median, demonstrating that it can achieve the performance goal. In addition, we demonstrate that the system can be used as a post-processing tool to enhance the output of a hand-coded named entity recognizer through experiments in which MENE improved on the performance of N.E. systems from three different sites. Furthermore, when all three external recognizers are combined under MENE, we are able to achieve very strong results which, in some cases, appear to be competitive with human performance.
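
To make the idea of maximum entropy tagging concrete, the following is a minimal sketch of a per-token maximum entropy (multinomial logistic) classifier. It is not MENE's implementation: the reduced tag set, the feature names, the toy training sentences, and the simple gradient-ascent training loop are all assumptions made for illustration, and the abstract does not specify how MENE is trained. The optional `external_tags` argument is a hypothetical hook showing how the output of another recognizer could enter the model as one more knowledge source, in the spirit of the post-processing experiments described above.

```python
import math
from collections import defaultdict

# Reduced, illustrative tag set (the thesis uses more categories).
TAGS = ["person", "organization", "location", "none"]


def features(tokens, i, external_tags=None):
    """Return the set of binary features that fire for token i.

    All feature names are hypothetical. `external_tags`, if given, is the
    per-token output of some other recognizer, used as an extra feature.
    """
    word = tokens[i]
    feats = {
        "bias",
        "word=" + word.lower(),
        "capitalized" if word[0].isupper() else "lowercase",
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"),
    }
    if external_tags is not None:
        feats.add("external=" + external_tags[i])
    return feats


def predict_proba(weights, feats):
    """p(tag | features) as a softmax over summed feature weights."""
    scores = [sum(weights[(f, t)] for f in feats) for t in TAGS]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(examples, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict_proba(weights, feats)
            for t, p in zip(TAGS, probs):
                # Gradient: observed feature count minus expected count.
                grad = (1.0 if t == gold else 0.0) - p
                for f in feats:
                    weights[(f, t)] += lr * grad
    return weights


def tag(weights, tokens, external_tags=None):
    """Assign each token its most probable tag, independently per token."""
    result = []
    for i in range(len(tokens)):
        probs = predict_proba(weights, features(tokens, i, external_tags))
        result.append(TAGS[probs.index(max(probs))])
    return result


# Toy training data, invented for this sketch.
train_sents = [
    (["Mr.", "Smith", "visited", "Paris"], ["none", "person", "none", "location"]),
    (["IBM", "hired", "Ms.", "Jones"], ["organization", "none", "none", "person"]),
    (["She", "flew", "to", "London"], ["none", "none", "none", "location"]),
]
examples = [
    (features(toks, i), gold[i])
    for toks, gold in train_sents
    for i in range(len(toks))
]
weights = train(examples)
print(tag(weights, ["Mr.", "Smith", "visited", "Paris"]))
```

Because every knowledge source is reduced to a binary feature, adding a new one (a dictionary lookup, a capitalization pattern, or another system's tag) only requires extending `features`; the training and tagging code stays unchanged. A real system would also need to enforce consistent tag sequences across tokens, which this sketch omits.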

Finally, we demonstrate the trans-lingual portability of the system. We ported the system to two Japanese-language named entity tasks, one of which involved a new named entity category, “artifact”. Our results on these tasks were competitive with the best systems built by native Japanese speakers despite the fact that the author speaks no Japanese.

Contributors
  • New York University