ABSTRACT
Tweets are a critical source of fresh information, and named entities occur in them frequently and with rich variation. We study the problem of named entity normalization (NEN) for tweets. The two main challenges are errors propagated from named entity recognition (NER) and the dearth of information in a single tweet. To address these challenges, we propose a novel graphical model that conducts NER and NEN simultaneously on multiple tweets. In particular, our model introduces a binary random variable for each pair of words with the same lemma across similar tweets; its value indicates whether the two words are mentions of the same entity. We evaluate our method on a manually annotated data set and show that it outperforms a baseline that handles the two tasks separately, raising F1 from 80.2% to 83.6% for NER and accuracy from 79.4% to 82.6% for NEN.
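To make the pairing idea concrete, the following is a minimal sketch of how candidate word pairs (one binary variable per pair) might be enumerated. It is not the model from the paper: the `lemma` function is a hypothetical stand-in for a real lemmatizer, tweet similarity is assumed to be Jaccard overlap with an arbitrary threshold, and stop words are not filtered. The abstract does not specify any of these choices.

```python
from itertools import combinations


def lemma(word):
    # Hypothetical stand-in for a real lemmatizer (assumption):
    # lowercase the token and strip common punctuation and Twitter sigils.
    return word.lower().strip(".,!?#@")


def jaccard(a, b):
    # Jaccard overlap between two lemma lists (assumed similarity measure).
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def candidate_pairs(tweets, threshold=0.2):
    """Return ((tweet_i, pos_i), (tweet_j, pos_j)) for every pair of words
    that share a lemma across two tweets judged similar.

    Each returned pair would correspond to one binary variable whose value
    says whether the two words mention the same entity. The threshold value
    is an illustrative assumption.
    """
    lemmas = [[lemma(w) for w in t.split()] for t in tweets]
    pairs = []
    for i, j in combinations(range(len(tweets)), 2):
        if jaccard(lemmas[i], lemmas[j]) < threshold:
            continue  # tweets too dissimilar: no variables introduced
        for p, lp in enumerate(lemmas[i]):
            for q, lq in enumerate(lemmas[j]):
                if lp and lp == lq:
                    pairs.append(((i, p), (j, q)))
    return pairs
```

In a joint model these pairs would link the label variables of the two tweets, so that evidence from one tweet can inform recognition and normalization in the other; the sketch covers only the enumeration step, not inference.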