DOI: 10.5555/2390524.2390598

Joint inference of named entity recognition and normalization for tweets

Published: 08 July 2012

ABSTRACT

Tweets are a critical source of fresh information, and named entities occur in them frequently and with rich variation. We study the problem of named entity normalization (NEN) for tweets. Two main challenges are errors propagated from named entity recognition (NER) and the dearth of information in a single tweet. To address these challenges, we propose a novel graphical model that conducts NER and NEN simultaneously on multiple tweets. In particular, our model introduces a binary random variable for each pair of words with the same lemma across similar tweets; its value indicates whether the two words are mentions of the same entity. We evaluate our method on a manually annotated data set and show that it outperforms a baseline that handles the two tasks separately, boosting F1 from 80.2% to 83.6% for NER and accuracy from 79.4% to 82.6% for NEN.
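
As a rough illustration of the pairing idea in the abstract, the sketch below enumerates word pairs that share a lemma across similar tweets. Each returned pair is where the model would attach one binary variable. It is not the authors' implementation: the lemmatizer stand-in (`lemma`), the Jaccard similarity measure, the `threshold` value, and the function name `candidate_pairs` are all assumptions made for this example. The abstract does not specify how lemmas or tweet similarity are computed.

```python
from itertools import combinations


def lemma(word):
    # Stand-in for a real lemmatizer (assumption): lowercase and strip punctuation.
    return word.lower().strip(".,!?:;\"'")


def jaccard(a, b):
    # Token-set overlap, used here as a hypothetical tweet-similarity measure.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def candidate_pairs(tweets, threshold=0.3):
    """Return one key per cross-tweet word pair that shares a lemma.

    Each key ((tweet_i, pos_i), (tweet_j, pos_j)) stands for one binary
    variable: 1 if both words mention the same entity, 0 otherwise.
    """
    lemmas = [[lemma(w) for w in t.split()] for t in tweets]
    pairs = []
    for i, j in combinations(range(len(tweets)), 2):
        if jaccard(lemmas[i], lemmas[j]) < threshold:
            continue  # only similar tweets are linked
        for p, li in enumerate(lemmas[i]):
            for q, lj in enumerate(lemmas[j]):
                if li and li == lj:
                    pairs.append(((i, p), (j, q)))
    return pairs


tweets = [
    "Gaga performs tonight",
    "Lady Gaga performs live",
    "rain again in Seattle",
]
pairs = candidate_pairs(tweets)
```

With these three toy tweets, only the first two are similar enough to be linked, so the pairs cover "Gaga" and "performs" across them. In a joint model, the values of these variables would be inferred together with the NER labels rather than fixed in advance.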

  • Published in

    ACL '12: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
    July 2012
    1100 pages

    Publisher

    Association for Computational Linguistics

    United States

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate: 85 of 443 submissions, 19%
