DOI: 10.5555/1873781.1873931
research-article

Resolving surface forms to Wikipedia topics

Published: 23 August 2010

ABSTRACT

Ambiguity of entity mentions and concept references is a challenge to mining text beyond surface-level keywords. We describe an effective method of disambiguating surface forms and resolving them to Wikipedia entities and concepts. Our method employs an extensive set of features mined from Wikipedia and other large data sources, and combines the features using a machine learning approach with automatically generated training data. Based on a manually labeled evaluation set containing over 1000 news articles, our resolution model has 85% precision and 87.8% recall. The performance is significantly better than three baselines based on traditional context similarities or sense commonness measurements. Our method can be applied to other languages and scales well to new entities and concepts.
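The abstract mentions baselines based on sense commonness and reports precision and recall. As a rough illustration only, the sketch below shows one way a commonness baseline and those two metrics could be computed. It is not the authors' resolution model: the anchor pairs are invented, the function names `build_commonness`, `resolve`, and `precision_recall` are hypothetical, and the metric definitions (precision over predictions made, recall over all gold mentions) are an assumption that may differ from the paper's evaluation protocol.

```python
from collections import Counter, defaultdict

# Hypothetical (surface form, linked Wikipedia title) pairs, of the kind that
# could be harvested from anchor text. The data is made up for illustration.
ANCHORS = [
    ("jaguar", "Jaguar Cars"),
    ("jaguar", "Jaguar Cars"),
    ("jaguar", "Jaguar"),
    ("python", "Python (programming language)"),
]


def build_commonness(anchors):
    """Count how often each surface form links to each topic."""
    counts = defaultdict(Counter)
    for surface, topic in anchors:
        counts[surface.lower()][topic] += 1
    return counts


def resolve(surface, counts):
    """Return the most frequently linked topic, or None if the form is unseen."""
    senses = counts.get(surface.lower())
    if not senses:
        return None
    return senses.most_common(1)[0][0]


def precision_recall(predictions, gold):
    """Both arguments map mention id -> topic; a None prediction is an abstention.

    Assumed definitions: precision is taken over predictions actually made,
    recall over all gold mentions.
    """
    made = {m: t for m, t in predictions.items() if t is not None}
    correct = sum(1 for m, t in made.items() if gold.get(m) == t)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

A commonness baseline of this kind ignores the mention's context entirely, which is why the abstract treats it as a point of comparison rather than as the method itself.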

    • Published in

      COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics
      August 2010
      1408 pages

      Publisher

      Association for Computational Linguistics

      United States

      Acceptance Rates

      Overall Acceptance Rate: 1,537 of 1,537 submissions, 100%
