Resolving surface forms to Wikipedia topics

Authors:
Yiping Zhou, Lan Nie, Omid Rouhani-Kalleh, Flavian Vasile, Scott Gaffney
Yahoo! Labs at Sunnyvale

COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics, August 2010, Pages 1335–1343

Published: 23 August 2010

ABSTRACT

Ambiguity of entity mentions and concept references is a challenge to mining text beyond surface-level keywords. We describe an effective method of disambiguating surface forms and resolving them to Wikipedia entities and concepts. Our method employs an extensive set of features mined from Wikipedia and other large data sources, and combines the features using a machine learning approach with automatically generated training data. On a manually labeled evaluation set of over 1,000 news articles, our resolution model achieves 85% precision and 87.8% recall. This performance is significantly better than that of three baselines based on traditional context similarity or sense commonness measures. Our method can be applied to other languages and scales well to new entities and concepts.
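
To make the notion of a sense-commonness baseline concrete, the following is a minimal sketch and not the authors' implementation. It assumes that commonness is estimated as the fraction of times a surface form links to a given topic, which is one common reading of the term; the abstract does not define it. The link pairs, the function names (`build_commonness`, `resolve`, `precision_recall`), and the exact precision and recall definitions are all hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical (surface form, linked topic) pairs standing in for
# anchor-text/link-target pairs harvested from Wikipedia. Illustrative only.
ANCHOR_LINKS = [
    ("jaguar", "Jaguar_Cars"),
    ("jaguar", "Jaguar_Cars"),
    ("jaguar", "Jaguar"),
    ("python", "Python_(programming_language)"),
]

def build_commonness(anchor_links):
    """Estimate P(topic | surface form) from link counts (assumed definition)."""
    counts = defaultdict(Counter)
    for surface, topic in anchor_links:
        counts[surface.lower()][topic] += 1
    table = {}
    for surface, topic_counts in counts.items():
        total = sum(topic_counts.values())
        table[surface] = {t: c / total for t, c in topic_counts.items()}
    return table

def resolve(surface, commonness):
    """Return the most common topic for a surface form, or None if unseen."""
    candidates = commonness.get(surface.lower())
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def precision_recall(predictions, gold):
    """Score predictions against gold labels.

    Both arguments map a mention id to a topic; a prediction of None means
    the resolver abstained. Precision is computed over predictions made,
    recall over all gold mentions (an assumed scoring convention).
    """
    made = {m: t for m, t in predictions.items() if t is not None}
    correct = sum(1 for m, t in made.items() if gold.get(m) == t)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

A baseline of this kind ignores the surrounding context entirely, which is why it serves as a natural point of comparison for a model that combines many features.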

Published in
COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics
August 2010
1408 pages
General Chair: Aravind K. Joshi (University of Pennsylvania)
Program Chairs: Chu-Ren Huang (The Hong Kong Polytechnic University), Dan Jurafsky (Stanford University)
Publisher: Association for Computational Linguistics, United States
Qualifiers: research-article
Overall Acceptance Rate: 1,537 of 1,537 submissions, 100%
