Unsupervised personal name disambiguation

Authors:
Gideon S. Mann, Johns Hopkins University, Baltimore, MD
David Yarowsky, Johns Hopkins University, Baltimore, MD

CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4, May 2003, Pages 33–40, https://doi.org/10.3115/1119176.1119181

Published: 31 May 2003

ABSTRACT

This paper presents a set of algorithms for distinguishing personal names with multiple real referents in text, based on little or no supervision. The approach uses an unsupervised clustering technique over a rich feature space of biographic facts, which are automatically extracted via a language-independent bootstrapping process. The induced clusters of named entities are then partitioned and linked to their real referents via the automatically extracted biographic data. Performance is evaluated both on a test set of hand-labeled multi-referent personal names and on automatically generated pseudonames.
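The pseudoname evaluation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes plain bag-of-words context vectors in place of the bootstrapped biographic facts, average-link agglomerative clustering as the clustering technique, and cluster purity as the score. The names `Alice Alpha`, `Bob Beta`, and `PersonX`, the toy sentences, and every function name are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for extracted biographic features."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def make_pseudoname(docs_by_name, pseudoname):
    """Replace each real name with one shared pseudoname; keep the real name as gold label."""
    mixed = []
    for name, docs in docs_by_name.items():
        for doc in docs:
            mixed.append((doc.replace(name, pseudoname), name))
    return mixed


def agglomerative(vectors, k):
    """Average-link bottom-up clustering until k clusters remain; returns index lists."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [
                    cosine(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def purity(clusters, gold):
    """Fraction of items that carry the majority gold label of their cluster."""
    correct = sum(Counter(gold[i] for i in c).most_common(1)[0][1] for c in clusters)
    return correct / len(gold)


# Hypothetical toy corpus: two invented people merged under one pseudoname.
docs_by_name = {
    "Alice Alpha": [
        "Alice Alpha was born in 1950 and plays the violin in an orchestra",
        "violin soloist Alice Alpha joined the orchestra in 1975",
    ],
    "Bob Beta": [
        "Bob Beta was elected senator and served on the budget committee",
        "senator Bob Beta chaired the budget committee",
    ],
}
mixed = make_pseudoname(docs_by_name, "PersonX")
vectors = [bag_of_words(text) for text, _ in mixed]
gold = [name for _, name in mixed]
clusters = agglomerative(vectors, k=2)
print(clusters, purity(clusters, gold))
```

Because the two real names are known before they are merged, the gold labels come for free, which is what makes pseudonames usable as automatically generated test data. The number of clusters `k` is fixed by hand here; that is a simplification made for this sketch.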

Published in
CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
May 2003, 213 pages
Conference Chairs: Walter Daelemans (University of Antwerp and Tilburg University), Miles Osborne (University of Edinburgh)
Publisher: Association for Computational Linguistics, United States