DOI: 10.3115/1119176.1119181

Unsupervised personal name disambiguation

Published: 31 May 2003

ABSTRACT

This paper presents a set of algorithms for distinguishing personal names with multiple real referents in text, based on little or no supervision. The approach utilizes an unsupervised clustering technique over a rich feature space of biographic facts, which are automatically extracted via a language-independent bootstrapping process. The induced clustering of named entities is then partitioned and linked to their real referents via the automatically extracted biographic data. Performance is evaluated both on a test set of hand-labeled multi-referent personal names and on automatically generated pseudonames.
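The clustering step described in the abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm or similarity measure the paper uses, so what follows is an assumption for illustration only: greedy bottom-up (agglomerative) clustering with cosine similarity over bags of features. The function names `cosine` and `cluster_mentions`, the `threshold` parameter, and the feature strings such as `occupation=musician` are all hypothetical, and the toy data is invented.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_mentions(mentions, threshold=0.3):
    """Greedy agglomerative clustering of mentions of one ambiguous name.

    `mentions` is a list of feature lists, one per mention. Features stand in
    for extracted biographic facts and context words (illustrative
    assumption). The most similar pair of clusters is merged repeatedly until
    no pair is more similar than `threshold`. Returns sorted lists of mention
    indices.
    """
    clusters = [([i], Counter(feats)) for i, feats in enumerate(mentions)]
    while len(clusters) > 1:
        best_score, best_pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            score = cosine(clusters[i][1], clusters[j][1])
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_pair is None:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(members) for members, _ in clusters)


# Invented toy data: four mentions of the same surface name.
mentions = [
    ["occupation=musician", "birth_year=1942", "guitar"],
    ["occupation=musician", "album", "guitar"],
    ["occupation=senator", "birth_year=1950", "election"],
    ["occupation=senator", "election", "vote"],
]
print(cluster_mentions(mentions))
```

On this toy input the two musician mentions and the two senator mentions share features only within their own group, so the sketch separates them into two clusters. A fixed similarity threshold is one simple stopping rule; it is not taken from the paper.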


  • Published in

    CONLL '03: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
    May 2003
    213 pages

    Publisher

    Association for Computational Linguistics, United States
