ABSTRACT
This paper presents a set of algorithms for distinguishing personal names with multiple real referents in text, based on little or no supervision. The approach utilizes an unsupervised clustering technique over a rich feature space of biographic facts, which are automatically extracted via a language-independent bootstrapping process. The induced clusters of named entities are then partitioned and linked to their real referents via the automatically extracted biographic data. Performance is evaluated both on a test set of hand-labeled multi-referent personal names and on automatically generated pseudonames.
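To make the clustering idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each mention of an ambiguous name has already been reduced to a set of feature strings (extracted biographic facts plus context words), and it groups mentions by greedy bottom-up merging on Jaccard overlap. The feature names, the similarity measure, the `threshold` value, and the toy data are all illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_mentions(mentions, threshold=0.2):
    """Greedy bottom-up clustering of mentions of one ambiguous name.

    `mentions` is a list of feature sets. The most similar pair of clusters
    is merged repeatedly until no pair reaches `threshold` (an assumed
    cutoff, not a value taken from the paper).
    """
    clusters = [{"ids": [i], "features": set(f)} for i, f in enumerate(mentions)]
    while len(clusters) > 1:
        # Find the pair of clusters with the highest feature overlap.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: jaccard(clusters[p[0]]["features"],
                                  clusters[p[1]]["features"]),
        )
        if jaccard(clusters[i]["features"], clusters[j]["features"]) < threshold:
            break
        merged = {
            "ids": clusters[i]["ids"] + clusters[j]["ids"],
            "features": clusters[i]["features"] | clusters[j]["features"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c["ids"]) for c in clusters)


# Toy data (hypothetical): four mentions of one name with two referents.
mentions = [
    {"occupation=driver", "birth_year=1936", "word:race"},
    {"occupation=driver", "word:race", "word:team"},
    {"occupation=founder", "birth_year=1944", "word:software"},
    {"occupation=founder", "word:software", "word:startup"},
]

print(cluster_mentions(mentions))  # [[0, 1], [2, 3]]
```

Mentions 0 and 1 share an occupation and a context word, as do mentions 2 and 3, while the two groups share nothing, so the merging stops at two clusters, one per assumed referent.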