Weakly-supervised discovery of named entities using web search queries

Author:
Marius Paşca

Google Inc., Mountain View, CA


CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
November 2007
Pages 683–690
https://doi.org/10.1145/1321440.1321536

Published: 06 November 2007


ABSTRACT

A seed-based framework for textual information extraction allows for weakly supervised extraction of named entities from anonymized Web search queries. The extraction is guided by a small set of seed named entities, without any need for handcrafted extraction patterns or domain-specific knowledge, allowing for the acquisition of named entities pertaining to various classes of interest to Web search users. Inherently noisy search queries are shown to be a highly valuable, albeit little explored, resource for Web-based named entity discovery.
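
As a rough illustration of what seed-guided extraction from queries could look like, the sketch below collects the query contexts in which seed entities occur and then proposes other phrases that occur in the same contexts. This is an assumption-laden toy, not the method of the paper: the function names `seed_templates` and `discover_entities`, the prefix/postfix notion of a template, and the ranking by number of shared templates are all hypothetical choices made for this example.

```python
from collections import defaultdict


def seed_templates(queries, seeds):
    """Collect (prefix, postfix) token contexts in which any seed occurs.

    Hypothetical helper: a "template" here is simply the query text
    before and after a seed, which is an assumption of this sketch.
    """
    templates = set()
    for query in queries:
        tokens = tuple(query.lower().split())
        for seed in seeds:
            s = tuple(seed.lower().split())
            n = len(s)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == s:
                    prefix, postfix = tokens[:i], tokens[i + n:]
                    # Skip queries that consist of the seed alone.
                    if prefix or postfix:
                        templates.add((prefix, postfix))
    return templates


def discover_entities(queries, seeds):
    """Rank non-seed phrases by how many seed templates they fill."""
    seed_set = {" ".join(s.lower().split()) for s in seeds}
    templates = seed_templates(queries, seeds)
    support = defaultdict(set)
    for query in queries:
        tokens = tuple(query.lower().split())
        for prefix, postfix in templates:
            # The query must leave at least one token for the candidate.
            if len(tokens) <= len(prefix) + len(postfix):
                continue
            end = len(tokens) - len(postfix)
            if tokens[:len(prefix)] == prefix and tokens[end:] == postfix:
                candidate = " ".join(tokens[len(prefix):end])
                if candidate not in seed_set:
                    support[candidate].add((prefix, postfix))
    # More distinct shared templates first; ties broken alphabetically.
    return sorted(support, key=lambda c: (-len(support[c]), c))


# Made-up queries for illustration only.
example_queries = [
    "hotels in paris",
    "hotels in london",
    "paris weather",
    "hotels in berlin",
    "berlin weather",
    "hotels in rome",
    "cheap flights",
]
print(discover_entities(example_queries, ["paris", "london"]))
# → ['berlin', 'rome']
```

In this toy run, "berlin" ranks above "rome" because it fills two of the contexts induced from the seeds, while "rome" fills only one. No handcrafted patterns are supplied; the contexts come entirely from the seeds and the queries.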


Index Terms

Weakly-supervised discovery of named entities using web search queries
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
2. Information systems
  1. Information retrieval
    1. Document representation



Published in
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
November 2007, 1048 pages
ISBN: 9781595938039
DOI: 10.1145/1321440
Co-chair: Alberto H. F. Laender
Conference Chairs: André O. Falcão (Universidade de Lisboa, Portugal), Øystein Haug Olsen
General Chair: Mário J. Silva (Universidade de Lisboa, Portugal)
Program Chairs: Ricardo Baeza-Yates, Deborah L. McGuinness, Bjorn Olstad
Copyright © 2007 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
knowledge acquisition
named entities
query logs
unstructured text
weakly supervised information extraction
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%