research-article

Bootstrapped named entity recognition for product attribute extraction

Authors:
Duangmanee (Pew) Putthividhya, eBay Inc., San Jose, CA
Junling Hu, eBay Inc., San Jose, CA

EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing, July 2011, Pages 1557–1567

Published: 27 July 2011

ABSTRACT

We present a named entity recognition (NER) system for extracting product attributes and values from listing titles. Information extraction from short listing titles presents a unique challenge because such titles lack informative context and grammatical structure. In this work, we combine supervised NER with bootstrapping to expand the seed list, and we output normalized results. Focusing on listings from eBay's clothing and shoes categories, our bootstrapped NER system is able to identify new brands that are spelling variants or typographical errors of known brands, as well as entirely novel brands. Among the top 300 new brands predicted, our system achieves 90.33% precision. To output normalized attribute values, we explore several string comparison algorithms and find n-gram substring matching to work well in practice.
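The abstract reports that n-gram substring matching worked well for normalizing extracted attribute values, but it does not specify the exact formulation. Purely as an illustration, the Python sketch below assumes one common variant: a predicted brand string is compared with each canonical brand in a seed list using the Dice coefficient over character trigram sets, and the best match is accepted only if it clears a threshold. The function names, the trigram size, the 0.5 threshold, and the sample brand list are hypothetical choices, not the authors' settings.

```python
from typing import List, Optional, Set

# Hypothetical seed list of canonical brand names (illustrative only).
SEED_BRANDS: List[str] = ["Ralph Lauren", "Tommy Hilfiger", "Steve Madden"]


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a lowercased, whitespace-free string.

    Strings shorter than n yield the whole string as a single gram so that
    very short values can still be compared.
    """
    s = "".join(text.lower().split())
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient between the character n-gram sets of a and b."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def normalize(candidate: str, canonical: List[str],
              threshold: float = 0.5) -> Optional[str]:
    """Map a candidate to its most similar canonical value.

    Returns None when no canonical value reaches the (assumed) threshold,
    i.e. the candidate may be a novel value rather than a variant.
    """
    best: Optional[str] = None
    best_score = 0.0
    for value in canonical:
        score = dice_similarity(candidate, value)
        if score > best_score:
            best, best_score = value, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    for raw in ["Ralph Lauran", "Tomy Hilfiger", "Nike"]:
        print(raw, "->", normalize(raw, SEED_BRANDS))
```

Under these assumptions, misspellings such as "Ralph Lauran" and "Tomy Hilfiger" share most of their trigrams with a seed brand and are mapped to it, while a string with no trigram overlap, such as "Nike", is left unnormalized. Comparing n-gram sets rather than whole strings is what makes the match tolerant of single-character typos.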


Index Terms
1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Recommendations

Improved Named Entity Translation and Bilingual Named Entity Extraction
ICMI '02: Proceedings of the 4th IEEE International Conference on Multimodal Interfaces

Translation of named entities (NE), including proper names, temporal and numerical expressions, is very important in multilingual natural language processing, like crosslingual information retrieval and statistical machine translation. In this paper we ...
Named entity recognition in Wikipedia
People's Web '09: Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources

Named entity recognition (NER) is used in many domains beyond the newswire text that comprises current gold-standard corpora. Recent work has used Wikipedia's link structure to automatically generate near gold-standard annotations. Until now, these ...
A joint named entity recognition and entity linking system
HYBRID '12: Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data

We present a joint system for named entity recognition (NER) and entity linking (EL), allowing for named entities mentions extracted from textual data to be matched to uniquely identifiable entities. Our approach relies on combined NER modules which ...

Published in
EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing
July 2011
1647 pages
ISBN: 9781937284114
General Chair: Paola Merlo, University of Geneva
Program Chairs: Regina Barzilay, Massachusetts Institute of Technology; Mark Johnson, Macquarie University
Publisher: Association for Computational Linguistics, United States
Qualifiers
- research-article

Acceptance Rates
Overall Acceptance Rate: 73 of 234 submissions, 31%
Article Metrics
- Total Citations: 15
- Total Downloads: 1,064
- Downloads (Last 12 months): 49
- Downloads (Last 6 weeks): 8