ABSTRACT
We present a named entity recognition (NER) system for extracting product attributes and values from listing titles. Information extraction from short listing titles presents a unique challenge because such titles lack informative context and grammatical structure. In this work, we combine supervised NER with bootstrapping to expand the seed list, and we output normalized results. Focusing on listings from eBay's clothing and shoes categories, our bootstrapped NER system identifies new brands that correspond to spelling variants and typographical errors of known brands, as well as novel brands. Among the top 300 new brands predicted, our system achieves 90.33% precision. To output normalized attribute values, we explore several string comparison algorithms and find that n-gram substring matching works well in practice.
Bootstrapped named entity recognition for product attribute extraction