Inducing domain-specific semantic class taggers from (almost) nothing

Authors:
Ruihong Huang, University of Utah, Salt Lake City, UT
Ellen Riloff, University of Utah, Salt Lake City, UT

ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, July 2010, Pages 275–285

Published: 11 July 2010

ABSTRACT

This research explores the idea of inducing domain-specific semantic class taggers using only a domain-specific text collection and seed words. The learning process begins by inducing a classifier that only has access to contextual features, forcing it to generalize beyond the seeds. The contextual classifier then labels new instances, to expand and diversify the training set. Next, a cross-category bootstrapping process simultaneously trains a suite of classifiers for multiple semantic classes. The positive instances for one class are used as negative instances for the others in an iterative bootstrapping cycle. We also explore a one-semantic-class-per-discourse heuristic, and use the classifiers to dynamically create semantic features. We evaluate our approach by inducing six semantic taggers from a collection of veterinary medicine message board posts.
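
The two bookkeeping ideas in the abstract can be sketched in a few lines of Python: reusing one class's positive instances as negatives for the other classes, and propagating a label across mentions of the same phrase in one document. This is a minimal illustration, not the authors' implementation. The function names, the data shapes, the majority-vote rule, and the choice to keep doubly-claimed instances out of the negatives are all assumptions made for the example. The classifiers themselves are omitted.

```python
from collections import Counter, defaultdict


def cross_category_training_sets(labeled):
    """Build (positives, negatives) per class from per-class positive sets.

    `labeled` maps a class name to the set of instance ids currently
    labeled positive for it (hypothetical data shape).
    """
    training = {}
    for cls, positives in labeled.items():
        negatives = set()
        for other, other_positives in labeled.items():
            if other != cls:
                # Positives of every other class serve as negatives here.
                negatives |= other_positives
        # Assumption: an instance claimed by this class too is not a negative.
        training[cls] = (set(positives), negatives - positives)
    return training


def one_class_per_discourse(mentions):
    """Give all mentions of a phrase in one document the same label.

    `mentions` is a list of (doc_id, phrase, label_or_None) tuples.
    Assumption: the shared label is the majority label among the
    labeled mentions of that phrase in that document.
    """
    votes = defaultdict(Counter)
    for doc_id, phrase, label in mentions:
        if label is not None:
            votes[(doc_id, phrase)][label] += 1

    result = []
    for doc_id, phrase, label in mentions:
        counts = votes.get((doc_id, phrase))
        if counts:
            label = counts.most_common(1)[0][0]
        result.append((doc_id, phrase, label))
    return result
```

In an iterative cycle, each round's newly labeled instances would be merged into `labeled` before the training sets are rebuilt, so every class's growth also supplies fresh negatives to the others.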

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Language translation
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees

Published in
ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
July 2010
1618 pages
Program Chair:
Jan Hajič
Charles University in Prague, Czech Republic
Publisher
Association for Computational Linguistics
United States
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 85 of 443 submissions, 19%