ABSTRACT
We present an automatic approach to learning criteria for classifying the parts of speech used in lexical mappings, which will further automate our knowledge acquisition system for non-technical users. The criteria are based on the types of the denoted terms, together with morphological and corpus-based clues. Associations between these features and the parts of speech are learned using the lexical mappings in the Cyc knowledge base as training data. With over 30 parts of speech to choose from, the classifier is correct in 77.8% of cases. In the special case of the mass-count distinction for nouns, accuracy rises to 93.0%. Comparable results are obtained using OpenCyc (73.1% for the general case and 88.4% for mass-count).
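As a rough illustration of the idea of learning associations between features and parts of speech, the following minimal sketch combines a semantic clue (the type of the denoted term) with a morphological clue (the word's suffix) and labels new words by feature voting. It is not the paper's classifier: the training examples, type names, part-of-speech labels, and the `features`, `train`, and `classify` helpers are all hypothetical, and simple co-occurrence counting stands in for whatever learner the authors actually used.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (word, type of the denoted term, part of speech).
# The type names and labels are illustrative only; they are not drawn from the Cyc KB.
TRAINING = [
    ("water", "LiquidStuff", "MassNoun"),
    ("sand", "GranularStuff", "MassNoun"),
    ("milk", "LiquidStuff", "MassNoun"),
    ("dog", "Animal", "CountNoun"),
    ("table", "Artifact", "CountNoun"),
    ("cat", "Animal", "CountNoun"),
    ("running", "Event", "Gerund"),
    ("swimming", "Event", "Gerund"),
]


def features(word, term_type):
    """Pair a semantic clue (denoted term type) with a morphological clue (suffix)."""
    return [("type", term_type), ("suffix", word[-3:])]


def train(examples):
    """Count how often each feature co-occurs with each part of speech."""
    counts = defaultdict(Counter)
    for word, term_type, pos in examples:
        for feat in features(word, term_type):
            counts[feat][pos] += 1
    return counts


def classify(counts, word, term_type, default="CountNoun"):
    """Sum the votes of all known features; fall back to a default label."""
    votes = Counter()
    for feat in features(word, term_type):
        votes.update(counts.get(feat, Counter()))
    return votes.most_common(1)[0][0] if votes else default


model = train(TRAINING)
print(classify(model, "juice", "LiquidStuff"))
print(classify(model, "jogging", "Event"))
```

In this toy setup, "juice" is labelled from its denoted type alone, while "jogging" receives votes from both its type and its "ing" suffix. A corpus-based clue could be added as a third feature in the same way.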
Inferring parts of speech for lexical mappings via the Cyc KB