ABSTRACT
Discriminative models have attracted considerable interest in the NLP community in recent years, and previous research has shown them to be advantageous over generative models. In this paper, we investigate how different objective functions and optimization methods affect classifier performance in the discriminative learning framework. We focus on the sequence labelling problem, in particular the POS tagging and NER tasks. Our experiments show that changing the objective function is not as effective as changing the features included in the model.
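For concreteness, the objectives compared in this line of work can be stated over a common log-linear parameterization. The notation below is a generic illustration rather than the paper's exact formulation: Phi denotes a feature map over an input sequence x and a label sequence y, and theta the weight vector. The log-loss is the negative conditional log-likelihood, while the exponential loss (as studied in boosting approaches to label sequences) penalizes competing label sequences by their margin against the correct one:

\[
p_\theta(\mathbf{y}\mid\mathbf{x}) \;=\; \frac{\exp\!\big(\theta^\top \Phi(\mathbf{x},\mathbf{y})\big)}{\sum_{\mathbf{y}'} \exp\!\big(\theta^\top \Phi(\mathbf{x},\mathbf{y}')\big)}
\]
\[
\mathcal{L}_{\mathrm{log}}(\theta) \;=\; -\sum_i \log p_\theta\big(\mathbf{y}^{(i)}\mid\mathbf{x}^{(i)}\big),
\qquad
\mathcal{L}_{\mathrm{exp}}(\theta) \;=\; \sum_i \sum_{\mathbf{y}\neq\mathbf{y}^{(i)}} \exp\!\Big(\theta^\top\Phi\big(\mathbf{x}^{(i)},\mathbf{y}\big) - \theta^\top\Phi\big(\mathbf{x}^{(i)},\mathbf{y}^{(i)}\big)\Big)
\]

Both losses are convex in theta and can be optimized with the same gradient-based methods; they differ in how heavily they weight incorrectly ranked label sequences, which is the kind of variation whose practical effect the paper measures.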