ABSTRACT
Discriminative models have been preferred over generative models in many machine learning problems in the recent past owing to some of their attractive theoretical properties. In this paper, we explore the applicability of discriminative classifiers for IR. We compare the performance of two popular discriminative models, namely the maximum entropy model and support vector machines, with that of language modeling, the state-of-the-art generative model for IR. Our experiments on ad-hoc retrieval indicate that, although maximum entropy is significantly worse than language models, support vector machines are on par with language models. We argue that the main reason to prefer SVMs over language models is their ability to learn arbitrary features automatically, as demonstrated by our experiments on the home-page finding task of TREC-10.
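To make the contrast concrete, the sketch below frames retrieval as binary classification with a linear SVM over query-document features and, for comparison, scores documents with a Dirichlet-smoothed query-likelihood language model. This is a minimal illustration under assumed choices, not the paper's actual system: the feature set (term-frequency, idf, URL depth), the toy training data, and the use of scikit-learn's LinearSVC are all hypothetical stand-ins.

```python
# Illustrative sketch (not the paper's implementation): ranking as binary
# classification with an SVM over hand-crafted query-document features,
# contrasted with a generative query-likelihood language-model score.
# Feature choices and toy data below are hypothetical.
import math
from sklearn.svm import LinearSVC

# Toy query-document pairs: each row is a feature vector such as
# [sum of log term frequencies, sum of idf of matched terms, URL depth].
X_train = [
    [2.3, 4.1, 1.0],   # relevant home page
    [1.9, 3.8, 1.0],   # relevant home page
    [0.4, 1.2, 4.0],   # non-relevant deep page
    [0.7, 0.9, 5.0],   # non-relevant deep page
]
y_train = [1, 1, 0, 0]  # 1 = relevant, 0 = non-relevant

# Discriminative ranker: the SVM learns a weight for each feature directly,
# so arbitrary evidence (URL depth, anchor text, link counts) can be added
# without redesigning the model.
svm = LinearSVC(C=1.0)
svm.fit(X_train, y_train)

def svm_score(features):
    """Rank documents by signed distance from the separating hyperplane."""
    return float(svm.decision_function([features])[0])

def lm_score(query_terms, doc_terms, collection_terms, mu=2000.0):
    """Generative baseline: query likelihood with Dirichlet smoothing,
    log P(q|d) = sum_w log((tf(w,d) + mu * P(w|C)) / (|d| + mu))."""
    doc_len = len(doc_terms)
    coll_len = len(collection_terms)
    score = 0.0
    for w in query_terms:
        tf = doc_terms.count(w)
        p_coll = (collection_terms.count(w) + 1) / (coll_len + 1)  # crude background model
        score += math.log((tf + mu * p_coll) / (doc_len + mu))
    return score

print(svm_score([2.0, 3.5, 1.0]))  # higher score = ranked as more likely relevant
print(lm_score(["retrieval"], ["retrieval", "models", "retrieval"],
               ["retrieval", "svm", "models", "language"]))
```

Documents would be sorted by either score to produce a ranking; the point of the contrast is that adding a new source of evidence to the SVM only means appending a feature column, whereas the language model requires that evidence be expressed within its generative formulation (e.g., as a prior).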