ABSTRACT
Research in data mining and machine learning has produced numerous practical applications. Spam filtering, fraud detection, and user query-intent analysis have relied heavily on machine-learned classifiers and have benefited from improvements in classification accuracy. Combining multiple classifiers, also known as ensemble learning, is a well-studied technique known to improve the effectiveness of a single classifier. To address two key challenges in ensemble learning, (1) learning the weights of individual classifiers and (2) defining the rule for combining their weighted responses, this paper proposes a novel ensemble classifier, EnLR, which computes weights for the responses of discriminative classifiers and combines their weighted responses into a single response for a test instance. The combination rule aggregates weighted responses, where the weight of an individual classifier is inversely proportional to the variance around its response. This variance quantifies the uncertainty in the discriminative classifier's parameters, which in turn depends on the training samples. Unlike other ensemble methods, in which the weight of each classifier is learned during parameter estimation and the same weight is then applied to every test instance, our model adjusts the weights per test instance as individual classifiers become more or less confident in their decisions. Empirical experiments on various data sets demonstrate that the combined classifier produces more effective results than a single classifier, and that it achieves statistically significantly better accuracy than the well-known ensemble methods Bagging and AdaBoost. In addition to robust accuracy, our model handles high volumes of training samples efficiently, because its constituent classifiers learn independently; it is therefore simple to implement in a distributed computing environment such as Hadoop.
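To make the combination rule concrete, here is a minimal sketch of inverse-variance weighting for a single test instance. The function name and the numeric values are hypothetical: the abstract does not specify how EnLR estimates each classifier's variance, so the variances below are assumed to be given.

```python
import numpy as np

# Confidence-weighted combination for one test instance: each
# classifier's response is weighted by the inverse of the variance
# around that response, so more confident classifiers count more.
def combine_by_inverse_variance(responses, variances):
    weights = 1.0 / np.asarray(variances, dtype=float)
    weights /= weights.sum()  # normalize the weights to sum to 1
    return float(np.dot(weights, np.asarray(responses, dtype=float)))

# Hypothetical values: three classifiers' responses for a single
# test instance, each with an estimated variance around its response.
responses = [0.9, 0.6, 0.8]
variances = [0.01, 0.20, 0.05]
print(combine_by_inverse_variance(responses, variances))  # ~0.87
```

With these assumed values, the classifier with the smallest variance dominates the combined response, which matches the paper's intuition that weights should track per-instance confidence rather than being fixed across all test instances.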