DOI: 10.1145/1835804.1835899

Learning to combine discriminative classifiers: confidence based

Published: 25 July 2010

ABSTRACT

Research in data mining and machine learning has led to numerous practical applications. Spam filtering, fraud detection, and user query-intent analysis rely heavily on machine-learned classifiers and have benefited from improvements in robust classification accuracy. Combining multiple classifiers (a.k.a. ensemble learning) is well studied and known to improve the effectiveness of a classifier. To address two key challenges in ensemble learning -- (1) learning the weights of individual classifiers and (2) the rule for combining their weighted responses -- this paper proposes a novel ensemble classifier, EnLR, that computes weights for the responses of discriminative classifiers and combines their weighted responses into a single response for a test instance. The combination rule aggregates weighted responses, where the weight of an individual classifier is inversely proportional to the variance around its response. Here, variance quantifies the uncertainty of the discriminative classifier's parameters, which in turn depends on the training samples. As opposed to other ensemble methods, where the weight of each individual classifier is learned as part of parameter learning and the same weight is therefore applied to all test instances, our model actively adjusts the weights as individual classifiers become more or less confident in their decisions for a given test instance. Empirical experiments on various data sets demonstrate that our combined classifier produces effective results compared with a single classifier, and statistically significantly better accuracy than the well-known ensemble methods Bagging and AdaBoost. In addition to robust accuracy, our model is extremely efficient at handling high volumes of training samples because its multiple classifiers learn independently of one another. It is simple to implement in a distributed computing environment such as Hadoop.
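The confidence-based combination rule described in the abstract can be sketched as follows. This is a minimal illustration of inverse-variance weighting, not the paper's exact EnLR formulation: the function name and the way per-instance variance estimates are supplied are assumptions for illustration only.

```python
import numpy as np

def combine_responses(responses, variances):
    """Combine per-classifier responses for a single test instance
    using inverse-variance weights: a classifier with lower variance
    (i.e., higher confidence) on this instance contributes more.

    responses : length-k sequence, each classifier's score for the instance
    variances : length-k sequence, each classifier's variance estimate
                for its own response on this instance (must be > 0)
    """
    responses = np.asarray(responses, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances      # inverse-variance weighting
    weights /= weights.sum()       # normalize so the weights sum to 1
    return float(np.dot(weights, responses))
```

Because the variances are estimated per test instance, the combination weights change from instance to instance; this is the contrast the abstract draws with Bagging and AdaBoost, where the weights are fixed once training ends.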


Supplemental Material

kdd2010_lee_lcd_01.mov (MOV, 132.4 MB)


Published in

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010, 1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%
