Abstract
Bagging and boosting are two popular ensemble methods that typically achieve better accuracy than a single classifier. These techniques have limitations on massive data sets, because the size of the data set can be a bottleneck. Voting many classifiers built on small subsets of data ("pasting small votes") is a promising approach for learning from massive data sets, one that can utilize the power of boosting and bagging. We propose a framework for building hundreds or thousands of such classifiers on small subsets of data in a distributed environment. Experiments show this approach is fast, accurate, and scalable.
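To make the "pasting small votes" idea concrete, below is a minimal single-machine sketch: many decision trees are each trained on a small random subset (a "bite") of the training data and then combined by majority vote. This is only an illustration of the general idea, not the authors' distributed framework; the function names (paste_small_votes, vote), the use of scikit-learn decision trees, the bite size of 500, and the synthetic data set are all assumptions introduced for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def paste_small_votes(X, y, n_classifiers=100, bite_size=500, seed=0):
    """Train many trees, each on a small random subset ("bite") of the data.

    Illustrative sketch only; bite_size and n_classifiers are arbitrary choices.
    """
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_classifiers):
        # Sample one small bite without replacement from the full training set.
        idx = rng.choice(len(X), size=min(bite_size, len(X)), replace=False)
        tree = DecisionTreeClassifier()
        tree.fit(X[idx], y[idx])
        ensemble.append(tree)
    return ensemble

def vote(ensemble, X):
    """Combine the ensemble by simple majority vote over predicted class labels."""
    preds = np.stack([clf.predict(X) for clf in ensemble])  # (n_classifiers, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, preds
    )

if __name__ == "__main__":
    # Synthetic stand-in for a large data set, purely for demonstration.
    X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    ensemble = paste_small_votes(X_tr, y_tr)
    acc = (vote(ensemble, X_te) == y_te).mean()
    print(f"majority-vote accuracy: {acc:.3f}")
```

Because each classifier sees only a small bite, training can in principle be spread across workers that each hold a partition of the data, with only the final class votes combined at the end; that distribution step is what the proposed framework addresses.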