ABSTRACT
Modern computer hardware offers an elaborate hierarchy of storage subsystems with different speeds, capacities, and costs. Moreover, processors are now inherently parallel, executing several diverse threads simultaneously. This paper proposes StreamSVM, the first algorithm for training linear Support Vector Machines (SVMs) that takes advantage of these properties by integrating caching with optimization. StreamSVM performs updates in the dual, thus obviating the need to rebalance frequently visited examples. Furthermore, we trade file I/O for on-the-fly data expansion by generating features on demand, which significantly increases throughput. Experiments show that StreamSVM outperforms other linear SVM solvers, including the award-winning work of [38], by orders of magnitude, producing more accurate solutions in a shorter amount of time.
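The dual updates the abstract refers to are in the spirit of the dual coordinate descent method of Hsieh et al. (cited below): each pass visits one example, computes the gradient of the dual objective with respect to its dual variable, and projects a Newton step onto the box constraint. The sketch below is a minimal dense-vector illustration of that update rule only; the function name and data layout are ours, and the caching and on-demand feature generation that distinguish StreamSVM are not shown.

```python
import random

def dual_cd_svm(X, y, C=1.0, epochs=20, seed=0):
    """Dual coordinate descent for a hinge-loss linear SVM (no bias term).

    X: list of dense feature vectors (lists of floats); y: labels in {-1, +1}.
    The primal weight vector w is maintained alongside the dual variables
    alpha, so each coordinate update costs O(len(x_i)).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)                    # randomized sweep over examples
        for i in order:
            xi, yi = X[i], y[i]
            q_ii = sum(v * v for v in xi)     # diagonal entry of Q = (y_i y_j x_i.x_j)
            if q_ii == 0.0:
                continue
            # Gradient of the dual objective w.r.t. alpha_i.
            g = yi * sum(wj * v for wj, v in zip(w, xi)) - 1.0
            # Newton step, projected onto the box constraint 0 <= alpha_i <= C.
            new_alpha = min(max(alpha[i] - g / q_ii, 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                for j, v in enumerate(xi):    # keep w = sum_i alpha_i y_i x_i in sync
                    w[j] += delta * yi * v
    return w
```

Because w is kept consistent with alpha, an example already resident in cache can be revisited at no I/O cost, which is what makes coupling this update with a caching loop attractive.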
REFERENCES
- A. Agarwal, O. Chapelle, M. Dudík, and J. Langford. A reliable effective terascale linear learning system. Technical report, 2011. http://arxiv.org/abs/1110.4198.
- A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. In WSDM, 2012.
- G. Bakir, T. Hofmann, B. Schölkopf, A. Smola, B. Taskar, and S. V. N. Vishwanathan. Predicting Structured Data. MIT Press, 2007.
- A. Bordes, S. Ertekin, J. Weston, and L. Bottou. Fast kernel classifiers with online and active learning. JMLR, 6:1579--1619, September 2005.
- L. Bottou. Stochastic gradient SVMs. http://leon.bottou.org/projects/sgd, 2008.
- L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In NIPS 20, 2007.
- O. Bousquet and A. Elisseeff. Algorithmic stability and generalization performance. In NIPS 12, pages 196--202, 2001.
- S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- E. Candes and T. Tao. Decoding by linear programming. IEEE Trans. Information Theory, 51(12):4203--4215, 2005.
- C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
- K.-W. Chang and D. Roth. Selective block minimization for faster convergence of limited memory large-scale linear models. In KDD, pages 699--707, 2011.
- L. Cheng, S. V. N. Vishwanathan, D. Schuurmans, S. Wang, and T. Caelli. Implicit online learning with kernels. In NIPS 19, 2006.
- C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20(3):273--297, 1995.
- K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. JMLR, 7:551--585, March 2006.
- O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. Technical report, 2010. http://arxiv.org/abs/1012.1367.
- D. L. Donoho, A. Maleki, and A. Montanari. Message-passing algorithms for compressed sensing. PNAS, 106, 2009.
- R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. JMLR, 9:1871--1874, August 2008.
- C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In ICML, pages 408--415, 2008.
- T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 169--184, 1999.
- T. Joachims. Training linear SVMs in linear time. In KDD, 2006.
- J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--64, January 1997.
- J. Kivinen, M. K. Warmuth, and B. Hassibi. The p-norm generalization of the LMS algorithm for adaptive filtering. IEEE Trans. Signal Processing, 54(5):1782--1793, May 2006.
- J. Langford, A. J. Smola, and M. Zinkevich. Slow learners are fast. http://arxiv.org/abs/0911.0491, 2009.
- Z. Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications, 72(1):7--35, 1992.
- G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. In NIPS 22, pages 1231--1239, 2009.
- N. Murata, S. Yoshizawa, and S. Amari. Network information criterion - determining the number of hidden units for artificial neural network models. IEEE Trans. Neural Networks, 5:865--872, 1994.
- E. Osuna and F. Girosi. Reducing the run-time complexity in support vector regression. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 271--284, 1999.
- J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 185--208, 1999.
- B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
- S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, 2007.
- Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, A. Strehl, and S. V. N. Vishwanathan. Hash kernels. In AISTATS, 2009.
- A. J. Smola and S. Narayanamurthy. An architecture for parallel topic models. In VLDB, 2010.
- S. Sonnenburg and V. Franc. COFFIN: A computational framework for linear SVMs. In ICML, 2010.
- S. Sonnenburg, V. Franc, E. Yom-Tov, and M. Sebag. Pascal large scale learning challenge, 2008. http://largescale.ml.tu-berlin.de/workshop/.
- C. H. Teo, S. V. N. Vishwanathan, A. J. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. JMLR, 11:311--365, January 2010.
- M. Wainwright. Sharp thresholds for noisy and high-dimensional recovery of sparsity. Technical report, Department of Statistics, UC Berkeley, 2006.
- K. Weinberger, A. Dasgupta, J. Attenberg, J. Langford, and A. J. Smola. Feature hashing for large scale multitask learning. In ICML, 2009.
- H.-F. Yu, C.-J. Hsieh, K.-W. Chang, and C.-J. Lin. Large linear classification when data cannot fit in memory. In KDD, pages 833--842, 2010.
- L. Zanni, T. Serafini, and G. Zanghirati. Parallel software for training large scale support vector machines on multiprocessor systems. JMLR, 7:1467--1492, July 2006.
- M. Zinkevich, A. Smola, M. Weimer, and L. Li. Parallelized stochastic gradient descent. In NIPS 23, pages 2595--2603, 2010.
Index Terms
- Linear support vector machines via dual cached loops