
Linear support vector machines via dual cached loops

Published: 12 August 2012

ABSTRACT

Modern computer hardware offers an elaborate hierarchy of storage subsystems with different speeds, capacities, and costs. Furthermore, processors are now inherently parallel, offering the simultaneous execution of several diverse threads. This paper proposes StreamSVM, the first algorithm for training linear Support Vector Machines (SVMs) that takes advantage of these properties by integrating caching with optimization. StreamSVM works by performing updates in the dual, thus obviating the need to rebalance frequently visited examples. Furthermore, we trade off file I/O against on-the-fly data expansion by generating features on demand. This significantly increases throughput. Experiments show that StreamSVM outperforms other linear SVM solvers, including the award-winning work of [38], by orders of magnitude and produces more accurate solutions within a shorter amount of time.
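The building block behind such dual updates over in-memory examples is the closed-form dual coordinate descent step for the linear SVM of Hsieh et al. [18]. The sketch below is a minimal, hypothetical Python/NumPy illustration of that step applied repeatedly to a cached block of examples; the function name and toy data are assumptions made for illustration and do not reproduce the paper's StreamSVM implementation, which additionally interleaves caching, on-demand feature generation, and multithreaded I/O.

```python
import numpy as np

def dual_cd_epoch(X, y, alpha, w, C=1.0, rng=None):
    """One sweep of dual coordinate descent over the (cached) examples.

    Solves the one-variable dual subproblem for each example in closed form
    (cf. [18]) while keeping w = sum_i alpha_i * y_i * x_i synchronized
    with the dual variables alpha.
    """
    rng = np.random.default_rng() if rng is None else rng
    for i in rng.permutation(len(y)):
        x_i, y_i = X[i], y[i]
        grad = y_i * w.dot(x_i) - 1.0      # gradient of the dual objective in alpha_i
        q_ii = x_i.dot(x_i)                # diagonal entry of Q = (YX)(YX)^T
        if q_ii > 0.0:
            alpha_new = min(max(alpha[i] - grad / q_ii, 0.0), C)
            w += (alpha_new - alpha[i]) * y_i * x_i
            alpha[i] = alpha_new
    return alpha, w

# Toy usage on synthetic data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
alpha, w = np.zeros(len(y)), np.zeros(X.shape[1])
for _ in range(10):                        # repeated sweeps over the cached block
    alpha, w = dual_cd_epoch(X, y, alpha, w, C=1.0, rng=rng)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

Storing the dual variable alpha_i alongside each cached example is what allows the solver to revisit the same example many times without re-reading it from disk, which is precisely the property a caching loop can exploit.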

Supplemental Material

311b_m_talk_5.mp4 (mp4, 121.1 MB)

References

  1. A. Agarwal, O. Chapelle, M. Dudík, and J. Langford. A reliable effective terascale linear learning system. Technical report, 2011. http://arxiv.org/abs/1110.4198.
  2. A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. In WSDM, 2012.
  3. G. Bakir, T. Hofmann, B. Schölkopf, A. Smola, B. Taskar, and S. V. N. Vishwanathan. Predicting Structured Data. 2007.
  4. A. Bordes, S. Ertekin, J. Weston, and L. Bottou. Fast kernel classifiers with online and active learning. JMLR, 6:1579--1619, September 2005.
  5. L. Bottou. Stochastic gradient SVMs. http://leon.bottou.org/projects/sgd, 2008.
  6. L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In NIPS 20, 2007.
  7. O. Bousquet and A. Elisseeff. Algorithmic stability and generalization performance. In NIPS 12, pages 196--202, 2001.
  8. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  9. E. Candes and T. Tao. Decoding by linear programming. IEEE Trans. Information Theory, 51(12):4203--4215, 2005.
  10. C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  11. K.-W. Chang and D. Roth. Selective block minimization for faster convergence of limited memory large-scale linear models. In KDD, pages 699--707, 2011.
  12. L. Cheng, S. V. N. Vishwanathan, D. Schuurmans, S. Wang, and T. Caelli. Implicit online learning with kernels. In NIPS 19, 2006.
  13. C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273--297, 1995.
  14. K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. JMLR, 7:551--585, March 2006.
  15. O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. Technical report, 2010. http://arxiv.org/abs/1012.1367.
  16. D. L. Donoho, A. Maleki, and A. Montanari. Message-passing algorithms for compressed sensing. PNAS, 106, 2009.
  17. R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. JMLR, 9:1871--1874, August 2008.
  18. C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In ICML, pages 408--415, 2008.
  19. T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 169--184, 1999.
  20. T. Joachims. Training linear SVMs in linear time. In KDD, 2006.
  21. J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1--64, January 1997.
  22. J. Kivinen, M. K. Warmuth, and B. Hassibi. The p-norm generalization of the LMS algorithm for adaptive filtering. IEEE Trans. Signal Processing, 54(5):1782--1793, May 2006.
  23. J. Langford, A. J. Smola, and M. Zinkevich. Slow learners are fast. http://arxiv.org/abs/0911.0491, 2009.
  24. Z.-Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications, 72(1):7--35, 1992.
  25. G. Mann, R. McDonald, M. Mohri, N. Silberman, and D. Walker. Efficient large-scale distributed training of conditional maximum entropy models. In NIPS 22, pages 1231--1239, 2009.
  26. N. Murata, S. Yoshizawa, and S. Amari. Network information criterion - determining the number of hidden units for artificial neural network models. IEEE Trans. Neural Networks, 5:865--872, 1994.
  27. E. Osuna and F. Girosi. Reducing the run-time complexity in support vector regression. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 271--284, 1999.
  28. J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 185--208, 1999.
  29. B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.
  30. S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, 2007.
  31. Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, A. Strehl, and S. V. N. Vishwanathan. Hash kernels. In AISTATS, 2009.
  32. A. J. Smola and S. Narayanamurthy. An architecture for parallel topic models. In VLDB, 2010.
  33. S. Sonnenburg and V. Franc. COFFIN: A computational framework for linear SVMs. In ICML, 2010.
  34. S. Sonnenburg, V. Franc, E. Yom-Tov, and M. Sebag. Pascal large scale learning challenge, 2008. http://largescale.ml.tu-berlin.de/workshop/.
  35. C. H. Teo, S. V. N. Vishwanathan, A. J. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. JMLR, 11:311--365, January 2010.
  36. M. Wainwright. Sharp thresholds for noisy and high-dimensional recovery of sparsity. Technical report, Department of Statistics, UC Berkeley, 2006.
  37. K. Weinberger, A. Dasgupta, J. Attenberg, J. Langford, and A. J. Smola. Feature hashing for large scale multitask learning. In ICML, 2009.
  38. H.-F. Yu, C.-J. Hsieh, K.-W. Chang, and C.-J. Lin. Large linear classification when data cannot fit in memory. In KDD, pages 833--842, 2010.
  39. L. Zanni, T. Serafini, and G. Zanghirati. Parallel software for training large scale support vector machines on multiprocessor systems. JMLR, 7:1467--1492, July 2006.
  40. M. Zinkevich, A. Smola, M. Weimer, and L. Li. Parallelized stochastic gradient descent. In NIPS 23, pages 2595--2603, 2010.

Published in

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012, 1616 pages
ISBN: 9781450314626
DOI: 10.1145/2339530
Copyright © 2012 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
