ABSTRACT
Linear classification solvers have achieved complexity linear in the data size. However, in many applications, the data contain a large number of samples that do not help improve the quality of the model, yet still cost substantial I/O and memory to process. In this paper, we show how a Block Coordinate Descent method based on a Nearest-Neighbor Index can significantly reduce such cost when learning a dual-sparse model. In particular, we employ a truncated loss function to induce a series of convex programs with superior dual sparsity, and solve each dual problem using Indexed Block Coordinate Descent, which exploits Approximate Nearest Neighbor (ANN) search to select active dual variables without incurring I/O cost on irrelevant samples. We prove that, despite the bias and weak guarantees of ANN queries, the proposed algorithm converges globally to the solution defined on the entire dataset, with sublinear complexity per iteration. Experiments under both sufficient-memory and limited-memory conditions show that the proposed approach learns many times faster than other state-of-the-art solvers without sacrificing accuracy.
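To make the mechanism concrete, here is a minimal sketch, not the authors' algorithm: dual coordinate descent for an L2-regularized L1-loss SVM (in the style of Hsieh et al., 2008) in which each outer step updates only a small block of dual variables selected by an inner-product search against the current weights; the truncated-loss sequence of convex programs from the paper is omitted. The helper `index_query` and the parameters `block` and `outer` are illustrative assumptions: the helper does an exact search here, whereas the sublinear per-iteration cost claimed in the abstract comes from answering the same query with an ANN index (e.g., LSH or cone trees).

```python
import numpy as np

def index_query(Z, q, k):
    """Stand-in for an ANN index: return indices of the k rows of Z
    with the largest inner product with q (computed exactly here)."""
    return np.argpartition(Z @ q, -k)[-k:]

def indexed_bcd_svm(X, y, C=1.0, block=10, outer=200, seed=0):
    """Sketch of block coordinate descent on the SVM dual, touching
    only an index-selected block of dual variables per outer step."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n)          # dual variables, 0 <= alpha_i <= C
    w = np.zeros(d)              # primal weights, w = sum_i alpha_i y_i x_i
    Z = y[:, None] * X           # rows z_i = y_i x_i, the indexed objects
    qii = (X * X).sum(axis=1)    # diagonal of the Gram matrix
    for _ in range(outer):
        # The most violating samples minimize y_i w^T x_i, i.e. maximize
        # z_i^T (-w); only these are read, the rest cost no I/O.
        active = index_query(Z, -w, block)
        for i in rng.permutation(active):
            g = y[i] * (w @ X[i]) - 1.0           # dual gradient at i
            new = min(max(alpha[i] - g / max(qii[i], 1e-12), 0.0), C)
            w += (new - alpha[i]) * y[i] * X[i]   # incremental primal update
            alpha[i] = new
    return w
```

Calling `w = indexed_bcd_svm(X, y)` on a small dense dataset returns an SVM weight vector; the design point is that only the rows returned by the index are ever touched, so samples that never enter an active block incur no I/O or memory traffic.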