ABSTRACT
We prove a query complexity lower bound for approximating the top-r dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix M ∈ ℝ^{d×d}, an algorithm Alg is allowed to make T exact queries of the form w^{(i)} = M v^{(i)} for i in {1, ..., T}, where v^{(i)} is drawn from a distribution which depends arbitrarily on the past queries and measurements {v^{(j)}, w^{(j)}}_{1 ≤ j ≤ i−1}. We show that for every gap ∈ (0, 1/2], there exists a distribution over matrices M for which 1) gap_r(M) = Ω(gap) (where gap_r(M) is the normalized gap between the r-th and (r+1)-st largest-magnitude eigenvalues of M), and 2) any Alg which takes fewer than const × (r log d)/√gap queries fails (with overwhelming probability) to identify a matrix V ∈ ℝ^{d×r} with orthonormal columns for which ⟨V, M V⟩ ≥ (1 − const × gap) ∑_{i=1}^{r} λ_i(M). Our bound requires only that d is a small polynomial in 1/gap and r, and matches the upper bounds of Musco and Musco ’15. Moreover, it establishes a strict separation between convex optimization and “strict-saddle” non-convex optimization, of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension.
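For concreteness, the oracle model above can be sketched in a few lines of NumPy: the algorithm touches M only through matrix-vector products w = M v. The sketch below uses plain block power iteration, not the faster block Krylov method of Musco and Musco ’15 that the lower bound matches; all function and variable names are our own for illustration.

```python
import numpy as np

def top_r_eigenspace(matvec, d, r, T, rng):
    """Block power iteration in the query model: the matrix is accessed
    only via `matvec`, and each round of the loop spends r of the T queries."""
    V, _ = np.linalg.qr(rng.standard_normal((d, r)))  # random orthonormal start
    for _ in range(T // r):
        W = np.column_stack([matvec(V[:, j]) for j in range(r)])  # r queries
        V, _ = np.linalg.qr(W)  # re-orthonormalize the block
    return V  # d x r with orthonormal columns

# Usage on a symmetric matrix with a planted, well-separated top-r eigenspace.
rng = np.random.default_rng(0)
d, r = 60, 3
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
vals = np.concatenate([np.array([10.0, 9.0, 8.0]), rng.uniform(0.0, 1.0, d - r)])
M = (Q * vals) @ Q.T  # symmetric with eigenvalues `vals`
V = top_r_eigenspace(lambda v: M @ v, d, r, T=60, rng=rng)
# With this large gap, T = 60 queries suffice to capture the top-r energy:
print(np.trace(V.T @ M @ V) >= (1 - 0.01) * vals[:r].sum())
```

The quantity printed is exactly the success criterion from the theorem statement, ⟨V, M V⟩ ≥ (1 − const × gap) ∑ λ_i(M); the lower bound says that when the eigengap is small, no adaptive query strategy can reach it in fewer than const × (r log d)/√gap queries.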
Our argument proceeds via a reduction to estimating a rank-r spike in a deformed Wigner model M = W + λ U U⊤, where W is drawn from the Gaussian Orthogonal Ensemble, U is uniform on the d × r Stiefel manifold, and λ > 1 governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the ‘accelerated’ gap^{−1/2} in the rate follows as a consequence of the correspondence between the asymptotic eigengap and the size of the perturbation λ when λ is near the “phase transition” λ = 1. To verify that d need only be polynomial in gap^{−1} and r, we prove a finite sample convergence theorem for the top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. ’16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.
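The planted model and its phase transition are easy to observe numerically. The sketch below samples M = W + λUU⊤ under one standard choice of normalization (W scaled so its bulk spectrum asymptotically fills [−2, 2]); for λ > 1, the classical BBP-style prediction is that each planted eigenvalue separates from the bulk and concentrates near λ + 1/λ. This is an illustration under our stated assumptions, not the paper's finite-sample theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lam = 600, 2, 2.0  # lam > 1: above the phase transition at lam = 1

# W from the Gaussian Orthogonal Ensemble, normalized so the bulk
# spectrum's edge sits near 2.
G = rng.standard_normal((d, d))
W = (G + G.T) / np.sqrt(2 * d)

# U uniform on the d x r Stiefel manifold (orthonormal columns).
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

M = W + lam * (U @ U.T)
eigs = np.linalg.eigvalsh(M)  # ascending order

# The r planted eigenvalues separate from the bulk edge (near 2) and
# concentrate near lam + 1/lam (= 2.5 here).
print(eigs[-r:])       # the two outliers, each close to 2.5
print(eigs[-(r + 1)])  # the largest bulk eigenvalue, close to 2
```

As λ ↓ 1 the outliers approach the bulk edge and the eigengap closes like (λ − 1)², which is the source of the gap^{−1/2} acceleration in the rate.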
References
- Alekh Agarwal and Leon Bottou. 2014. A lower bound for the optimization of finite sums. arXiv preprint arXiv:1410.0723 (2014).
- Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, and Pradeep K. Ravikumar. 2009. Information-theoretic lower bounds on the oracle complexity of convex optimization. In Advances in Neural Information Processing Systems. 1–9.
- Zeyuan Allen-Zhu and Yuanzhi Li. 2016. First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate. arXiv preprint arXiv:1607.07837 (2016).
- Zeyuan Allen-Zhu and Yuanzhi Li. 2016. LazySVD: Even faster SVD decomposition yet without agonizing pain. In Advances in Neural Information Processing Systems. 974–982.
- Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. 2010. An Introduction to Random Matrices. Vol. 118. Cambridge University Press.
- Anurag Anshu, Naresh B. Goud, Rahul Jain, Srijita Kundu, and Priyanka Mukhopadhyay. 2017. Lifting randomized query complexity to randomized communication complexity. arXiv preprint arXiv:1703.07521 (2017).
- Ery Arias-Castro, Emmanuel J. Candès, and Mark A. Davenport. 2013. On the fundamental limits of adaptive sensing. IEEE Transactions on Information Theory 59, 1 (2013), 472–481.
- Afonso S. Bandeira, Ramon van Handel, et al. 2016. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. The Annals of Probability 44, 4 (2016), 2479–2506.
- Florent Benaych-Georges and Raj Rao Nadakuditi. 2011. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227, 1 (2011), 494–521.
- Mireille Capitaine, Catherine Donati-Martin, and Delphine Féral. 2009. The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. Annals of Probability (2009), 1–47.
- Rui M. Castro et al. 2014. Adaptive sensing performance lower bounds for sparse signal detection and support estimation. Bernoulli 20, 4 (2014), 2217–2246.
- Rui M. Castro and Ervin Tánczos. 2017. Adaptive Compressed Sensing for Support Recovery of Structured Sparse Sets. IEEE Transactions on Information Theory 63, 3 (2017), 1535–1554.
- Xi Chen, Adityanand Guntuboyina, and Yuchen Zhang. 2016. On Bayes Risk Lower Bounds. Journal of Machine Learning Research 17 (2016), 1–58.
- Imre Csiszár. 1972. A class of measures of informativity of observation channels. Periodica Mathematica Hungarica 2, 1–4 (1972), 191–213.
- James W. Demmel. 1997. Applied Numerical Linear Algebra. SIAM.
- Delphine Féral and Sandrine Péché. 2007. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in Mathematical Physics 272, 1 (2007), 185–228.
- Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, and Aaron Sidford. 2016. Faster eigenvector computation via shift-and-invert preconditioning. In Proceedings of the 33rd International Conference on Machine Learning, ICML. 2626–2634.
- Roger A. Horn and Charles R. Johnson. 2012. Matrix Analysis. Cambridge University Press.
- Kevin G. Jamieson, Robert Nowak, and Ben Recht. 2012. Query complexity of derivative-free optimization. In Advances in Neural Information Processing Systems. 2672–2680.
- Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, and Michael I. Jordan. 2017. How to Escape Saddle Points Efficiently. arXiv preprint arXiv:1703.00887 (2017).
- Ian Jolliffe. 2002. Principal Component Analysis. Wiley Online Library.
- Olav Kallenberg. 2006. Foundations of Modern Probability. Springer Science & Business Media.
- Cameron Musco and Christopher Musco. 2015. Randomized block Krylov methods for stronger and faster approximate singular value decomposition. In Advances in Neural Information Processing Systems. 1396–1404.
- Jelani Nelson, Jakub Pachocki, and Zhengyu Wang. 2017. Optimal lower bounds for universal relation, samplers, and finding duplicates. arXiv preprint arXiv:1703.08139 (2017).
- Arkadii Nemirovskii, David Borisovich Yudin, and Edgar Ronald Dawson. 1983. Problem Complexity and Method Efficiency in Optimization. (1983).
- Andrew Y. Ng, Michael I. Jordan, et al. [n. d.]. On spectral clustering: Analysis and an algorithm.
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.
- Eric Price and David P. Woodruff. 2013. Lower bounds for adaptive sparse recovery. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 652–663.
- Ohad Shamir. 2014. Fundamental limits of online and distributed algorithms for statistical learning and estimation. In Advances in Neural Information Processing Systems. 163–171.
- Ohad Shamir. 2015. Fast stochastic algorithms for SVD and PCA: Convergence properties and convexity. arXiv preprint arXiv:1507.08788 (2015).
- Max Simchowitz, Ahmed El Alaoui, and Benjamin Recht. 2018. Tight Query Complexity Lower Bounds for PCA via Finite Sample Deformed Wigner Law. arXiv preprint (to appear) (2018).
- Daniel A. Spielman. 2007. Spectral graph theory and its applications. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07). IEEE, 29–38.
- Jacob Steinhardt and John C. Duchi. 2015. Minimax rates for memory-bounded sparse linear regression. In COLT. 1564–1587.
- Jacob Steinhardt, Gregory Valiant, and Stefan Wager. 2015. Memory, Communication, and Statistical Queries. In Electronic Colloquium on Computational Complexity (ECCC), Vol. 22. 1–2.
- Roman Vershynin. [n. d.]. High-Dimensional Probability.
- Blake E. Woodworth and Nati Srebro. 2016. Tight complexity bounds for optimizing composite objectives. In Advances in Neural Information Processing Systems. 3639–3647.