2016 Theses Doctoral
When Are Nonconvex Optimization Problems Not Scary?
Nonconvex optimization is NP-hard, even the goal is to compute a local minimizer. In applied disciplines, however, nonconvex problems abound, and simple algorithms, such as gradient descent and alternating direction, are often surprisingly effective. The ability of simple algorithms to find high-quality solutions for practical nonconvex problems remains largely mysterious.
This thesis focuses on a class of nonconvex optimization problems which CAN be solved to global optimality with polynomial-time algorithms. This class covers natural nonconvex formulations of central problems in signal processing, machine learning, and statistical estimation, such as sparse dictionary learning (DL), generalized phase retrieval (GPR), and orthogonal tensor decomposition. For each of the listed problems, the nonconvex formulation and optimization lead to novel and often improved computational guarantees.
This class of nonconvex problems has two distinctive features: (i) All local minimizer are also global. Thus obtaining any local minimizer solves the optimization problem; (ii) Around each saddle point or local maximizer, the function has a negative directional curvature. In other words, around these points, the Hessian matrices have negative eigenvalues. We call smooth functions with these two properties (qualitative) X functions, and derive concrete quantities and strategy to help verify the properties, particularly for functions with random inputs or parameters. As practical examples, we establish that certain natural nonconvex formulations for complete DL and GPR are X functions with concrete parameters.
Optimizing X functions amounts to finding any local minimizer. With generic initializations, typical iterative methods at best only guarantee to converge to a critical point that might be a saddle point or local maximizer. Interestingly, the X structure allows a number of iterative methods to escape from saddle points and local maximizers and efficiently find a local minimizer, without special initializations. We choose to describe and analyze the second-order trust-region method (TRM) that seems to yield the strongest computational guarantees. Intuitively, second-order methods can exploit Hessian to extract negative curvature directions around saddle points and local maximizers, and hence are able to successfully escape from the saddles and local maximizers of X functions. We state the TRM in a Riemannian optimization framework to cater to practical manifold-constrained problems. For DL and GPR, we show that under technical conditions, the TRM algorithm finds a global minimizer in a polynomial number of steps, from arbitrary initializations.
- Sun_columbia_0054D_13395.pdf binary/octet-stream 3.92 MB Download File
More About This Work
- Academic Units
- Electrical Engineering
- Thesis Advisors
- Wright, John N.
- Ph.D., Columbia University
- Published Here
- May 27, 2016