Academic Commons

Theses Doctoral

When Are Nonconvex Optimization Problems Not Scary?

Sun, Ju

Nonconvex optimization is NP-hard, even the goal is to compute a local minimizer. In applied disciplines, however, nonconvex problems abound, and simple algorithms, such as gradient descent and alternating direction, are often surprisingly effective. The ability of simple algorithms to find high-quality solutions for practical nonconvex problems remains largely mysterious.
This thesis focuses on a class of nonconvex optimization problems which CAN be solved to global optimality with polynomial-time algorithms. This class covers natural nonconvex formulations of central problems in signal processing, machine learning, and statistical estimation, such as sparse dictionary learning (DL), generalized phase retrieval (GPR), and orthogonal tensor decomposition. For each of the listed problems, the nonconvex formulation and optimization lead to novel and often improved computational guarantees.
This class of nonconvex problems has two distinctive features: (i) All local minimizer are also global. Thus obtaining any local minimizer solves the optimization problem; (ii) Around each saddle point or local maximizer, the function has a negative directional curvature. In other words, around these points, the Hessian matrices have negative eigenvalues. We call smooth functions with these two properties (qualitative) X functions, and derive concrete quantities and strategy to help verify the properties, particularly for functions with random inputs or parameters. As practical examples, we establish that certain natural nonconvex formulations for complete DL and GPR are X functions with concrete parameters.
Optimizing X functions amounts to finding any local minimizer. With generic initializations, typical iterative methods at best only guarantee to converge to a critical point that might be a saddle point or local maximizer. Interestingly, the X structure allows a number of iterative methods to escape from saddle points and local maximizers and efficiently find a local minimizer, without special initializations. We choose to describe and analyze the second-order trust-region method (TRM) that seems to yield the strongest computational guarantees. Intuitively, second-order methods can exploit Hessian to extract negative curvature directions around saddle points and local maximizers, and hence are able to successfully escape from the saddles and local maximizers of X functions. We state the TRM in a Riemannian optimization framework to cater to practical manifold-constrained problems. For DL and GPR, we show that under technical conditions, the TRM algorithm finds a global minimizer in a polynomial number of steps, from arbitrary initializations.

Files

  • thumnail for Sun_columbia_0054D_13395.pdf Sun_columbia_0054D_13395.pdf binary/octet-stream 3.92 MB Download File

More About This Work

Academic Units
Electrical Engineering
Thesis Advisors
Wright, John N.
Degree
Ph.D., Columbia University
Published Here
May 27, 2016
Academic Commons provides global access to research and scholarship produced at Columbia University, Barnard College, Teachers College, Union Theological Seminary and Jewish Theological Seminary. Academic Commons is managed by the Columbia University Libraries.