Academic Commons Search Results
https://academiccommons.columbia.edu/catalog?action=index&controller=catalog&f%5Bauthor_facet%5D%5B%5D=Sun%2C+Ju&f%5Bdepartment_facet%5D%5B%5D=Electrical+Engineering&format=rss&fq%5B%5D=has_model_ssim%3A%22info%3Afedora%2Fldpd%3AContentAggregator%22&q=&rows=500&sort=record_creation_date+desc
Academic Commons Search Resultsen-usWhen Are Nonconvex Optimization Problems Not Scary?
https://academiccommons.columbia.edu/catalog/ac:199718
Sun, Juhttp://dx.doi.org/10.7916/D8251J7HFri, 27 May 2016 11:38:27 +0000Nonconvex optimization is NP-hard, even the goal is to compute a local minimizer. In applied disciplines, however, nonconvex problems abound, and simple algorithms, such as gradient descent and alternating direction, are often surprisingly effective. The ability of simple algorithms to find high-quality solutions for practical nonconvex problems remains largely mysterious.
This thesis focuses on a class of nonconvex optimization problems which CAN be solved to global optimality with polynomial-time algorithms. This class covers natural nonconvex formulations of central problems in signal processing, machine learning, and statistical estimation, such as sparse dictionary learning (DL), generalized phase retrieval (GPR), and orthogonal tensor decomposition. For each of the listed problems, the nonconvex formulation and optimization lead to novel and often improved computational guarantees.
This class of nonconvex problems has two distinctive features: (i) All local minimizer are also global. Thus obtaining any local minimizer solves the optimization problem; (ii) Around each saddle point or local maximizer, the function has a negative directional curvature. In other words, around these points, the Hessian matrices have negative eigenvalues. We call smooth functions with these two properties (qualitative) X functions, and derive concrete quantities and strategy to help verify the properties, particularly for functions with random inputs or parameters. As practical examples, we establish that certain natural nonconvex formulations for complete DL and GPR are X functions with concrete parameters.
Optimizing X functions amounts to finding any local minimizer. With generic initializations, typical iterative methods at best only guarantee to converge to a critical point that might be a saddle point or local maximizer. Interestingly, the X structure allows a number of iterative methods to escape from saddle points and local maximizers and efficiently find a local minimizer, without special initializations. We choose to describe and analyze the second-order trust-region method (TRM) that seems to yield the strongest computational guarantees. Intuitively, second-order methods can exploit Hessian to extract negative curvature directions around saddle points and local maximizers, and hence are able to successfully escape from the saddles and local maximizers of X functions. We state the TRM in a Riemannian optimization framework to cater to practical manifold-constrained problems. For DL and GPR, we show that under technical conditions, the TRM algorithm finds a global minimizer in a polynomial number of steps, from arbitrary initializations.Electrical engineering, Computer science, Mathematics, Nonconvex programming, Mathematical optimizationjs4038Electrical EngineeringDissertationsEfficient Point-to-Subspace Query in ℓ1 with Application to Robust Face Recognition
https://academiccommons.columbia.edu/catalog/ac:153463
Zhang, Yuqian; Wright, John N.; Sun, Juhttp://hdl.handle.net/10022/AC:P:14959Tue, 16 Oct 2012 13:03:07 +0000Motivated by vision tasks such as robust face and object recognition, we consider the following general problem: given a collection of low-dimensional linear subspaces in a high-dimensional ambient (image) space, and a query point (image), efficiently determine the nearest subspace to the query in ℓ1 distance. We show in theory this problem can be solved with a simple two-stage algorithm: (1) random Cauchy projection of query and subspaces into low-dimensional spaces followed by efficient distance evaluation (ℓ1 regression); (2) getting back to the high-dimensional space with very few candidates and performing exhaustive search. We present preliminary experiments on robust face recognition to corroborate our theory.Artificial intelligence, Computer scienceyz2409, jw2966, js4038Electrical EngineeringArticles