Agnostically Learning Halfspaces

Kalai, Adam; Klivans, Adam; Mansour, Yishay; Servedio, Rocco Anthony

We consider the problem of learning a halfspace in the agnostic framework of Kearns et al., where a learner is given access to a distribution on labelled examples but the labelling may be arbitrary. The learner's goal is to output a hypothesis which performs almost as well as the optimal halfspace with respect to future draws from this distribution. Although the agnostic learning framework does not explicitly deal with noise, it is closely related to learning in worst-case noise models such as malicious noise. We give the first polynomial-time algorithm for agnostically learning halfspaces with respect to several distributions, such as the uniform distribution over the $n$-dimensional Boolean cube {0,1}^n or unit sphere in n-dimensional Euclidean space, as well as any log-concave distribution in n-dimensional Euclidean space. Given any constant additive factor eps>0, our algorithm runs in poly(n) time and constructs a hypothesis whose error rate is within an additive eps of the optimal halfspace. We also show this algorithm agnostically learns Boolean disjunctions in time roughly 2^{\sqrt{n}} with respect to any distribution; this is the first subexponential-time algorithm for this problem. Finally, we obtain a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere which can tolerate the highest level of malicious noise of any algorithm to date. Our main tool is a polynomial regression algorithm which finds a polynomial that best fits a set of points with respect to a particular metric. We show that, in fact, this algorithm is an arbitrary-distribution generalization of the well known "low-degree" Fourier algorithm of Linial, Mansour, and Nisan and has excellent noise tolerance properties when minimizing with respect to the L_1 norm. We apply this algorithm in conjunction with a non-standard Fourier transform (which does not use the traditional parity basis) for learning halfspaces over the uniform distribution on the unit sphere; we believe this technique is of independent interest.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-030-05
Published Here
April 21, 2011