Agnostically Learning Halfspaces

Title:

Agnostically Learning Halfspaces

Author(s):

Kalai, Adam
Klivans, Adam
Mansour, Yishay
Servedio, Rocco Anthony

Date:

2005

Type:

Technical reports

Department:

Computer Science

Permanent URL:

http://hdl.handle.net/10022/AC:P:29412

Series:

Columbia University Computer Science Technical Reports

Part Number:

CUCS03005

Abstract:

We consider the problem of learning a halfspace in the agnostic framework of Kearns et al., where a learner is given access to a distribution on labelled examples but the labelling may be arbitrary. The learner's goal is to output a hypothesis which performs almost as well as the optimal halfspace with respect to future draws from this distribution. Although the agnostic learning framework does not explicitly deal with noise, it is closely related to learning in worstcase noise models such as malicious noise. We give the first polynomialtime algorithm for agnostically learning halfspaces with respect to several distributions, such as the uniform distribution over the $n$dimensional Boolean cube {0,1}^n or unit sphere in ndimensional Euclidean space, as well as any logconcave distribution in ndimensional Euclidean space. Given any constant additive factor eps>0, our algorithm runs in poly(n) time and constructs a hypothesis whose error rate is within an additive eps of the optimal halfspace. We also show this algorithm agnostically learns Boolean disjunctions in time roughly 2^{\sqrt{n}} with respect to any distribution; this is the first subexponentialtime algorithm for this problem. Finally, we obtain a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere which can tolerate the highest level of malicious noise of any algorithm to date. Our main tool is a polynomial regression algorithm which finds a polynomial that best fits a set of points with respect to a particular metric. We show that, in fact, this algorithm is an arbitrarydistribution generalization of the well known "lowdegree" Fourier algorithm of Linial, Mansour, and Nisan and has excellent noise tolerance properties when minimizing with respect to the L_1 norm. We apply this algorithm in conjunction with a nonstandard Fourier transform (which does not use the traditional parity basis) for learning halfspaces over the uniform distribution on the unit sphere; we believe this technique is of independent interest.

Subject(s):

Computer science
 Item views:

201
 Metadata:

text  xml