2020 Theses Doctoral
Cryptographic approaches to security and optimization in machine learning
Modern machine learning techniques have achieved surprisingly good standard test accuracy, yet classical machine learning theory has been unable to explain the underlying reason behind this success. The phenomenon of adversarial examples further complicates our understanding of what it means to have good generalization ability. Classifiers that generalize well to the test set are easily fooled by imperceptible image modifications, which can often be computed without knowledge of the classifier itself. The adversarial error of a classifier measures the error under which each test data point can be modified by an algorithm before it is given as input to the classifier. Followup work has showed that a tradeoff exists between optimizing for standard generalization error versus for adversarial error. This calls into question whether standard generalization error is the correct metric to measure.
We try to understand the generalization capability of modern machine learning techniques through the lens of adversarial examples. To reconcile the apparent tradeoff between the two competing notions of error, we create new security definitions and classifier constructions which allow us to prove an upper bound on the adversarial error that decreases as standard test error decreases. We introduce a cryptographic proof technique by defining a security assumption in a simpler attack setting and proving a security reduction from a restricted black-box attack problem to this security assumption. We then investigate the double descent curve in the interpolation regime, where test error can continue to decrease even after training error has reached zero, to give a natural explanation for the observed tradeoff between adversarial error and standard generalization error.
The second part of our work investigates further this notion of a black-box model by looking at the separation between being able to evaluate a function and being able to actually understand it. This is formalized through the notion of function obfuscation in cryptography. Given some concrete implementation of a function, the implementation is considered obfuscated if a user cannot produce the function output on a test input without querying the implementation itself. This means that a user cannot actually learn or understand the function even though all of the implementation details are presented in the clear. As expected this is a very strong requirement that does not exist for all functions one might be interested in. In our work we make progress on providing obfuscation schemes for simple, explicit function classes.
The last part of our work investigates non-statistical biases and algorithms for nonconvex optimization problems. We show that the continuous-time limit of stochastic gradient descent does not converge directly to the local optimum, but rather has a bias term which grows with the step size. We also construct novel, non-statistical algorithms for two parametric learning problems by employing lattice basis reduction techniques from cryptography.
- Shi_columbia_0054D_15728.pdf application/pdf 1.03 MB Download File
- mets.xml application/xml 12.7 KB Download File
More About This Work
- Academic Units
- Computer Science
- Thesis Advisors
- Hsu, Daniel
- Bishop, Allison
- Ph.D., Columbia University
- Published Here
- February 7, 2020