Doctoral Thesis

Representational Capabilities of Feed-forward and Sequential Neural Architectures

Sanford, Clayton Hendrick

Despite the widespread empirical success of deep neural networks over the past decade, a comprehensive understanding of their mathematical properties remains elusive, which limits the ability of practitioners to train neural networks in a principled manner. This dissertation provides a representational characterization of a variety of neural network architectures, including fully-connected feed-forward networks and sequential models like transformers.

The representational capabilities of neural networks are most famously characterized by the universal approximation theorem, which states that sufficiently large neural networks can closely approximate any well-behaved target function. However, the universal approximation theorem applies exclusively to two-layer neural networks of unbounded size and fails to capture the comparative strengths and weaknesses of different architectures.
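For concreteness, a standard form of the theorem (stated here as background in the Cybenko/Hornik style, not quoted from the thesis) is: for any continuous target f on [0,1]^d, any fixed non-polynomial activation σ, and any ε > 0, there exist a width m and parameters a_i, w_i, b_i such that

\[ \sup_{x \in [0,1]^d} \Bigl| f(x) - \sum_{i=1}^{m} a_i \, \sigma\bigl(w_i^\top x + b_i\bigr) \Bigr| < \varepsilon. \]

The width m may need to grow without bound as ε shrinks, which is the sense in which the theorem only speaks to two-layer networks of unbounded size.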

This thesis addresses these limitations by quantifying the representational consequences of random features, weight regularization, and model depth on feed-forward architectures. It further investigates and contrasts the expressive power of transformers and other sequential neural architectures. Taken together, these results draw on a wide range of theoretical tools, including approximation theory, discrete dynamical systems, and communication complexity, to prove rigorous separations between different neural architectures and scaling regimes.


More About This Work

Academic Units: Computer Science
Thesis Advisors: Hsu, Daniel Joseph; Servedio, Rocco A.
Degree: Ph.D., Columbia University
Published Here: May 1, 2024