Theses Doctoral

Discovery Through Bottlenecks in Multimodal Models

Chiquier, Mia Anna

Modern machine learning systems generate photorealistic images, classify data with superhuman accuracy, and synthesize human-like text, yet understanding and controlling their behavior remains challenging. While interpretability techniques and conditioning mechanisms have made progress, we propose a fundamentally different approach: building systems where information flows through inherently interpretable bottlenecks, unifying discovery and control by construction.

This thesis presents three bottleneck methods for multimodal systems that exploit a key duality: the same mechanisms that reveal interpretable patterns also serve as precise control interfaces. In fine-grained classification, LLM-based evolutionary optimization discovers discriminative features in interpretable language bottlenecks, enabling targeted control over classification decisions.

In motion domains, bidirectional models learn interpretable muscle activation patterns that both explain motion generation and enable conditional editing based on muscle activity goals. In visual domains, counterfactual generation discovers fine-grained discriminative features through direct image editing, revealing subtle differences between visually similar groups while enabling modifications beyond the reach of language prompts. Together, these methods demonstrate how representational constraints can transform opaque machine learning systems into interpretable, controllable frameworks applicable to domains where the features and control objectives worth discovering are not known in advance.

Files

  • Chiquier_columbia_0054D_19602.pdf (application/pdf, 3.99 MB)

More About This Work

Academic Units: Computer Science
Thesis Advisors: Vondrick, Carl M.
Degree: D.E.S., Columbia University
Published Here: November 19, 2025