Theses Doctoral

Spatial Reasoning in Dynamic Scenes

Van Hoorick, Basile

Over the past several years, machine learning has enabled remarkable progress on many tasks, such as mastering board games, recognizing objects, conversing in natural language, and generating images or videos. Despite these accomplishments, state-of-the-art techniques in artificial intelligence lack the foundations necessary to flexibly and robustly understand and manipulate their three-dimensional spatial surroundings. For instance, before their second birthday, children learn that objects persist during occlusion, they know how containment works, and they are surprised when objects behave in physically implausible ways.
In contrast, a true notion of object permanence has remained elusive for computer vision, despite its importance for perceiving and interacting with everyday situations.

In this thesis, I will outline my work on enhancing spatial reasoning within dynamic scenes, where I have integrated machine learning, intuitive physics, geometry, and world knowledge to create powerful frameworks that can capture, represent, and generate complex, cluttered visual environments.

Specifically, I will present models that reconstruct 4D scenes, track objects through occlusions, and perform dynamic view synthesis, all from a single camera viewpoint and often generalizing successfully to real-world settings. These capabilities are pivotal for applications in embodied intelligence (such as robotics and self-driving), content creation and editing, and augmented and mixed reality, where machines need to accurately represent their surroundings and deeply understand how they evolve over time.

Files

  • VanHoorick_columbia_0054D_18902.pdf (application/pdf, 3.9 MB)

More About This Work

Academic Units
Computer Science
Thesis Advisors
Vondrick, Carl M.
Degree
Ph.D., Columbia University
Published Here
November 13, 2024