2024 Theses Doctoral
Spatial Reasoning in Dynamic Scenes
Over the past several years, machine learning has enabled incredible progress on many tasks, such as mastering board games, recognizing objects, conversing in natural language, and generating images or videos. Despite these accomplishments, state-of-the-art techniques in artificial intelligence lack the foundations necessary to flexibly and robustly understand and manipulate their three-dimensional spatial surroundings. For instance, before their second birthday, children learn that objects persist during occlusion, understand how containment works, and show surprise at novel physics.
In contrast, a true notion of object permanence has remained elusive for computer vision, despite its importance for perceiving and interacting with everyday situations.
In this thesis, I will outline my work on enhancing spatial reasoning within dynamic scenes, where I have integrated machine learning, intuitive physics, geometry, and world knowledge to create powerful frameworks that can capture, represent, and generate their complex, cluttered visual environment.
Specifically, I will present models to reconstruct 4D scenes, track objects through occlusions, and perform dynamic view synthesis, all from a single camera viewpoint and often generalizing successfully to real-world settings. These capabilities are pivotal for applications in embodied intelligence (such as robotics and self-driving), content creation and editing, and augmented and mixed reality, where machines need to accurately represent their surroundings and deeply understand how they evolve over time.
Files
- VanHoorick_columbia_0054D_18902.pdf application/pdf 3.9 MB
More About This Work
- Academic Units
- Computer Science
- Thesis Advisors
- Vondrick, Carl M.
- Degree
- Ph.D., Columbia University
- Published Here
- November 13, 2024