2025 Theses Doctoral
Generative Computer Vision for Physical Intelligence
The fundamental purpose of human perception is to guide action. Accordingly, my research is driven by the goal of building computer vision systems that can interact with the physical world. My approach toward this goal centers on generative models.
Training generative models on unlabeled data is a powerful, scalable, and comprehensive way to learn about the world, and the knowledge learned through this process can be leveraged for revolutionary applications such as general-purpose industrial robots, autonomous search and rescue agents, and home robots that do your housework.
However, deploying generative models in embodied AI systems for physical interaction poses several fundamental challenges. First, current generative models lack spatial understanding of the dynamic world. Second, they are not physically grounded, as evidenced by the hallucination problem. Third, interaction skills learned from human data are upper-bounded by human-level performance, which is often suboptimal.
This dissertation presents three approaches to address these challenges, each with examples and real-world deployable systems. Collectively, my dissertation advances the capabilities of generative models in physical intelligence by integrating techniques in computer vision, computer graphics, and robotics. It has had significant practical impacts, as evidenced by the widespread adoption of the open-source tools derived from this work in both research and industry settings. Looking forward, my research aims to further explore multimodal and multisensory perception, perception-action representation learning, and self-supervised robot learning.
Files
Liu_columbia_0054D_19392.pdf
application/pdf
4.56 MB
More About This Work
- Academic Units: Computer Science
- Thesis Advisors: Vondrick, Carl M.
- Degree: Ph.D., Columbia University
- Published Here: August 27, 2025