Theses Doctoral

Towards Generalist Robots through Visual World Modeling

Chen, Boyuan

Moving from narrow robots specializing in specific tasks to generalist robots excelling in multiple tasks in various environmental conditions is the future of next-generation robotics. The key to generalist robots is the ability to learn world models that are reusable, generalizable, and adaptable. Having a general understanding of how the physical world works will enable robots to acquire transferable knowledge across different tasks, predict possible outcomes of future actions before execution, and constantly update their knowledge through continual interactions. While the majority of robot learning frameworks tend to mix task-related and task-agnostic components together throughout the learning process, the two are often separable: changing one does not necessarily change the other. For example, a task-agnostic component such as the computational model of the robot body remains the same even under different task settings, while a task-related component such as the dynamics of a moving object remains the same for different embodiments.

This thesis studies the key steps towards building generalist robots by decomposing the world modeling problem into task-agnostic and task-related elements: (1) robot self-modeling; (2) robot modeling other agents; and (3) robot modeling the physical environment. This framework has produced powerful and efficient learning-based robotic systems for a variety of tasks and physical embodiments, such as computational models of physical robots that can be reused and adapted to numerous task objectives and changing environments, behavior modeling frameworks for complex multi-robot applications, and dynamical system understanding algorithms to distill compact physics knowledge from high-dimensional and multi-modal sensory data. The approach in this thesis could help catalyze the understanding, prediction, and control of increasingly complex systems.



More About This Work

Academic Units: Computer Science
Thesis Advisors: Lipson, Hod
Degree: Ph.D., Columbia University
Published Here: April 20, 2022