We select unseen action trajectories from real-world data and apply them to highly out-of-distribution (OOD) paintings used as initial frames, to see the egocentric navigation or joint-control actions acted out in the paintings.
[Unseen Action Trajectory Illustrated in Real Video] → Predicted Video in [Painting 1] [Painting 2] ...
[Unseen Action Trajectory Illustrated in Real Video] → Predicted Video in [Painting 1] [Painting 2]
Here we select unseen action trajectories from real-world data and apply them to previously unseen images we captured in our surroundings, to see the egocentric navigation acted out in new real-world scenes.
[Unseen Action Trajectory Illustrated in Real Video] → [Initial Image] [Predicted Video]
[Unseen Action Trajectory Illustrated in Real Video] → [Initial Image 1] [Predicted Video 1] ...
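To make the setup concrete, the sketch below shows how applying a source video's action trajectory to a different initial frame could look in code. The `DummyWorldModel` class, its `rollout` interface, the file-free inputs, and the array shapes are illustrative assumptions, not our actual implementation.

```python
# Minimal sketch of transferring an unseen action trajectory to a new initial
# frame (a painting or a photo we captured). The model class below is a
# hypothetical stand-in, not the actual implementation.
import numpy as np


class DummyWorldModel:
    """Stand-in for an action-conditioned video world model."""

    def rollout(self, initial_frame: np.ndarray, actions: np.ndarray) -> np.ndarray:
        # A real model would autoregressively predict frames conditioned on
        # the initial frame and each action; here we just tile the frame.
        return np.repeat(initial_frame[None], len(actions), axis=0)


def transfer_actions(model, actions: np.ndarray, initial_frame: np.ndarray) -> np.ndarray:
    """Roll out a source video's action trajectory from a different initial frame.

    actions:       (T, action_dim) trajectory taken from a real-world video.
    initial_frame: (H, W, 3) uint8 image, e.g. a painting or a photo of our surroundings.
    returns:       (T, H, W, 3) predicted video.
    """
    return model.rollout(initial_frame=initial_frame, actions=actions)


if __name__ == "__main__":
    model = DummyWorldModel()
    actions = np.random.randn(16, 3).astype(np.float32)   # e.g. 3-DoF navigation actions
    painting = np.zeros((256, 256, 3), dtype=np.uint8)     # stand-in OOD initial frame
    video = transfer_actions(model, actions, painting)
    print(video.shape)  # (16, 256, 256, 3)
```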
Here we show humanoid navigation and manipulation results controlled by 25-DoF joint-angle action trajectories on the validation set of the 1x dataset, and compare them with the ground-truth videos for the same unseen action trajectories.
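As a rough illustration of how such a rollout could be compared with its ground-truth clip, the sketch below computes per-frame PSNR between the two videos. The metric choice, array shapes, and function names are assumptions for illustration rather than the exact evaluation protocol.

```python
# Sketch of comparing a predicted rollout with the ground-truth clip for the
# same unseen 25-DoF joint-angle trajectory. Per-frame PSNR and the shapes
# below are illustrative assumptions, not the exact evaluation protocol.
import numpy as np


def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two uint8 frames."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def per_frame_psnr(pred_video: np.ndarray, gt_video: np.ndarray) -> np.ndarray:
    """pred_video, gt_video: (T, H, W, 3) clips driven by the same action trajectory."""
    assert pred_video.shape == gt_video.shape
    return np.array([psnr(p, g) for p, g in zip(pred_video, gt_video)])


if __name__ == "__main__":
    T, H, W = 16, 256, 256
    actions = np.zeros((T, 25), dtype=np.float32)  # one 25-DoF joint-angle target per frame
    gt = np.random.randint(0, 256, (T, H, W, 3), dtype=np.uint8)
    pred = np.random.randint(0, 256, (T, H, W, 3), dtype=np.uint8)
    print(per_frame_psnr(pred, gt).mean())
```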
Here we show navigation results controlled by 3-DoF position trajectories on the test set of RECON. In each row, we compare the ground-truth video with the videos predicted by our model and by the Navigation World Model (NWM) for the same unseen action trajectory.
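For context, the sketch below shows one plausible way a 3-DoF navigation action trajectory could be derived from a ground-truth pose sequence, as relative (forward, lateral, yaw) steps in the current egocentric frame. The exact action parameterization used by our model and by NWM may differ, so treat this purely as an illustrative assumption.

```python
# Sketch of converting an absolute (x, y, yaw) pose sequence into a 3-DoF
# relative action trajectory in the egocentric frame. The parameterization is
# an illustrative assumption, not the exact format used in our experiments.
import numpy as np


def poses_to_relative_actions(poses: np.ndarray) -> np.ndarray:
    """poses: (T, 3) absolute (x, y, yaw); returns (T-1, 3) relative actions."""
    actions = []
    for (x0, y0, th0), (x1, y1, th1) in zip(poses[:-1], poses[1:]):
        dx, dy = x1 - x0, y1 - y0
        # Rotate the world-frame displacement into the current egocentric frame.
        fwd = np.cos(th0) * dx + np.sin(th0) * dy
        lat = -np.sin(th0) * dx + np.cos(th0) * dy
        dyaw = np.arctan2(np.sin(th1 - th0), np.cos(th1 - th0))  # wrap to [-pi, pi]
        actions.append((fwd, lat, dyaw))
    return np.asarray(actions, dtype=np.float32)


if __name__ == "__main__":
    t = np.linspace(0, np.pi / 2, 16)
    poses = np.stack([np.cos(t), np.sin(t), t + np.pi / 2], axis=1)  # quarter-circle path
    print(poses_to_relative_actions(poses)[:3])
```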