The videos below show the predictions of our keypoint model when conditioned on a history of corresponding keypoints from the robot. We find that the model's predictions direct the robot hand towards reasonable grasp poses of the object. For the t-th frame, the model is given as input the robot keypoints from the last 8 frames, and the predicted keypoints are plotted as red, green, blue, yellow, and cyan points, one for each fingertip.
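To make the model's interface concrete, here is a minimal sketch of a predictor that consumes an 8-frame history of five fingertip keypoints and outputs the next set of keypoints. The architecture, hidden size, and all names below are illustrative assumptions, not the exact implementation used in the paper.

```python
import torch

# Illustrative assumptions, not the paper's exact model.
HISTORY = 8   # frames of robot keypoint history given as input
FINGERS = 5   # one keypoint per fingertip
COORDS = 3    # (x, y, z) per keypoint

class KeypointPredictor(torch.nn.Module):
    """Predicts the next fingertip keypoints from an 8-frame history."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(HISTORY * FINGERS * COORDS, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, FINGERS * COORDS),
        )

    def forward(self, history):
        # history: (batch, HISTORY, FINGERS, COORDS)
        flat = history.flatten(start_dim=1)
        return self.net(flat).view(-1, FINGERS, COORDS)

model = KeypointPredictor()
past = torch.randn(1, HISTORY, FINGERS, COORDS)  # last 8 frames of robot keypoints
next_keypoints = model(past)  # one predicted 3D point per fingertip
```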
The videos below show rollouts of policies learnt for the Grasp and Lift task by multiple robots, as in the main paper. We find that while all robots successfully complete the task, the motion is more humanlike for hands whose form factor more closely resembles that of a human hand.
The videos below show rollouts of policies learnt for the Lift and Throw and Grasp and Lift - Clutter tasks with the Allegro Hand. With the right sparse task reward, our prediction model can guide the robot to perform multiple, albeit similar, tasks.
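As a rough sketch of what such a sparse task reward might look like for Grasp and Lift: the function name, state fields, and height threshold below are illustrative assumptions, and swapping in a different success condition (e.g. a target pose for Lift and Throw) would define a different task while the prediction model supplies the dense guidance.

```python
# Illustrative sketch of a sparse task reward; the threshold and field
# names are assumptions, not the paper's exact values.
LIFT_HEIGHT = 0.10  # metres above the table counted as "lifted" (assumed)

def grasp_and_lift_reward(object_height, table_height):
    """Return 1 only once the object is lifted above the table; 0 otherwise."""
    return float(object_height - table_height > LIFT_HEIGHT)

# The reward stays 0 until the object clears the height threshold.
print(grasp_and_lift_reward(object_height=0.85, table_height=0.80))  # 0.0
print(grasp_and_lift_reward(object_height=0.95, table_height=0.80))  # 1.0
```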