Abstract
Can we learn a hierarchical visuomotor control policy
that generalizes to novel scenes, objects, and geometries
without scaling teleoperated robot demonstrations? Recent
works have shown impressive performance on manipulation
tasks by learning policies that leverage robot teleoperation
data collected at scale. To ensure true autonomy in
the real world, these policies should generalize to
multiple tasks, visual domains, and diverse object geometries
in unstructured environments. A scalable solution must
reduce the dependence on collecting large numbers of teleoperated
demonstrations while simultaneously ensuring that the
alternative can be used to learn a representation that guides
low-level control effectively. We propose learning a policy
from human-play data: trajectories of humans freely interacting
with their environment. Human-play data provides
rich guidance about high-level actions to the low-level
controller. We demonstrate the effectiveness of our high-level
policy by testing it with low-level control methods that use only a few
teleoperation demonstrations. Further, we examine the feasibility
of a hierarchical policy that requires no teleoperation data
and can generalize to any robot embodiment while obeying
its kinematic constraints. We present
our results and ablation studies on tasks evaluated in the
real world.