Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models
Nils Blank,
Moritz Reuss,
Marcel Rühle,
Ömer Erdinç Yağmurlu,
Fabian Wenzel,
Oier Mees,
Rudolf Lioutikov
CoRL 2024
We introduce a novel approach to automatically label uncurated, long-horizon robot teleoperation data at scale in a zero-shot manner without any human intervention.
We combine pre-trained vision-language foundation models to detect objects in a scene, propose possible tasks, and segment those tasks from large datasets of unlabeled interaction data. We then train language-conditioned policies on the relabeled datasets.
Our initial experiments show that our method enables training language-conditioned policies on unlabeled and unstructured datasets that match the performance of policies trained with oracle human annotations.
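To make the labeling stages concrete, the following is a minimal sketch of how object detection, task proposal, and task segmentation could be chained. It is not the authors' implementation. All names here (`label_trajectory`, `LabeledSegment`, and the `toy_*` stand-ins) are hypothetical, and the three model stages are passed in as plain callables, on the assumption that each would be backed by a pre-trained foundation model in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for one observation; a real frame would hold images.
Frame = dict
Span = Tuple[int, int, str]  # (start index, exclusive end index, instruction)


@dataclass
class LabeledSegment:
    start: int
    end: int  # exclusive
    instruction: str


def label_trajectory(
    frames: Sequence[Frame],
    detect_objects: Callable[[Frame], List[str]],
    propose_tasks: Callable[[List[str]], List[str]],
    segment_tasks: Callable[[Sequence[Frame], List[str]], List[Span]],
) -> List[LabeledSegment]:
    """Chain the three stages and keep only well-formed, proposed segments."""
    if not frames:
        return []
    objects = detect_objects(frames[0])        # assumed: objects found in the first frame
    candidates = propose_tasks(objects)        # assumed: tasks proposed from object names
    spans = segment_tasks(frames, candidates)  # assumed: returns (start, end, task) spans
    return [
        LabeledSegment(start, end, task)
        for start, end, task in spans
        if 0 <= start < end <= len(frames) and task in candidates
    ]


# Toy stand-ins so the sketch runs without any model; they are NOT real models.
def toy_detect(frame: Frame) -> List[str]:
    return sorted(frame["visible"])


def toy_propose(objects: List[str]) -> List[str]:
    return [f"pick up the {name}" for name in objects]


def toy_segment(frames: Sequence[Frame], candidates: List[str]) -> List[Span]:
    # Group consecutive frames in which the same object is held.
    spans: List[Span] = []
    start = 0
    for i in range(1, len(frames) + 1):
        if i == len(frames) or frames[i]["held"] != frames[start]["held"]:
            held: Optional[str] = frames[start]["held"]
            if held is not None:
                spans.append((start, i, f"pick up the {held}"))
            start = i
    return spans


visible = ["drawer", "block"]
demo_frames = [
    {"visible": visible, "held": held}
    for held in [None, "block", "block", None, "drawer"]
]
demo_segments = label_trajectory(demo_frames, toy_detect, toy_propose, toy_segment)
```

In this sketch, the resulting `(segment, instruction)` pairs would serve as the relabeled dataset for training a language-conditioned policy; the training step itself is omitted.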