Nils Blank

I am a PhD student in the Intuitive Robots Lab (IRL) at the Karlsruhe Institute of Technology (KIT), Germany. My research focuses on Imitation Learning and Foundation Models for Human-Robot Interaction. I am supervised by Rudolf Lioutikov. I obtained my Master's degree in Computer Science at KIT. During my studies, I interned at SAP SE and IONOS.

Email  /  Google Scholar  /  GitHub  /  LinkedIn

profile photo
Research

My research focuses on Foundation Models and their applications in Robotics. In particular, I explore how we can employ Foundation Models robustly and reliably in challenging robotic scenarios. Furthermore, my research focuses on goal-driven explainability and how we can leverage Foundation Models for improved human-robot interaction.

SIR: Structured Image Representations for Explainable Robot Learning
Paul Mattes, Jan Schwab, Jens Oliver Bosch, Maximilian Xiling Li, Nils Blank, Minh-Trung Tang, Rudolf Lioutikov

CVPR 2025, Poster
Project Page / Code / arXiv

We introduce SIR, a novel approach for learning robot policies with explicit, interpretable structure. Instead of relying on opaque visual embeddings, our method constructs a fully connected scene graph from 2D or 3D image features and learns to sparsify it end-to-end, producing a minimal, task-relevant subgraph used for action generation. This design makes policies intrinsically explainable. Experiments on RoboCasa show that our sparse graph policies outperform image-based baselines (19.5% vs. 14.81% success rate) and are significantly more robust to visual distractors. Furthermore, analyzing the learned subgraphs enables introspection, revealing dataset biases such as spurious correlations and positional biases.
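The sparsification idea above can be illustrated with a toy sketch (not the SIR implementation, which learns the edge scoring end-to-end): build a fully connected graph over per-object features, score each edge, and keep only the top-k task-relevant edges. The scene, features, and hand-crafted dot-product score are all invented for this example:

```python
from itertools import combinations

def build_full_graph(features):
    """Fully connected scene graph: every unordered pair of objects is an edge."""
    return list(combinations(features.keys(), 2))

def edge_score(features, a, b):
    """Toy relevance score: dot product of the two objects' feature vectors.
    (SIR learns this scoring end-to-end; here it is hand-crafted.)"""
    return sum(x * y for x, y in zip(features[a], features[b]))

def sparsify(features, k):
    """Keep only the k highest-scoring edges: the minimal, task-relevant subgraph."""
    edges = build_full_graph(features)
    edges.sort(key=lambda e: edge_score(features, *e), reverse=True)
    return edges[:k]

# Invented scene: four objects with 3-d "features".
scene = {
    "gripper": [1.0, 0.0, 0.2],
    "mug":     [0.9, 0.1, 0.3],
    "table":   [0.0, 1.0, 0.0],
    "plant":   [0.1, 0.0, 1.0],  # visual distractor
}
subgraph = sparsify(scene, k=2)  # e.g. keeps the gripper-mug edge
```

Because the retained edges are explicit object pairs rather than entries of an opaque embedding, inspecting them directly supports the kind of introspection described above.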

BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov

NeurIPS 2025, Poster
Project Page / Code / arXiv

We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. By construction, the B-spline formulation ensures smooth trajectories without discontinuities between adjacent segments.
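A minimal sketch of the decoding direction: treating a fixed-length token sequence as B-spline control points and evaluating the spline to recover a smooth trajectory. This is pure Python for illustration, not the BEAST implementation; the degree, knot placement, and token values are assumptions for the example:

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector so the spline interpolates its endpoints."""
    n_internal = n_ctrl - degree - 1
    internal = [(i + 1) / (n_internal + 1) for i in range(n_internal)]
    return [0.0] * (degree + 1) + internal + [1.0] * (degree + 1)

def basis(i, k, x, t):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree k."""
    if k == 0:
        # half-open spans; close the final span so x = 1 is covered
        if t[i] <= x < t[i + 1]:
            return 1.0
        return 1.0 if x == t[-1] and t[i] < t[i + 1] == t[-1] else 0.0
    out = 0.0
    if t[i + k] > t[i]:
        out += (x - t[i]) / (t[i + k] - t[i]) * basis(i, k - 1, x, t)
    if t[i + k + 1] > t[i + 1]:
        out += (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * basis(i + 1, k - 1, x, t)
    return out

def decode(tokens, degree=3, n_steps=50):
    """Decode a fixed-length token sequence (control points) into a smooth
    action trajectory by evaluating the B-spline at n_steps timesteps."""
    t = clamped_knots(len(tokens), degree)
    xs = [j / (n_steps - 1) for j in range(n_steps)]
    return [sum(c * basis(i, degree, x, t) for i, c in enumerate(tokens))
            for x in xs]

# Six "tokens" for a single 1-D action dimension (values are made up).
tokens = [0.0, 0.2, 0.9, 1.0, 0.6, 0.3]
traj = decode(tokens)  # smooth 50-step trajectory from 6 tokens
```

Note the compression at work: six tokens decode into an arbitrarily dense trajectory, and every evaluation point depends only on a few nearby control points, which is what makes parallel decoding natural.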

Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models
Nils Blank, Moritz Reuss, Marcel Rühle, Ömer Erdinç Yağmurlu, Fabian Wenzel, Oier Mees, Rudolf Lioutikov

CoRL 2024
Paper Link

We introduce a novel approach to automatically label uncurated, long-horizon robot teleoperation data at scale in a zero-shot manner without any human intervention. We utilize a combination of pre-trained vision-language foundation models to detect objects in a scene, propose possible tasks, and segment tasks from large datasets of unlabeled interaction data, and then train language-conditioned policies on the relabeled datasets. Our initial experiments show that our method enables training language-conditioned policies on unlabeled and unstructured datasets that match the performance of policies trained with oracle human annotations.
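The detect / propose / segment / label pipeline can be caricatured with a toy stand-in (not the actual method, which uses vision-language foundation models on raw video): here frames already carry object sets, and a clip boundary is declared whenever an object leaves the scene, as a crude proxy for "the robot picked it up". All frame data and the labeling heuristic are invented for the example:

```python
def detect_objects(frame):
    """Stand-in for an open-vocabulary detector: these toy frames already
    carry their object sets; a real system would run a VLM on pixels."""
    return frame["objects"]

def segment_and_label(stream):
    """Split an unlabeled teleoperation stream into clips, attaching a
    language instruction whenever an object disappears from the scene."""
    clips, start = [], 0
    for t in range(1, len(stream)):
        gone = detect_objects(stream[t - 1]) - detect_objects(stream[t])
        for obj in sorted(gone):
            clips.append((start, t, f"pick up the {obj}"))
            start = t
    return clips

# Invented 5-frame stream: the block disappears at t=2, the mug at t=4.
stream = [
    {"objects": {"block", "mug"}},
    {"objects": {"block", "mug"}},
    {"objects": {"mug"}},
    {"objects": {"mug"}},
    {"objects": set()},
]
labels = segment_and_label(stream)
# → [(0, 2, "pick up the block"), (2, 4, "pick up the mug")]
```

The output pairs of (segment, instruction) are exactly the training signal a language-conditioned policy needs, which is why automating this step lets policy learning scale to uncurated data.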


The design of this website is based on publicly available source code.