Embedding Morphology into Transformers for Cross-Robot Policy Learning
Kei Suzuki ⋅ Jing Liu ⋅ Ye Wang ⋅ Chiori Hori ⋅ Matthew Brand ⋅ Diego Romeres ⋅ Toshiaki Koike-Akino
Abstract
Transformer-based vision-language-action (VLA) policies have advanced rapidly as training data scales, yet cross-robot policy learning, in which a single policy is trained across multiple embodiments, remains challenging. Such policies are often embodiment-agnostic and must infer kinematics from observations, which can hurt robustness. We propose an embodiment-aware transformer that injects morphology through three mechanisms: (1) kinematic tokens with per-joint temporal chunking; (2) a topology-aware attention bias that encourages message passing along kinematic edges; and (3) joint-attribute conditioning using per-joint descriptors. Across multiple embodiments, our method consistently outperforms the vanilla $\pi_{0.5}$ baseline.
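To make the idea of a topology-aware attention bias concrete, the sketch below shows one possible form it could take. The abstract does not specify the bias, so everything here is an assumption for illustration: the bias is taken to be additive on the attention logits and to decay linearly with hop distance on the kinematic graph, and the names `hop_distances`, `topology_bias`, `biased_attention_weights`, and the `slope` parameter are hypothetical rather than taken from the paper.

```python
import math
from collections import deque


def hop_distances(num_joints, edges):
    """All-pairs hop distance on the kinematic graph, via BFS from each joint."""
    adj = [[] for _ in range(num_joints)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[math.inf] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] == math.inf:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def topology_bias(dist, slope=1.0):
    """Assumed bias form: -slope * hop distance (slope > 0), so nearby joints are favored."""
    return [[-slope * d for d in row] for row in dist]


def biased_attention_weights(scores, bias):
    """Row-wise softmax over (raw attention scores + additive bias)."""
    weights = []
    for s_row, b_row in zip(scores, bias):
        logits = [s + b for s, b in zip(s_row, b_row)]
        peak = max(logits)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical 4-joint serial chain: 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
dist = hop_distances(4, edges)
bias = topology_bias(dist, slope=1.0)

# With uniform raw scores, the bias alone shapes the attention pattern.
scores = [[0.0] * 4 for _ in range(4)]
weights = biased_attention_weights(scores, bias)
```

In this toy chain, joint 0 attends most strongly to itself and progressively less to joints 1, 2, and 3, which is one way to read "encouraging message passing along kinematic edges". A learned per-head or per-distance bias would be an equally plausible reading of the abstract.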