

Poster

Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Model Using Implicit Feedback from Pre-training Demonstrations

Thomas Tian · Kratarth Goel

Hall 3 + Hall 2B #29
Thu 24 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

Recent advancements in Large Language Models (LLMs) have revolutionized motion generation models in embodied applications such as autonomous driving and robotic manipulation. While LLM-type auto-regressive motion generation models benefit from training scalability, there remains a discrepancy between their token-prediction imitation objectives and human preferences. As a result, these models often generate behaviors that deviate from what humans would prefer, making post-training preference alignment crucial for producing human-preferred motions. Unfortunately, post-training alignment requires extensive preference rankings over motions generated by the pre-trained model, which are costly and time-consuming to annotate, especially in multi-agent motion generation settings. Recently, there has been growing interest in leveraging expert demonstrations previously used during pre-training to scalably generate preference data for post-training alignment. However, these methods often adopt a worst-case assumption, treating all pre-trained model-generated samples as unpreferred and relying solely on pre-training expert demonstrations to construct preferred examples. This adversarial approach overlooks the valuable signal provided by preference rankings among the model's own generations, ultimately reducing alignment effectiveness and potentially leading to misaligned behaviors. In this work, instead of treating all generated samples as equally bad, we propose a principled approach that leverages implicit preferences encoded in pre-training expert demonstrations to construct preference rankings among the pre-trained model's generations, offering more nuanced guidance at a lower cost. We present the first investigation of direct preference alignment for multi-agent motion generation models using implicit preference feedback from pre-training demonstrations. We apply our approach to large-scale traffic simulation and demonstrate its effectiveness in improving the realism of generated behaviors involving up to 128 agents, making a 1M motion generation model comparable to state-of-the-art large models by relying solely on implicit feedback from pre-training demonstrations, without requiring additional post-training human preference annotations or incurring high computational costs.
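The abstract describes constructing preference rankings over the pre-trained model's own generations from implicit feedback in expert demonstrations, then applying direct preference alignment. The sketch below is a hypothetical illustration only: it assumes generations are scored by negative displacement error against a matched expert demonstration and that alignment uses a standard DPO-style objective; the scoring rule, function names, and tensor shapes are assumptions, not the paper's formulation.

import torch
import torch.nn.functional as F

def implicit_score(generation: torch.Tensor, demo: torch.Tensor) -> torch.Tensor:
    # Score a generated multi-agent rollout by closeness to the expert demo.
    # Shapes assumed: [num_agents, horizon, state_dim]. Negative mean
    # displacement error is an illustrative stand-in for the implicit feedback.
    return -(generation - demo).norm(dim=-1).mean()

def build_preference_pair(generations, demo):
    # Rank the model's own generations and take the best- and worst-scoring
    # ones as the (preferred, unpreferred) pair, rather than treating every
    # generated sample as unpreferred.
    scores = torch.stack([implicit_score(g, demo) for g in generations])
    return generations[scores.argmax().item()], generations[scores.argmin().item()]

def dpo_loss(logp_pref, logp_unpref, ref_logp_pref, ref_logp_unpref, beta=0.1):
    # Standard DPO objective on sequence log-probabilities under the policy
    # being aligned and the frozen pre-trained reference policy.
    margin = beta * ((logp_pref - ref_logp_pref) - (logp_unpref - ref_logp_unpref))
    return -F.logsigmoid(margin).mean()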
