

Oral in Workshop: ICLR 2025 Workshop on Bidirectional Human-AI Alignment

Learning From Diverse Experts: Behavior Alignment Through Multi-Objective Inverse Reinforcement Learning


Abstract:

Imitation learning (IL) from demonstrations serves as a data-efficient and practical framework for achieving human-level performance and behavior alignment with human experts in sequential decision making. However, existing IL approaches mostly presume that expert demonstrations are homogeneous and largely ignore the practical issue of multiple performance criteria and the resulting diverse preferences among experts. To tackle this, we propose to learn simultaneously from multiple experts with different preferences through the lens of multi-objective inverse reinforcement learning (MOIRL). Specifically, MOIRL achieves unified learning from diverse experts by inferring the vector-valued reward function of each expert and reconciling these functions via reward consensus. Built on this, we propose Multi-Objective Inverse Soft-Q Learning (MOIQ), which penalizes differences among the inferred rewards to encourage reward consensus. This approach transfers to unseen preferences thanks to the reward consensus among demonstrators. To annotate the unknown preferences of demonstrations, we further introduce a posterior network that predicts the preferences of given trajectories. Extensive experiments demonstrate that MOIQ remains competitive in challenging scenarios with few and noisy annotations, outperforms strong benchmark methods, and approaches expert-level performance in the fully annotated regime.
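The abstract describes two ingredients: per-expert vector-valued reward functions reconciled through a reward-consensus penalty, and preference vectors that scalarize the objectives. Below is a minimal sketch of these two ideas, assuming a simple linear scalarization and a variance-based disagreement penalty; all names, network shapes, and the exact penalty form are illustrative assumptions, not the paper's implementation of MOIQ.

```python
# Hypothetical sketch (not the authors' code): each expert gets a
# vector-valued reward network; a penalty on disagreement between the
# per-expert reward vectors on shared states encourages reward consensus.
import torch
import torch.nn as nn

STATE_DIM, N_OBJECTIVES, N_EXPERTS = 8, 3, 4  # assumed toy dimensions


def make_reward_net() -> nn.Module:
    # Maps a state to a vector of per-objective rewards.
    return nn.Sequential(
        nn.Linear(STATE_DIM, 64),
        nn.ReLU(),
        nn.Linear(64, N_OBJECTIVES),
    )


reward_nets = nn.ModuleList([make_reward_net() for _ in range(N_EXPERTS)])


def consensus_penalty(states: torch.Tensor) -> torch.Tensor:
    # Stack each expert's vector-valued reward and penalize disagreement
    # (here: variance across experts) so the inferred rewards are reconciled.
    rewards = torch.stack([net(states) for net in reward_nets])  # (E, B, K)
    return rewards.var(dim=0).mean()


def scalarize(reward_vec: torch.Tensor, preference: torch.Tensor) -> torch.Tensor:
    # A preference vector (e.g. one predicted by a posterior network over a
    # trajectory) weights the objectives into a scalar reward.
    return reward_vec @ preference


if __name__ == "__main__":
    states = torch.randn(32, STATE_DIM)
    pref = torch.softmax(torch.randn(N_OBJECTIVES), dim=0)
    print("consensus penalty:", consensus_penalty(states).item())
    print("scalarized rewards shape:", scalarize(reward_nets[0](states), pref).shape)
```

In such a setup, the consensus penalty would be added to an imitation objective (a soft-Q style loss in the paper's case) so that all experts' inferred reward components agree while their preference vectors account for the differences in their behavior.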
