

Poster

ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-sample V-Ensemble

Tianyuan Chen · Ronglong Cai · Faguo Wu · Xiao Zhang

Hall 3 + Hall 2B #387
Thu 24 Apr midnight PDT — 2:30 a.m. PDT

Abstract: Offline reinforcement learning (RL) aims to learn from static datasets and thus faces the challenge of value estimation errors for out-of-distribution actions. The in-sample learning scheme addresses this issue by performing implicit TD backups that do not query the values of unseen actions. However, existing in-sample value learning and policy extraction methods suffer from over-regularization, limiting their performance on suboptimal or compositional datasets. In this paper, we analyze key factors in in-sample learning that might hinder the use of a milder constraint. We propose Actor-Critic with Temperature adjustment and In-sample Value Ensemble (ACTIVE), a novel in-sample offline RL algorithm that leverages an ensemble of V-functions for critic training and adaptively adjusts the constraint level using dual gradient descent. We theoretically show that the V-ensemble suppresses the accumulation of initial value errors, thereby mitigating overestimation. Our experiments on the D4RL benchmarks demonstrate that ACTIVE alleviates overfitting of value functions and outperforms existing in-sample methods in terms of learning stability and policy optimality.
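To make the two core ideas in the abstract concrete, the sketch below shows one way an in-sample V-ensemble critic update and a dual-gradient temperature adjustment could be wired together. This is not the authors' implementation: it assumes IQL-style expectile regression for the V-functions, a minimum over the ensemble as the TD bootstrap, and an advantage-based constraint target; all names and hyperparameters (e.g. `target_adv`, `tau`) are illustrative.

```python
# Minimal sketch (not the authors' code) of an in-sample critic update with a
# V-ensemble plus a dual-gradient temperature update. Assumes IQL-style
# expectile regression; hyperparameters and constraint choice are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

obs_dim, act_dim, n_v = 17, 6, 5
v_ensemble = nn.ModuleList([mlp(obs_dim, 1) for _ in range(n_v)])
q_net = mlp(obs_dim + act_dim, 1)
log_temp = torch.zeros(1, requires_grad=True)   # dual variable (temperature)

v_opt = torch.optim.Adam(v_ensemble.parameters(), lr=3e-4)
q_opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
t_opt = torch.optim.Adam([log_temp], lr=3e-4)

def expectile_loss(diff, tau=0.7):
    # Asymmetric L2 loss used for in-sample V learning (IQL-style).
    weight = torch.where(diff > 0, tau, 1 - tau)
    return (weight * diff.pow(2)).mean()

def critic_update(batch, gamma=0.99, target_adv=0.5):
    s, a, r, s_next, done = batch
    # 1) In-sample V update: each ensemble member regresses toward Q(s, a)
    #    evaluated only at dataset actions, so no unseen actions are queried.
    with torch.no_grad():
        q_sa = q_net(torch.cat([s, a], dim=-1))
    v_loss = sum(expectile_loss(q_sa - v(s)) for v in v_ensemble)
    v_opt.zero_grad(); v_loss.backward(); v_opt.step()

    # 2) Q update: bootstrap from the minimum over the V-ensemble, which is one
    #    way to suppress the accumulation of initial value overestimation.
    with torch.no_grad():
        v_next = torch.min(torch.stack([v(s_next) for v in v_ensemble]), dim=0).values
        td_target = r + gamma * (1 - done) * v_next
    q_loss = F.mse_loss(q_net(torch.cat([s, a], dim=-1)), td_target)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # 3) Dual gradient descent on the temperature: increase it when the measured
    #    advantage exceeds the constraint target, decrease it otherwise.
    with torch.no_grad():
        v_mean = torch.stack([v(s) for v in v_ensemble]).mean(0)
        adv = (q_net(torch.cat([s, a], dim=-1)) - v_mean).mean()
    temp_loss = log_temp.exp() * (target_adv - adv)
    t_opt.zero_grad(); temp_loss.backward(); t_opt.step()
```

The temperature would then weight the imitation constraint in the policy-extraction step, tightening or relaxing it automatically depending on how far the measured advantage drifts from its target.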
