

Poster

Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood

Qingmao Yao · Zhichao Lei · Tianyuan Chen · Ziyue Yuan · Xuefan Chen · Jianxiang Liu · Faguo Wu · Xiao Zhang

Hall 3 + Hall 2B #384
Thu 24 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract: Offline Reinforcement Learning (RL) struggles with distributional shift, which leads to $Q$-value overestimation for out-of-distribution (OOD) actions. Existing methods address this issue by imposing constraints; however, they often become overly conservative when evaluating OOD regions, which constrains $Q$-function generalization. This over-constraint issue results in poor $Q$-value estimation and hinders policy improvement. In this paper, we introduce a novel approach to achieve better $Q$-value estimation by enhancing $Q$-function generalization in OOD regions within the Convex Hull and its Neighborhood (CHN). Under the safety generalization guarantees of the CHN, we propose the Smooth Bellman Operator (SBO), which updates OOD $Q$-values by smoothing them with neighboring in-sample $Q$-values. We theoretically show that SBO approximates true $Q$-values for both in-sample and OOD actions within the CHN. Our practical algorithm, Smooth Q-function OOD Generalization (SQOG), empirically alleviates the over-constraint issue, achieving near-accurate $Q$-value estimation. On the D4RL benchmarks, SQOG outperforms existing state-of-the-art methods in both performance and computational efficiency. Code is available at .
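To make the smoothing idea in the abstract concrete, below is a minimal illustrative sketch of producing an OOD action's target $Q$-value as a weighted combination of neighboring in-sample $Q$-values. It is not the paper's SBO implementation: the Gaussian kernel, the bandwidth, and the function names are assumptions made purely for illustration.

```python
# Illustrative sketch only: smooth an OOD action's Q-value target using
# neighboring in-sample Q-values. Kernel choice, bandwidth, and names are
# assumptions, not the SBO/SQOG implementation from the paper.
import numpy as np

def smoothed_ood_q_target(ood_action, in_sample_actions, in_sample_q, bandwidth=0.2):
    """Estimate a Q-value target for an OOD action as a Gaussian-kernel
    weighted average over nearby in-sample actions."""
    # Squared distances from the OOD action to each in-sample action.
    d2 = np.sum((in_sample_actions - ood_action) ** 2, axis=1)
    # Gaussian kernel weights: closer in-sample actions contribute more.
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w /= w.sum()
    # The result is a convex combination of in-sample Q-values, so it stays
    # within the range spanned by the neighboring in-sample estimates.
    return float(np.dot(w, in_sample_q))

# Example: three in-sample actions with known Q-values, one OOD query action.
actions = np.array([[0.0, 0.1], [0.2, -0.1], [0.4, 0.0]])
q_vals = np.array([1.0, 1.2, 0.9])
print(smoothed_ood_q_target(np.array([0.3, 0.0]), actions, q_vals))
```

Because the smoothed target is anchored to in-sample estimates, it cannot drift arbitrarily high, which is the intuition behind avoiding OOD overestimation while still allowing generalization within the CHN.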
