

Poster

Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning

HyunKyu Lee · Sung Whan Yoon

Hall 3 + Hall 2B #399
[ Project Page ]
Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT
Oral presentation: Oral Session 2A
Thu 24 Apr 12:30 a.m. PDT — 2 a.m. PDT

Abstract:

The benefits of flat minima on loss surfaces in parameter space are well documented in supervised learning, where flatness is linked to improved model generalization. Far less attention has been paid to reinforcement learning (RL), where the impact of flatter reward landscapes in policy parameter space remains largely unexplored. Rather than merely extrapolating from supervised learning, which suggests a link between flat reward landscapes and better generalization, we formally connect the flatness of the reward surface to the robustness of RL models. For policy models in which a deep neural network determines actions, a reward landscape that is flatter under parameter perturbations yields consistent rewards even when actions are perturbed. Robustness to action perturbations in turn contributes to robustness against other variations, such as changes in state transition probabilities and reward functions. Extensive simulations across diverse RL environments confirm the consistent benefits of flatter reward landscapes for robustness under perturbed action selection, transition dynamics, and reward functions. The code is available at https://github.com/HK-05/flatreward-RRL.
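As a rough illustration of the central quantity, here is a minimal, self-contained sketch (not the authors' implementation, and unrelated to the linked repository's API) of how one might empirically probe the flatness of the reward landscape in policy parameter space: sample random perturbations of the policy parameters at a fixed radius and measure how much the average return drops. The toy 1-D point-mass environment, the linear policy, and the radius `rho` below are illustrative assumptions.

```python
import numpy as np

def rollout(theta, horizon=50, seed=0):
    """Episodic return of a linear policy a = theta @ s on a toy 1-D
    point-mass task; the agent is rewarded for staying near the origin."""
    rng = np.random.default_rng(seed)
    s = np.array([1.0, 0.0])                      # position, velocity
    total = 0.0
    for _ in range(horizon):
        a = float(np.clip(theta @ s, -1.0, 1.0))  # bounded action
        s = np.array([s[0] + 0.1 * s[1],
                      s[1] + 0.1 * a]) + 0.01 * rng.normal(size=2)
        total += -abs(s[0])                       # reward: negative distance
    return total

def flatness_gap(theta, rho=0.1, n_dirs=32, seed=0):
    """Average drop in return under random parameter perturbations of norm rho.
    A smaller gap indicates a flatter reward landscape around theta."""
    rng = np.random.default_rng(seed)
    base = rollout(theta)
    drops = []
    for _ in range(n_dirs):
        eps = rng.normal(size=theta.shape)
        eps = rho * eps / np.linalg.norm(eps)     # perturbation on a sphere of radius rho
        drops.append(base - rollout(theta + eps))
    return float(np.mean(drops))

theta = np.array([-1.0, -0.5])                    # hand-picked stabilizing gains
print("return:", rollout(theta), "flatness gap:", flatness_gap(theta))
```

Under the paper's argument, a policy with a small flatness gap of this kind should also see smaller return degradation when its actions (rather than its parameters) are perturbed, and by extension when transition dynamics or reward functions shift.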
