

In reinforcement learning, all objective functions are not equal

Romain Laroche · Harm van Seijen

East Meeting Level 8 + 15 #3

Wed 2 May, 4:30 p.m. PDT

We study the learnability of value functions. We set reward backpropagation aside by directly fitting a deep neural network to the analytically computed optimal value function for a chosen objective function. We show that some objective functions are easier to train on than others by several orders of magnitude. In particular, we observe the influence of the γ parameter and of the decomposition of the task into subtasks.
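
As a rough illustration of what "analytically computed optimal value function" can mean, the sketch below computes the regression targets for a toy task under several values of γ. The task (a deterministic chain with a single reward of 1 on entering the last state) and the function name `optimal_values_chain` are assumptions made for this example; they are not taken from the paper, and no network is fitted here.

```python
import numpy as np


def optimal_values_chain(n_states: int, gamma: float) -> np.ndarray:
    """Closed-form V* for a hypothetical deterministic chain task.

    Assumed setup (not from the paper): the agent moves one state to the
    right per step, and the transition into the last state pays reward 1.
    The last state is terminal, so its value is 0.
    """
    values = np.zeros(n_states)
    for s in range(n_states - 1):
        # The reward arrives on the (n_states - 1 - s)-th transition,
        # so it is discounted (n_states - 2 - s) times.
        values[s] = gamma ** (n_states - 2 - s)
    return values


if __name__ == "__main__":
    # These arrays are the targets a function approximator would be fitted to.
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, optimal_values_chain(5, gamma))
```

In this toy setting the choice of γ changes the spread of the targets across states: a small γ gives values that decay quickly with distance from the reward, while a γ close to 1 gives nearly flat values. Whether this is what drives the differences reported in the abstract is not something the abstract states.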
