2 Results
Poster | Wed 14:30 | Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy | Yuan Xie · Boyi Liu · Qiang Liu · Zhaoran Wang · Yuan Zhou · Jian Peng
Poster | Wed 9:00 | Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search | Lars Buesing · Theophane Weber · Yori Zwols · Nicolas Heess · Sebastien Racaniere · Arthur Guez · Jean-Baptiste Lespiau