Poster
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation
Haruka Kiyohara · Ren Kishimoto · Kosuke Kawakami · Ken Kobayashi · Kazuhide Nakata · Yuta Saito
Halle B #159
Abstract:
**Off-Policy Evaluation (OPE)** aims to assess the effectiveness of counterfactual policies using offline logged data and is frequently utilized to identify the top-$k$ promising policies for deployment in online A/B tests. Existing evaluation metrics for OPE estimators primarily focus on the "accuracy" of OPE or that of downstream policy selection, neglecting the risk-return tradeoff and *efficiency* in subsequent online policy deployment. To address this issue, we draw inspiration from portfolio evaluation in finance and develop a new metric, called **SharpeRatio@k**, which measures the risk-return tradeoff and efficiency of policy portfolios formed by an OPE estimator under varying online evaluation budgets ($k$). We first demonstrate, in two example scenarios, that our proposed metric can clearly distinguish between conservative and high-stakes OPE estimators and reliably identify the most *efficient* estimator capable of forming superior portfolios of candidate policies that maximize return with minimal risk during online deployment, while existing evaluation metrics produce only degenerate results. To facilitate a quick, accurate, and consistent evaluation of OPE via SharpeRatio@k, we have also implemented the proposed metric in open-source software. Using SharpeRatio@k and the software, we conduct a benchmark experiment of various OPE estimators regarding their risk-return tradeoff, presenting several future directions for OPE research.
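The abstract does not spell out the formula, so the following is only a minimal sketch of how a Sharpe-ratio-style metric over a top-$k$ policy portfolio might be computed. It assumes, by analogy with the financial Sharpe ratio, that the return is the best true policy value among the $k$ policies ranked highest by the OPE estimator, measured in excess of a baseline (behavior) policy value, and that the risk is the population standard deviation of the true values of those $k$ policies. The function name `sharpe_ratio_at_k` and all of its parameters are hypothetical and are not taken from the authors' software.

```python
import numpy as np


def sharpe_ratio_at_k(estimated_values, true_values, baseline_value, k):
    """Sharpe-ratio-style score for the top-k policies picked by an OPE estimator.

    Assumed definition (not confirmed by the abstract):
        (best true value in the top-k portfolio - baseline value)
        / (population std of true values in the top-k portfolio)
    """
    estimated = np.asarray(estimated_values, dtype=float)
    true = np.asarray(true_values, dtype=float)
    if estimated.shape != true.shape or estimated.ndim != 1:
        raise ValueError("estimated_values and true_values must be 1-D and equal length")
    if not 1 <= k <= estimated.size:
        raise ValueError("k must be between 1 and the number of candidate policies")

    # Form the portfolio: indices of the k policies with the highest estimated value.
    top_k = np.argsort(-estimated, kind="stable")[:k]
    portfolio = true[top_k]

    excess_return = portfolio.max() - baseline_value  # return over the baseline
    risk = portfolio.std()  # population standard deviation (ddof=0), an assumption

    # With zero spread (for example k=1) the ratio is undefined; report NaN.
    if risk == 0.0:
        return float("nan")
    return float(excess_return / risk)


# Hypothetical toy data: four candidate policies.
estimated = [0.9, 0.5, 0.7, 0.2]  # values predicted by an OPE estimator
true = [0.6, 0.8, 0.4, 0.3]       # values that online evaluation would reveal
score = sharpe_ratio_at_k(estimated, true, baseline_value=0.5, k=2)
print(score)
```

Under these assumptions, an estimator scores well when its top-$k$ portfolio contains a policy that clearly beats the baseline while the other deployed policies do not vary widely in true value.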