Contributed Talk in Workshop: Safe Machine Learning: Specification, Robustness, and Assurance

Misleading meta-objectives and hidden incentives for distributional shift

David Krueger


David Krueger, Tegan Maharaj, Shane Legg, and Jan Leike

Abstract:

Decisions made by machine learning systems have a tremendous influence on the world. Yet it is common for machine learning algorithms to assume that no such influence exists. An example is the use of the i.i.d. assumption in online learning for applications such as content recommendation, where the (choice of) content displayed can change users’ perceptions and preferences, or even drive them away, causing a shift in the distribution of users. A large body of work in reinforcement learning and causal machine learning aims to account for distributional shift caused by deploying a learning system previously trained offline. Our goal is similar, but distinct: we point out that online training with meta-learning can create a hidden incentive for a learner to cause distributional shift. We design a simple environment to test for these hidden incentives for distributional shift (HIDS), demonstrate the potential for this phenomenon to cause unexpected or undesirable behavior, and propose and validate a mitigation strategy.
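To make the i.i.d. failure mode described above concrete, here is a minimal, hypothetical sketch (not the authors' HIDS test environment or their mitigation strategy): a toy content-recommendation loop in which the learner's own recommendations gradually drive one type of user away, so the user distribution it trains on shifts in response to its policy. All names and numbers (`click_prob`, the drift rate, the two user types) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy content-recommendation loop (illustrative only).
# Two user types; content 0 ("clickbait") pleases type-0 users but
# drives type-1 users away, so the learner's own choices shift the
# user distribution it is being trained on.

n_rounds = 1000
p_type1 = 0.5                      # current fraction of type-1 users
q = np.zeros(2)                    # action-value estimates for the two contents
counts = np.zeros(2)
eps = 0.1                          # epsilon-greedy exploration rate

# Click probabilities: rows = user type, cols = content shown
# (col 0: clickbait, col 1: neutral). Hypothetical values.
click_prob = np.array([[0.9, 0.5],
                       [0.2, 0.6]])

for t in range(n_rounds):
    user = int(rng.random() < p_type1)       # sample a user from the *current* distribution
    a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q))
    reward = float(rng.random() < click_prob[user, a])

    # Standard online update, which implicitly treats users as i.i.d.
    # and unaffected by the learner's actions.
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]

    # In this toy world, showing clickbait drives type-1 users away,
    # so the distribution of future users drifts in response to the policy.
    if a == 0:
        p_type1 = max(0.0, p_type1 - 0.001)

print(f"final fraction of type-1 users: {p_type1:.2f}, value estimates: {q}")
```

Running the sketch shows the point of the example: the value estimates are fit to a user distribution that the policy itself is reshaping, so the i.i.d. assumption behind the online update no longer holds.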
