Jessica Forde (Brown University); A. Feder Cooper (Cornell University); Michael L. Littman (Brown University)
Algorithmic fairness has emphasized the role of biased data in unfair automated decision outcomes. Recently, there has been a shift in attention to sources of bias that implicate fairness at other stages of the ML pipeline. We contend that one such source of bias, human preferences in model selection, remains under-explored in terms of its role in disparate impact across demographic groups. Using a deep learning model trained on real-world medical imaging data, we verify our claim empirically and argue that commonly-used benchmark datasets can conceal this issue.