Poster
Century: A Framework and Dataset for Evaluating Historical Contextualisation of Sensitive Images
Canfer Akbulut · Kevin Robinson · Maribeth Rauh · Isabela Albuquerque · Olivia Wiles · Laura Weidinger · Verena Rieser · Yana Hasson · Nahema Marchal · Iason Gabriel · William Isaac · Lisa Hendricks
How do multi-modal generative models describe images of recent historical events and figures whose legacies may be nuanced, multifaceted, or contested? This task requires not only accurate visual recognition but also socio-cultural knowledge and cross-modal reasoning. To address this evaluation challenge, we introduce Century, a novel dataset of sensitive historical images. The dataset consists of 1,500 images from recent history, assembled through an automated method that combines knowledge graphs and language models with quality and diversity criteria derived from the practices of museums and digital archives. We demonstrate through automated and human evaluation that this method produces a set of images depicting events and figures that are diverse across topics and represent all regions of the world. We additionally propose a framework for evaluating historical contextualisation capabilities along three dimensions: accuracy, thoroughness, and objectivity. We demonstrate this approach by using Century to evaluate four foundation models, scoring their performance with both automated and human evaluation. We find that historical contextualisation of sensitive images poses a significant challenge for modern multi-modal foundation models, and we offer practical recommendations for how developers can use Century to evaluate improvements to models and applications.