‘Affordances’ for Machine Learning
Affordances are how the features of a technology shape, but do not determine, the uses and effects of that technology. In this address, I will demonstrate the value of an affordance framework for the analysis and design of ML systems. Specifically, I will delineate and apply the mechanisms and conditions framework of affordance, which models the way technologies request, demand, encourage, discourage, refuse, and allow technical and social outcomes. Illustrated through a case example that traverses critical analysis of an ML system and its imagined (re)making, the mechanisms and conditions framework lays bare not just that technical choices are profoundly social, but also how and for whom. This approach displaces vagaries and general claims with the particularities of systems in context, empowering critically minded practitioners while holding power, and the systems that power relations produce, to account.
In this talk, we present recent progress on large-scale learning of multimodal video representations. We start by presenting VideoBERT, a joint model for video and language that repurposes the BERT model for multimodal data. This model achieves state-of-the-art results on zero-shot prediction and video captioning. Next, we present an approach for video question answering that relies on training from instruction videos and cross-modal supervision with a textual question answering module. We show state-of-the-art results for video question answering without any supervision (zero-shot VQA) and demonstrate that our approach obtains competitive results when pre-training and then fine-tuning on video question answering datasets. We conclude the talk by presenting the recent VideoCC dataset, which transfers image captions to video and enables state-of-the-art performance for zero-shot video and audio retrieval and video captioning.
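To make the VideoBERT-style idea concrete, the sketch below shows one minimal way a joint video-and-language model can be built: video features are quantized into discrete "visual tokens", concatenated with text tokens, and fed to a BERT-style transformer trained with masked-token prediction. This is an illustrative assumption-laden sketch, not the authors' implementation; the vocabulary sizes, model dimensions, class names, and the toy batch are all placeholders.

```python
# Minimal sketch of a joint video+text masked-token model (VideoBERT-style idea).
# All sizes and names are illustrative assumptions, not the original system.
import torch
import torch.nn as nn

TEXT_VOCAB, VIDEO_VOCAB, D_MODEL = 30522, 20736, 256  # placeholder sizes

class JointVideoTextEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # One embedding table: text ids in [0, TEXT_VOCAB), quantized video
        # ids shifted into [TEXT_VOCAB, TEXT_VOCAB + VIDEO_VOCAB).
        self.embed = nn.Embedding(TEXT_VOCAB + VIDEO_VOCAB, D_MODEL)
        self.pos = nn.Embedding(512, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.mlm_head = nn.Linear(D_MODEL, TEXT_VOCAB + VIDEO_VOCAB)

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        h = self.embed(token_ids) + self.pos(positions)
        h = self.encoder(h)
        return self.mlm_head(h)  # per-position logits over the joint vocabulary

# Toy usage: two sequences mixing text ids with (shifted) video-cluster ids.
model = JointVideoTextEncoder()
text_ids = torch.randint(0, TEXT_VOCAB, (2, 10))
video_ids = torch.randint(TEXT_VOCAB, TEXT_VOCAB + VIDEO_VOCAB, (2, 6))
joint = torch.cat([text_ids, video_ids], dim=1)
logits = model(joint)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, TEXT_VOCAB + VIDEO_VOCAB), joint.reshape(-1)
)  # in practice the loss is computed only at masked positions
print(logits.shape, loss.item())
```

Sharing a single token space lets the same transformer attend across modalities, which is what enables zero-shot transfer from text supervision to video tasks in this family of models.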
ICLR selected paper discussion with Data Skeptic & PyData