

Workshop

Evaluating visual "common sense" using fine-grained classification and captioning tasks

Raghav Goyal · Farzaneh Mahdisoltani · Guillaume Berger · Waseem Gharbieh · Ingo Bax · Roland Memisevic

East Meeting Level 8 + 15 #5

Tue 1 May, 4:30 p.m. PDT

We introduce the Something-something V2 dataset, which contains captioned videos of finely varying human-object interactions. We also discuss several baseline models and show that neural networks perform surprisingly well on many of the very hard, fine-grained discrimination tasks associated with this dataset.
