Moderator: Angel Chang
As humans, we take for granted our ability to perceive the dynamic world around us in three dimensions. From an early age we can grasp an object by adapting our fingers to its 3D shape, understand our mother’s feelings by interpreting her facial expressions, or effortlessly navigate a busy street. All these tasks require some internal 3D representation of shape, deformation, and motion. Building algorithms that can emulate this level of human 3D perception, using as input single images or video sequences taken with a consumer camera, has proved to be an extremely hard task. For machine learning solutions, the scarcity of 3D annotations has been a major challenge, one that has encouraged important advances in weak supervision and self-supervision. In this talk I will describe progress from early optimization-based solutions, which captured sequence-specific 3D models with primitive representations of deformation, towards recent and more powerful 3D-aware neural representations that can learn the variation of shapes and textures across a category and can be trained from 2D image supervision only. This technology has recently seen very successful commercial uptake, and I will show exciting applications to AI-driven video synthesis.