The ability to perceive scenes in terms of abstract entities is crucial for us to achieve higher-level intelligence. Recently, several methods have been proposed to learn object-centric representations of scenes with multiple objects, yet most of them focus on static scenes. In this paper, we work on object dynamics and propose the Object Dynamics Distillation Network (ODDN), a framework that distills explicit object dynamics (e.g., velocity) from sequential static representations. ODDN also builds a relation module to model object interactions. We verify our approach on video reasoning and video prediction, two important evaluation tasks for video understanding. The results show that a reasoning model using the visual representations of ODDN answers reasoning questions about physical events in a video better than previous state-of-the-art methods. The distilled object dynamics can also be used to predict future video frames given two input frames, in scenes involving occlusion and object collisions. In addition, our architecture yields better segmentation quality and higher reconstruction accuracy.