Imitation from Observations with Trajectory-Level Generative Embeddings
Abstract
We consider offline imitation learning from observations (LfO), where expert demonstrations are scarce and contain only state observations, and the suboptimal policy is far from expert behavior. In this regime, many existing imitation learning approaches struggle to extract useful information from imperfect data because they impose strict support constraints and rely on brittle one-step models. To tackle this challenge, we propose Trajectory-level Generative Embedding (TGE) for offline LfO. TGE constructs a dense, smooth surrogate reward by using particle-based entropy estimation to maximize the log-likelihood of expert trajectories in the latent space of a temporal diffusion model trained on offline suboptimal data. By leveraging the structured geometry of the learned diffusion embedding, TGE captures long-horizon temporal dynamics and bridges the gap under severe support mismatch, providing a robust learning signal even when the offline data is distributionally distinct from the expert. Empirically, the proposed approach consistently matches or outperforms prior offline LfO methods across a range of D4RL locomotion and manipulation benchmarks.
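To make the idea of a particle-based surrogate reward concrete, the following is a minimal sketch and not the authors' implementation. It assumes trajectory segments have already been mapped to fixed-size latent vectors by some encoder (here just random arrays standing in for diffusion embeddings), and it uses the standard k-nearest-neighbor density estimate, under which the log-density at a point is, up to additive constants, the negative of the dimension times the log distance to its k-th nearest expert particle. The function name `knn_log_density_reward` and the parameters `k` and `eps` are hypothetical.

```python
import numpy as np

def knn_log_density_reward(query, expert, k=3, eps=1e-8):
    """Hypothetical surrogate reward from a k-NN (particle-based) density estimate.

    query:  (num_query, d) latent embeddings of offline trajectory segments
    expert: (num_expert, d) latent embeddings of expert trajectory segments
    Returns a (num_query,) array; larger means closer to the expert particles.
    """
    query = np.asarray(query, dtype=float)
    expert = np.asarray(expert, dtype=float)
    # Pairwise Euclidean distances, shape (num_query, num_expert).
    dists = np.linalg.norm(query[:, None, :] - expert[None, :, :], axis=-1)
    k = min(k, expert.shape[0])
    # Distance from each query point to its k-th nearest expert particle.
    kth = np.partition(dists, k - 1, axis=1)[:, k - 1]
    d = query.shape[1]
    # k-NN estimate: log p(z) = const - d * log r_k(z); constants are dropped.
    return -d * np.log(kth + eps)

# Stand-in data (assumption): random vectors in place of learned embeddings.
rng = np.random.default_rng(0)
expert_z = rng.normal(size=(32, 4))
near_z = expert_z[:5] + 0.01   # segments close to expert embeddings
far_z = expert_z[:5] + 10.0    # segments far from expert embeddings
r_near = knn_log_density_reward(near_z, expert_z)
r_far = knn_log_density_reward(far_z, expert_z)
```

In this sketch the reward varies smoothly with distance to the expert particles rather than dropping to zero outside their support, which is the property the abstract attributes to a dense surrogate reward. How TGE actually computes embeddings and rewards is not specified here.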