Large Depth Completion Model from Sparse Observations
Abstract
This work presents the Large Depth Completion Model (LDCM), a simple, effective, and robust framework for single-view metric depth estimation from sparse observations. Without relying on complex architectural designs, LDCM produces metric-accurate dense depth maps with a single large transformer and outperforms existing approaches across diverse datasets and sparsity patterns. We achieve this from two key perspectives: (1) maximizing the potential of existing monocular foundation models by improving the preprocessing of sparse observations, and (2) reformulating the training objective to better capture geometric structure and metric consistency. Specifically, we first introduce a Poisson-based depth initialization module that converts diverse sparse observations into a coarse, uniformly dense depth map, which serves as a strong structural prior for the network. For the training objective, we replace the conventional depth head with a point map head that regresses per-pixel 3D coordinates in camera space, enabling the model to learn the underlying 3D scene structure directly instead of performing pixel-wise depth map restoration. This design also eliminates the need for camera intrinsic parameters, allowing LDCM to naturally produce metric-scaled 3D point maps. Extensive experiments demonstrate that LDCM consistently outperforms state-of-the-art methods across multiple benchmarks and varying sparsity priors in both depth completion and point map estimation, showcasing its effectiveness and strong generalization to unseen data distributions.
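The abstract does not detail the Poisson-based depth initialization module. As a rough illustration only, the minimal sketch below shows one common way such an initialization can be realized, assuming it behaves like Laplacian/Poisson interpolation of sparse depth: observed pixels act as Dirichlet constraints and unknown pixels satisfy the discrete Laplace equation. The function name `poisson_densify` and all implementation details are hypothetical, not the authors' code.

```python
# Illustrative sketch (not the authors' implementation): densify a sparse depth
# map by solving a discrete Laplace/Poisson system, with observed depths held
# fixed as Dirichlet constraints. Assumes every unobserved region is connected
# to at least one observation, so the linear system is well posed.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def poisson_densify(sparse_depth: np.ndarray) -> np.ndarray:
    """sparse_depth: (H, W) array with 0 where no observation is available."""
    H, W = sparse_depth.shape
    known = sparse_depth > 0
    idx = np.arange(H * W).reshape(H, W)

    rows, cols, vals = [], [], []
    rhs = np.zeros(H * W)
    for y in range(H):
        for x in range(W):
            i = idx[y, x]
            if known[y, x]:
                # Dirichlet constraint: keep the observed depth fixed.
                rows.append(i); cols.append(i); vals.append(1.0)
                rhs[i] = sparse_depth[y, x]
                continue
            # Discrete Laplacian: unknown depth equals the mean of its neighbors.
            nbrs = [(y + dy, x + dx)
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= y + dy < H and 0 <= x + dx < W]
            rows.append(i); cols.append(i); vals.append(float(len(nbrs)))
            for ny, nx in nbrs:
                rows.append(i); cols.append(idx[ny, nx]); vals.append(-1.0)

    A = sp.csr_matrix((vals, (rows, cols)), shape=(H * W, H * W))
    return spla.spsolve(A, rhs).reshape(H, W)
```

Under this reading, the module's role is simply to turn arbitrary sparsity patterns (LiDAR lines, random samples, low-resolution grids) into a uniformly dense coarse map before it is fed to the transformer.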