

JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling

Jingyang Zhang · Shiwei Li · Yuanxun Lu · Tian Fang · David McKinnon · Yanghai Tsin · Long Quan · Yao Yao

Halle B #81
Tue 7 May 7:30 a.m. PDT — 9:30 a.m. PDT


We introduce JointNet, a novel neural network architecture for modeling the joint distribution of images and an additional dense modality (e.g., depth maps). JointNet extends a pre-trained text-to-image diffusion model: a copy of the original network is created as the branch for the new dense modality and is densely connected with the RGB branch. The RGB branch is locked during fine-tuning, which enables efficient learning of the new modality's distribution while preserving the strong generalization ability of the large-scale pre-trained diffusion model. Using RGB-D diffusion as an example, we demonstrate the effectiveness of JointNet through extensive experiments and showcase its applicability to a variety of tasks, including joint RGB-D generation, dense depth prediction, depth-conditioned image generation, and high-resolution 3D panorama generation.
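As a rough illustration of the two-branch idea (copy the pre-trained network for the dense modality, connect the branches, and lock the RGB branch), here is a toy pure-Python sketch in which each "layer" is a small square matrix followed by a ReLU. Everything in it is hypothetical and stands in for real U-Net blocks. The abstract does not specify the form of the dense connections, so the class `JointNetSketch`, the per-layer two-way cross connections, and their zero initialization (borrowed from ControlNet-style designs) are assumptions for illustration, not the authors' implementation.

```python
import copy
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a square matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def zeros(dim: int) -> Matrix:
    return [[0.0] * dim for _ in range(dim)]


class Param:
    """A weight matrix plus a flag saying whether fine-tuning may update it."""

    def __init__(self, weight: Matrix, trainable: bool = True):
        self.weight = weight
        self.trainable = trainable


class JointNetSketch:
    """Hypothetical toy model: frozen RGB branch + trainable dense-modality copy."""

    def __init__(self, pretrained: List[Matrix]):
        dim = len(pretrained[0])
        # RGB branch: the pre-trained weights, locked during fine-tuning.
        self.rgb = [Param(w, trainable=False) for w in pretrained]
        # Dense branch (e.g. depth): initialized as an independent copy of the RGB weights.
        self.dense = [Param(copy.deepcopy(w)) for w in pretrained]
        # Assumed zero-initialized cross connections, one per layer and direction.
        self.dense_to_rgb = [Param(zeros(dim)) for _ in pretrained]
        self.rgb_to_dense = [Param(zeros(dim)) for _ in pretrained]

    def forward(self, x_rgb: Vector, x_dense: Vector) -> Tuple[Vector, Vector]:
        h_rgb, h_dense = x_rgb, x_dense
        for i in range(len(self.rgb)):
            out_rgb = relu(matvec(self.rgb[i].weight, h_rgb))
            out_dense = relu(matvec(self.dense[i].weight, h_dense))
            # Exchange features between the two branches after every layer.
            h_rgb = vadd(out_rgb, matvec(self.dense_to_rgb[i].weight, out_dense))
            h_dense = vadd(out_dense, matvec(self.rgb_to_dense[i].weight, out_rgb))
        return h_rgb, h_dense

    def trainable_params(self) -> List[Param]:
        """Parameters an optimizer would update: everything except the RGB branch."""
        groups = (self.rgb, self.dense, self.dense_to_rgb, self.rgb_to_dense)
        return [p for group in groups for p in group if p.trainable]


if __name__ == "__main__":
    pretrained = [[[1.0, 0.5], [0.0, 1.0]], [[0.5, 0.0], [0.2, 1.0]]]
    net = JointNetSketch(pretrained)
    rgb_out, depth_out = net.forward([1.0, 2.0], [0.3, -0.4])
    print(rgb_out, depth_out, len(net.trainable_params()))
```

Under these assumptions, the zero-initialized connections mean that at the start of fine-tuning the RGB output is exactly that of the pre-trained network, whatever the dense input is, so training begins from the original model's behavior; only the dense branch and the connections are handed to the optimizer.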
