

Poster

GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation

Yushi LAN · Shangchen Zhou · Zhaoyang Lyu · Fangzhou Hong · Shuai Yang · Bo DAI · Xingang Pan · Chen Change Loy

Hall 3 + Hall 2B #98
Fri 25 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

Recent advancements in diffusion models and large-scale datasets have revolutionized image and video generation, with increasing focus on 3D content generation. While existing methods show promise, they face challenges in input formats, latent space structures, and output representations. This paper introduces a novel 3D generation framework that addresses these issues, enabling scalable and high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our approach utilizes a VAE with multi-view posed RGB-D-N renderings as input, features a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent flow-based model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single-view image inputs. Experimental results demonstrate superior performance on various datasets, advancing the state-of-the-art in 3D content generation.
