Poster in Workshop: 5th Workshop on practical ML for limited/low resource settings (PML4LRS) @ ICLR 2024

Text2Data: Low-Resource Data Generation with Textual Control

Shiyu Wang · Yihao Feng · Tian Lan · Ning Yu · Yu Bai · Ran Xu · Huan Wang · Caiming Xiong · Silvio Savarese


Abstract:

The machine learning community has invested considerable effort in generating data that is semantically coherent with textual instructions. However, low-resource domains with expensive annotations or complex data structures, such as molecules, motion dynamics, and time series, often lack textual labels. This scarcity impedes supervised learning and thus limits the application of advanced generative models to text-to-data tasks. To address this challenge, we propose Text2Data, a novel approach that first uses unlabeled data to learn the underlying data distribution with an unsupervised diffusion model, and then finetunes that model for text control via a novel constraint optimization-based learning objective. Comprehensive experiments demonstrate that Text2Data achieves better controllability than existing baselines across various modalities, including molecules, motions, and time series.
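
The abstract does not spell out the finetuning objective. As a purely illustrative sketch, one possible reading of "constraint optimization-based" finetuning is: pretrain on an unlabeled loss, then minimize a labeled loss while keeping the unlabeled loss under a bound. The scalar parameter, the quadratic toy losses, the bound `XI`, and the projection step below are all assumptions made for illustration and are not taken from the paper; a real system would use a diffusion model and its training losses instead.

```python
import math

# Toy stand-ins (all hypothetical): one scalar parameter, quadratic losses.
A = 0.0   # minimizer of the "unlabeled" loss (what stage 1 learns)
B = 2.0   # minimizer of the "labeled" text-conditioned loss
XI = 1.0  # assumed bound on the unlabeled loss during finetuning


def unlabeled_loss(theta: float) -> float:
    return (theta - A) ** 2


def labeled_loss(theta: float) -> float:
    return (theta - B) ** 2


def pretrain(theta: float, lr: float = 0.1, steps: int = 200) -> float:
    """Stage 1: plain gradient descent on the unlabeled loss."""
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - A)
    return theta


def finetune(theta: float, constrained: bool,
             lr: float = 0.1, steps: int = 200) -> float:
    """Stage 2: gradient descent on the labeled loss.

    With constrained=True, each step is followed by a projection onto
    the set {theta : unlabeled_loss(theta) <= XI}, which in this 1-D
    toy case is the interval [A - sqrt(XI), A + sqrt(XI)].
    """
    radius = math.sqrt(XI)
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - B)
        if constrained:
            theta = min(max(theta, A - radius), A + radius)
    return theta


theta_pre = pretrain(5.0)
theta_free = finetune(theta_pre, constrained=False)
theta_con = finetune(theta_pre, constrained=True)

print("pretrained:            ", theta_pre)
print("unconstrained finetune:", theta_free, unlabeled_loss(theta_free))
print("constrained finetune:  ", theta_con, unlabeled_loss(theta_con))
```

In this toy setting, unconstrained finetuning drifts to the labeled optimum and the unlabeled loss grows well past the bound, while the projected variant stops at the edge of the feasible set. Whether the actual method uses projection, a penalty, or another scheme is not stated in the abstract.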
