Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation
Kai Xu · Georgi Ganev · Emile Joubert · Rees Davison · Olivier Van Acker · Luke Robinson
Abstract
Synthetic data generation (SDG) has become a popular approach to releasing private datasets. In SDG, a generative model is fitted on the private real data, and samples drawn from the model are released as the protected synthetic data. While real-world datasets usually consist of multiple tables with potential \emph{many-to-many} relationships (i.e.~\emph{many-to-many datasets}), recent research in SDG mostly focuses on modeling tables \emph{independently} or only considers generating datasets with special cases of many-to-many relationships, such as \emph{one-to-many}. In this paper, we first study the challenges of building faithful generative models for many-to-many datasets and identify the limitations of existing methods. We then present a novel factorization for many-to-many generative models, which leads to a scalable generation framework by combining recent results from random graph theory and representation learning. Finally, we extend the framework to establish the notion of $(\epsilon,\delta)$-differential privacy. On a real-world dataset, we demonstrate that our method generates synthetic datasets that preserve information within and across tables better than its closest competitor.