Synthetic Data Generation: Quality, Privacy, Bias

Workshop

Sergul Aydore · Krishnaram Kenthapadi · Haipeng Chen · Edward Choi · Jamie Hayes · Mario Fritz · Rachel Cummings

Fri 7 May, 7 a.m. PDT

Data are the most valuable ingredient of machine learning models to help researchers and companies make informed decisions. However, access to rich, diverse, and clean datasets may not always be possible. One of the reasons for the lack of rich datasets is the substantial amount of time needed for data collection, especially when manual annotation is required. Another reason is the need for protecting privacy, whenever raw data contains sensitive information about individuals and hence cannot be shared directly. A powerful solution that can address both of these challenging scenarios is generating synthetic data. Thanks to the recent advances in generative models, it is possible to create realistic synthetic samples that closely match the distribution of complex, real data. In the case of limited labeled data, synthetic data can be used to augment training data to mitigate overfitting. In the case of protecting privacy, data curators can share the synthetic data instead of the original data, where the utility of the original data is preserved but privacy is protected. Despite the substantial benefits from using synthetic data, the process of synthetic data generation is still an ongoing technical challenge. Although the two scenarios of limited data and privacy concerns share similar technical challenges such as quality and fairness, they are often studied separately. We will bring together researchers from both fields in order to discuss challenges and advances in synthetic data generation.

Timezone: America/Los_Angeles

Schedule

Fri 7:00 a.m. - 7:10 a.m.	Opening Remarks ( Remark )	Sergul Aydore
Fri 7:10 a.m. - 7:35 a.m.	"Can Machine Learning Revolutionize Healthcare? Synthetic Data may be the Answer" by Mihaela van der Schaar, UCLA ( Invited Talk )	Mihaela van der Schaar
Fri 7:35 a.m. - 7:40 a.m.	Q&A with Mihaela van der Schaar ( Q&A )
Fri 7:40 a.m. - 7:42 a.m.	Introducing contributed talks 1-2 ( Intro )	Jamie Hayes
Fri 7:42 a.m. - 7:51 a.m.	Contributed Talk: Synthetic Data for Model selection ( Contributed Talk )	Nadav Bhonker · Alon Shoshan
Fri 7:51 a.m. - 8:00 a.m.	Contributed Talk: Ensembles of GANs for synthetic training data generation ( Contributed Talk )	Gabriel Eilertsen
Fri 8:00 a.m. - 8:01 a.m.	Introducing Jan Kautz ( Intro )	Jamie Hayes
Fri 8:01 a.m. - 8:25 a.m.	"Generative Models for Image Synthesis" by Jan Kautz, NVIDIA ( Invited Talk )	Jan Kautz
Fri 8:25 a.m. - 8:30 a.m.	Q&A with Jan Kautz ( Q&A )
Fri 8:30 a.m. - 9:00 a.m.	Break + Posters ( GatherTown )
Fri 9:00 a.m. - 9:01 a.m.	Introducing Jinsung Yoon ( Intro )	Edward Choi
Fri 9:01 a.m. - 9:25 a.m.	"Differentially Private Synthetic Data Generations Using Generative Adversarial Networks" by Jinsung Yoon, Google Cloud AI ( Invited Talk )	Jinsung Yoon
Fri 9:25 a.m. - 9:30 a.m.	Q&A with Jinsung Yoon ( Q&A )
Fri 9:30 a.m. - 9:32 a.m.	Introducing contributed talks 3-4 ( Intro )	Jamie Hayes
Fri 9:32 a.m. - 9:41 a.m.	Contributed Talk: Few-shot learning via tensor hallucination ( Contributed Talk )	Michalis Lazarou
Fri 9:41 a.m. - 9:50 a.m.	Contributed Talk: Leveraging Public Data for Practical Private Query Release ( Contributed Talk )	Terrance Liu
Fri 9:50 a.m. - 9:51 a.m.	Introducing Manuela M. Veloso ( Intro )	Sergul Aydore
Fri 9:51 a.m. - 10:15 a.m.	"Towards Financial Synthetic Data" by Manuela M. Veloso, J.P. Morgan, CMU ( Invited Talk )	Manuela Veloso
Fri 10:15 a.m. - 10:20 a.m.	Q&A with Manuela M. Veloso ( Q&A )
Fri 10:20 a.m. - 10:50 a.m.	Break + Posters ( GatherTown )
Fri 10:50 a.m. - 10:51 a.m.	Introducing Stefano Ermon ( Intro )	Krishnaram Kenthapadi
Fri 10:51 a.m. - 11:15 a.m.	"Bias and Generalization of Deep Generative Models" by Stefano Ermon, Stanford University ( Invited Talk )	Stefano Ermon
Fri 11:15 a.m. - 11:20 a.m.	Q&A with Stefano Ermon ( Q&A )
Fri 11:20 a.m. - 11:23 a.m.	Introducing contributed talks 5-6-7 ( Intro )	Haipeng Chen
Fri 11:23 a.m. - 11:32 a.m.	Contributed Talk: FFPDG: Fast, Fair and Private Data Generation ( Contributed Talk )	Weijie Xu
Fri 11:32 a.m. - 11:41 a.m.	Contributed Talk: Overcoming Barriers to Data Sharing with Medical Image Generation: A Comprehensive Evaluation ( Contributed Talk )	Stefan Bauer · August DuMont Schütte
Fri 11:41 a.m. - 11:50 a.m.	Contributed Talk: Imperfect ImaGANation: Implications of GANs Exacerbating Biases on Facial Data ( Contributed Talk )	Alberto Olmo · Niharika Jain
Fri 11:50 a.m. - 11:51 a.m.	Introducing Sander Dieleman ( Intro )	Haipeng Chen
Fri 11:51 a.m. - 12:15 p.m.	"Generative Modeling for Music Generation" by Sander Dieleman, DeepMind ( Invited Talk )	Sander Dieleman
Fri 12:15 p.m. - 12:20 p.m.	Q&A with Sander Dieleman ( Q&A )
Fri 12:20 p.m. - 12:50 p.m.	Break + Posters ( GatherTown )
Fri 12:50 p.m. - 12:51 p.m.	Introducing Emily Denton ( Intro )	Krishnaram Kenthapadi
Fri 12:51 p.m. - 1:15 p.m.	"Ethical Considerations of Generative AI" by Emily Denton, Google’s Ethical AI team ( Invited Talk )	Emily Denton
Fri 1:15 p.m. - 1:20 p.m.	Q&A with Emily Denton ( Q&A )
Fri 1:20 p.m. - 2:20 p.m.	Discussion Panel by all invited speakers ( Discussion Panel )	Mario Fritz
Fri 2:20 p.m. - 2:30 p.m.	Closing Remarks and Award Ceremony ( Remark )	Jamie Hayes