Fri May 07 07:00 AM -- 02:30 PM (PDT)
Synthetic Data Generation: Quality, Privacy, Bias
Sergul Aydore · Krishnaram Kenthapadi · Haipeng Chen · Edward Choi · Jamie Hayes · Mario Fritz · Rachel Cummings

Data are the most valuable ingredient of the machine learning models that help researchers and companies make informed decisions. However, access to rich, diverse, and clean datasets is not always possible. One reason for the lack of rich datasets is the substantial time needed for data collection, especially when manual annotation is required. Another is the need to protect privacy whenever raw data contains sensitive information about individuals and hence cannot be shared directly. A powerful solution that can address both of these scenarios is generating synthetic data. Thanks to recent advances in generative models, it is possible to create realistic synthetic samples that closely match the distribution of complex, real data. When labeled data is limited, synthetic data can be used to augment the training data and mitigate overfitting. When privacy must be protected, data curators can share the synthetic data instead of the original data, preserving the utility of the original data while protecting privacy. Despite these substantial benefits, synthetic data generation remains an ongoing technical challenge. Although the two scenarios of limited data and privacy concerns share similar technical challenges, such as quality and fairness, they are often studied separately. We will bring together researchers from both fields to discuss challenges and advances in synthetic data generation.

Opening Remarks (Remark)
"Can Machine Learning Revolutionize Healthcare? Synthetic Data may be the Answer" by Mihaela van der Schaar, UCLA (Invited Talk)
Q&A with Mihaela van der Schaar (Q&A)
Introducing contributed talks 1-2 (Intro)
Contributed Talk: Synthetic Data for Model selection (Contributed Talk)
Contributed Talk: Ensembles of GANs for synthetic training data generation (Contributed Talk)
Introducing Jan Kautz (Intro)
"Generative Models for Image Synthesis" by Jan Kautz, NVIDIA (Invited Talk)
Q&A with Jan Kautz (Q&A)
Break + Posters (GatherTown)
Introducing Jinsung Yoon (Intro)
"Differentially Private Synthetic Data Generations Using Generative Adversarial Networks" by Jinsung Yoon, Google Cloud AI (Invited Talk)
Q&A with Jinsung Yoon (Q&A)
Introducing contributed talks 3-4 (Intro)
Contributed Talk: Few-shot learning via tensor hallucination (Contributed Talk)
Contributed Talk: Leveraging Public Data for Practical Private Query Release (Contributed Talk)
Introducing Manuela M. Veloso (Intro)
"Towards Financial Synthetic Data" by Manuela M. Veloso, J.P.Morgan, CMU (Invited Talk)
Q&A with Manuela M. Veloso (Q&A)
Break + Posters (GatherTown)
Introducing Stefano Ermon (Intro)
"Bias and Generalization of Deep Generative Models" by Stefano Ermon, Stanford University (Invited Talk)
Q&A with Stefano Ermon (Q&A)
Introducing contributed talks 5-6-7 (Intro)
Contributed Talk: FFPDG: Fast, Fair and Private Data Generation (Contributed Talk)
Contributed Talk: Overcoming Barriers to Data Sharing with Medical Image Generation: A Comprehensive Evaluation (Contributed Talk)
Contributed Talk: Imperfect ImaGANation: Implications of GANs Exacerbating Biases on Facial Data (Contributed Talk)
Introducing Sander Dieleman (Intro)
"Generative Modeling for Music Generation" by Sander Dieleman, DeepMind (Invited Talk)
Q&A with Sander Dieleman (Q&A)
Break + Posters (GatherTown)
Introducing Emily Denton (Intro)
"Ethical Considerations of Generative AI" by Emily Denton, Google’s Ethical AI team (Invited Talk)
Q&A with Emily Denton (Q&A)
Discussion Panel by All invited speakers (Discussion Panel)
Closing Remarks and Award Ceremony (Remark)