Despite recent successes, deep learning systems are still limited by their lack of generalization. I’ll present an approach to addressing this limitation that combines probabilistic model-based learning, symbolic learning, and deep learning. My work centers on probabilistic programming, a powerful abstraction layer that separates Bayesian modeling from inference. In the first part of the talk, I’ll describe “inference compilation”, an approach to amortized inference in universal probabilistic programs. In the second part, I’ll introduce a family of wake-sleep algorithms for learning model parameters. Finally, I’ll introduce a neurosymbolic generative model called “drawing out of distribution”, or DooD, which enables out-of-distribution generalization for drawings.
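To make the modeling/inference separation concrete, here is a minimal sketch (not from the talk; all names are illustrative) of the core idea: the model is written as ordinary generative code, while a generic inference routine — here, simple likelihood-weighted importance sampling — operates on any model given only a prior sampler and a likelihood function.

```python
import math
import random

random.seed(0)

# A "probabilistic program": a generative model written as ordinary code.
# A coin's bias is drawn from a uniform prior, then flips are simulated.
def sample_prior():
    return random.random()  # latent bias ~ Uniform(0, 1)

def log_lik(bias, flips):
    # Bernoulli log-likelihood of the observed flips given the bias.
    eps = 1e-12
    return sum(math.log(bias + eps) if f else math.log(1.0 - bias + eps)
               for f in flips)

# A generic inference routine that knows nothing about this particular
# model: it only needs a prior sampler and a log-likelihood. This is the
# separation that probabilistic programming provides as an abstraction.
def posterior_mean(prior_sample, log_likelihood, data, n=20000):
    total_w, total_wx = 0.0, 0.0
    for _ in range(n):
        x = prior_sample()
        w = math.exp(log_likelihood(x, data))
        total_w += w
        total_wx += w * x
    return total_wx / total_w

# Observing 4 heads out of 5 flips pulls the posterior mean toward the
# analytic Beta(5, 2) answer, 5/7 ≈ 0.714.
data = [True, True, True, True, False]
print(posterior_mean(sample_prior, log_lik, data))
```

Amortized inference (as in inference compilation) replaces the per-query sampling loop with a neural network trained offline to map observations directly to proposal distributions, so repeated queries are cheap.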