Mark has spent the last year developing a state-of-the-art neural network for his favourite domain. But after deployment, everyone starts complaining. Surely they should simply stop using it in settings that differ from Mark's training scenario? Robustness matters: any machine learning method needs to be broadly and realistically applicable, must specify its domain of application, and must work across that domain. The restriction that the test environment match the training setting is neither well defined nor realistically applicable. Hence it is our responsibility to deal with domain shift, and all that it entails. But why is it so hard? Why do our models break when we try to use them in the real world? Why do neural networks seem particularly susceptible to this? How do we understand and mitigate these issues? And what tools are at our disposal to build models that deal better with domain shift from the outset? We take a causal look at domain shift and survey approaches that improve performance when training and test domains differ.
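To make the failure mode concrete before diving in, here is a minimal sketch (synthetic data, plain NumPy; all names and numbers are illustrative, not from any particular deployment): a linear model fit on one input range degrades sharply when evaluated on a shifted range, even though the underlying mechanism never changes.

```python
import numpy as np

# Training domain: inputs in [0, 1]; the true relationship is nonlinear.
x_train = np.linspace(0.0, 1.0, 50)
y_train = x_train ** 2

# Fit a simple linear model (degree-1 polynomial) on the training domain.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def predict(x):
    return slope * x + intercept

# Test domain: inputs shifted to [2, 3] -- same mechanism, different inputs.
x_test = np.linspace(2.0, 3.0, 50)
y_test = x_test ** 2

train_mse = np.mean((predict(x_train) - y_train) ** 2)
test_mse = np.mean((predict(x_test) - y_test) ** 2)

print(f"train MSE: {train_mse:.4f}")  # small on the training domain
print(f"test MSE:  {test_mse:.4f}")   # orders of magnitude larger under the shift
```

Nothing here is specific to neural networks: any model that only captures the training-domain behaviour of the data-generating mechanism will extrapolate poorly, and highly flexible models tend to fail in less predictable ways.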