Poster in Workshop: AI4MAT-ICLR-2025: AI for Accelerated Materials Design

Detecting Symmetry-Breaking in Molecular Data Distributions

Hannah Lawrence · Elyssa Hofgard · Yuxuan Chen · Tess Smidt · Robin Walters

Keywords: [ canonicalization ] [ symmetry breaking ] [ classifier test ] [ symmetry ] [ equivariance ] [ augmentation ] [ distribution shift ]


Abstract:

Equivariant models, which enforce physical symmetries (such as rotations and permutations), have proven very successful at materials science tasks. The usual justification for this success is that symmetry transformations relate data samples, which improves generalization and data efficiency. However, this explanation assumes that transformed versions of a given molecule are highly likely under the data distribution. In this work, we develop a method for testing this assumption by measuring the amount of symmetry in a data distribution. Specifically, we propose a two-sample classifier test which distinguishes between the original dataset and its randomly augmented symmetrization. Unlike existing tests of group invariance, our method does not require defining an appropriate parametric test or kernel. We find that in commonly used materials science datasets such as QM9 and MD17, the orientations of molecules are highly non-uniform. Our findings suggest that the success of equivariant models on these datasets may depend on other inductive biases, such as local equivariance. Moreover, non-equivariant models may benefit strongly from canonicalization of the molecules’ orientations, an oft-overlooked part of the data generation process. As machine learning becomes increasingly important for materials discovery, it is essential to have tools to critically evaluate the assumptions underlying our data.
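
The two-sample classifier test described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a synthetic three-atom "molecule", a plain logistic-regression classifier trained by gradient descent, and NumPy as the only dependency. The names `random_rotations` and `classifier_test_accuracy` are hypothetical, as are all hyperparameters. One half of the data is kept as-is and the other half is randomly rotated, so the two classes never share a molecule.

```python
import numpy as np


def random_rotations(n, rng):
    """Draw n uniformly random 3D rotation matrices via unit quaternions."""
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q.T
    return np.stack([
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)], axis=-1),
        np.stack([2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)], axis=-1),
        np.stack([2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)], axis=-1),
    ], axis=1)


def rotate(coords, rots):
    """Apply one rotation per molecule; coords has shape (n, atoms, 3)."""
    return np.einsum("nij,naj->nai", rots, coords)


def classifier_test_accuracy(coords, rng, steps=500, lr=0.5):
    """Held-out accuracy of a classifier separating original from symmetrized data."""
    half = len(coords) // 2
    original = coords[:half]
    # Symmetrized sample: a disjoint half of the data under random rotations.
    symmetrized = rotate(coords[half:2 * half], random_rotations(half, rng))

    feats = np.concatenate([original, symmetrized]).reshape(2 * half, -1)
    labels = np.concatenate([np.ones(half), np.zeros(half)])
    perm = rng.permutation(2 * half)
    feats, labels = feats[perm], labels[perm]

    # Train on the first half of the shuffled data, evaluate on the rest.
    x_train, y_train = feats[:half], labels[:half]
    x_test, y_test = feats[half:], labels[half:]

    weights = np.zeros(feats.shape[1])
    bias = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(x_train @ weights + bias)))
        grad = probs - y_train
        weights -= lr * x_train.T @ grad / len(y_train)
        bias -= lr * grad.mean()

    preds = (x_test @ weights + bias) > 0
    return float((preds == y_test).mean())


rng = np.random.default_rng(0)
n = 4000

# Assumed toy data: a fixed 3-atom template plus small positional noise,
# i.e. every molecule sits in (nearly) the same canonical orientation.
template = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
canonical = template + 0.05 * rng.normal(size=(n, 3, 3))

# Control: the same molecules with orientations already randomized.
uniform = rotate(canonical, random_rotations(n, rng))

acc_canonical = classifier_test_accuracy(canonical, rng)
acc_uniform = classifier_test_accuracy(uniform, rng)
print(f"canonical orientations: held-out accuracy = {acc_canonical:.3f}")
print(f"uniform orientations:   held-out accuracy = {acc_uniform:.3f}")
```

Held-out accuracy near 0.5 means the classifier cannot tell the data from its symmetrization, which is consistent with a rotation-symmetric distribution; accuracy well above 0.5 indicates that orientations are non-uniform. A linear classifier is used here only to keep the sketch short. It can miss asymmetries that a more expressive classifier would detect, so chance-level accuracy from this sketch alone would not establish symmetry.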
