Workshop
Physics for Machine Learning
T. Konstantin Rusch · Aditi Krishnapriyan · Emmanuel de Bézenac · Ben Chamberlain · Elise van der Pol · Patrick Kidger
MH1
Combining physics with machine learning is a rapidly growing field of research. Thus far, most work in this area has focused on leveraging recent advances in classical machine learning to solve problems arising in the physical sciences. In this workshop, we wish to focus on a less established topic, namely the converse: exploiting the structures (or symmetries) of physical systems, as well as insights developed in physics, to construct novel machine learning methods and to better understand such methods. A particular focus will be on the synergy between scientific problems and machine learning, and on incorporating the structure of these problems into the machine learning methods used in that context. Such models are not limited to problems in the physical sciences, however, and can be applied more broadly to standard machine learning problems, e.g. in computer vision, natural language processing, or speech recognition.
Schedule
Thu 12:00 a.m. – 12:15 a.m.

Introduction and opening remarks (Introduction)
T. Konstantin Rusch
Thu 12:15 a.m. – 12:40 a.m.

Physics-inspired learning on graphs (Invited talk)
The message-passing paradigm has been the "battle horse" of deep learning on graphs for several years, making graph neural networks a big success in a wide range of applications, from particle physics to protein design. From a theoretical viewpoint, it established the link to the Weisfeiler-Lehman hierarchy, allowing us to analyse the expressive power of GNNs. We argue that the very "node-and-edge"-centric mindset of current graph deep learning schemes may hinder future progress in the field. As an alternative, we propose physics-inspired "continuous" learning models that open up a new trove of tools from the fields of differential geometry, algebraic topology, and differential equations, so far largely unexplored in graph ML.
Michael Bronstein
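A minimal sketch (ours, not from the talk) of what a "continuous" learning model on a graph looks like: node features evolving under the graph heat equation dx/dt = -Lx, discretized with explicit Euler steps. Features diffuse along edges and converge to the mean on a connected graph.

```python
import numpy as np

# Adjacency matrix of a 4-cycle (toy graph).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A     # graph Laplacian
x = np.array([1.0, 0.0, 0.0, 0.0])  # initial node features

# Explicit Euler steps of the heat equation dx/dt = -L x.
for _ in range(500):
    x = x - 0.1 * (L @ x)           # dt = 0.1; stable since dt * lambda_max < 2
```

The mean of `x` is preserved (the Laplacian has zero row sums), and all non-constant modes decay, so `x` converges to 0.25 on every node; learned models of this kind replace the fixed operator `L` with trainable dynamics.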
Thu 12:40 a.m. – 12:45 a.m.

Q&A (In-person Q&A)
Michael Bronstein
Thu 12:45 a.m. – 12:55 a.m.

Multi-Scale Message Passing Neural PDE Solvers (Spotlight presentation)
We propose a novel multi-scale message passing neural network algorithm for learning the solutions of time-dependent PDEs. Our algorithm possesses both temporal and spatial multi-scale resolution features by incorporating multi-scale sequence models and graph gating modules in the encoder and processor, respectively. Benchmark numerical experiments are presented to demonstrate that the proposed algorithm outperforms baselines, particularly on a PDE with a range of spatial and temporal scales.
Léonard Equer
Thu 12:55 a.m. – 1:00 a.m.

Q&A (In-person Q&A)
Léonard Equer
Thu 1:00 a.m. – 2:00 a.m.

Poster session 1 (Poster session)
Thu 2:00 a.m. – 2:30 a.m.

Coffee break (Break)
Thu 2:30 a.m. – 2:55 a.m.

Learned Models for Physical Simulation and Design (Invited talk)
Simulation is important for countless applications in science and engineering, and there has been increasing interest in using machine learning for efficiency in prediction and optimization. In the first part of the talk, I will describe our work on training learned models for efficient turbulence simulation. Turbulent fluid dynamics are chaotic and therefore hard to predict, and classical simulators typically require expertise to produce and take a long time to run. We found that learned CNN-based simulators can efficiently capture diverse types of turbulent dynamics at low resolutions, and that they capture the dynamics of a high-resolution classical solver more accurately than a classical solver run at the same low resolution. We also provide recommendations for producing stable rollouts in learned models and for improving generalization to out-of-distribution states. In the second part of the talk, I will discuss work using learned simulators for inverse design. In this work, we combine Graph Neural Network (GNN) learned simulators [Sanchez-Gonzalez et al. 2020, Pfaff et al. 2021] with gradient-based optimization in order to optimize designs in a variety of complex physics tasks. These include challenges designing objects in 2D and 3D to direct fluids in complex ways, as well as optimizing the shape of an airfoil. We find that the learned model can support design optimization across hundreds of timesteps, and that the learned models can in some cases permit designs that lead to dynamics apparently quite different from the training data.
Kimberly Stachenfeld
Thu 2:55 a.m. – 3:00 a.m.

Q&A (Zoom Q&A)
Kimberly Stachenfeld
Thu 3:00 a.m. – 3:10 a.m.

Semi-Equivariant Conditional Normalizing Flows (Spotlight presentation)
We study the problem of learning conditional distributions of the form p(G | G'), where G and G' are two 3D graphs, using continuous normalizing flows. We derive a semi-equivariance condition on the flow which ensures that conditional invariance to rigid motions holds. We demonstrate the effectiveness of the technique in the molecular setting of receptor-aware ligand generation.
Eyal Rozenberg
Thu 3:10 a.m. – 3:15 a.m.

Q&A (Zoom Q&A)
Eyal Rozenberg
Thu 3:15 a.m. – 5:00 a.m.

Lunch break (Break)
Thu 5:00 a.m. – 5:25 a.m.

Physics Inspired Machine Learning (Invited talk)
Physical systems, concepts and principles are increasingly being used in devising novel and robust machine learning architectures. We illustrate this point with examples from two ML domains: sequence modeling and graph representation learning. In both cases, we demonstrate how physical concepts such as oscillators and multiscale dynamics can lead to ML architectures that not only mitigate problems that plague these learning tasks but also provide competitive performance.
Siddhartha Mishra
Thu 5:25 a.m. – 5:30 a.m.

Q&A (Zoom Q&A)
Siddhartha Mishra
Thu 5:30 a.m. – 5:40 a.m.

Neural Networks Learn Representation Theory: Reverse Engineering how Networks Perform Group Operations (Spotlight presentation)
We present a novel algorithm by which neural networks may implement composition for any finite group via mathematical representation theory, by learning several irreducible representations of the group and converting group composition to matrix multiplication. By reverse engineering model logits and weights, we show that small networks consistently learn this algorithm when trained on composition of group elements, and we confirm our understanding using ablations. We use this as an algorithmic test bed for the hypothesis of universality in mechanistic interpretability: that different models learn similar features and circuits when trained on similar tasks. By studying networks trained on various groups and architectures, we find mixed evidence for universality: using our algorithm, we can completely characterize the family of circuits and features that networks learn on this task, but for a given network the precise circuits learned, as well as the order in which they develop, are arbitrary.
Bilal Chughtai
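The algorithm the abstract describes, converting group composition into matrix multiplication of irreducible representations, can be illustrated directly (our own sketch, not the paper's code) with the 2D rotation irreps of the cyclic group Z/n: multiplying the matrices for elements a and b yields the matrix for a + b (mod n).

```python
import numpy as np

def irrep(g, n, k=1):
    """2D rotation irrep of the cyclic group Z/n, evaluated at element g."""
    theta = 2.0 * np.pi * k * g / n
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

n = 12
a, b = 5, 9
# Group composition realized as matrix multiplication of the irreps.
composed = irrep(a, n) @ irrep(b, n)
```

A network that learns `irrep`-like embeddings can therefore compute `a ∘ b` with a bilinear operation; the paper reverse-engineers exactly this structure from trained weights.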
Thu 5:40 a.m. – 5:45 a.m.

Q&A (Zoom Q&A)
Bilal Chughtai
Thu 5:45 a.m. – 6:45 a.m.

Poster session 2 (Poster session)
Thu 6:45 a.m. – 7:15 a.m.

Coffee break (Break)
Thu 7:15 a.m. – 7:40 a.m.

Bridging Biophysics and AI to Optimize Biology (Invited talk)
The potential of artificial intelligence (AI) in biology is immense, yet its success is contingent on interfacing effectively with wet-lab experimentation and remaining grounded in the system, structure, and physics of biology. In this talk, I will discuss how we have developed biophysically grounded AI algorithms for biomolecular design. I will share recent work in creating a diffusion-based generative model that designs protein structures by mirroring the biophysics of the native protein folding process. This work provides an example of how bridging AI with fundamental biophysics can accelerate design and discovery in biology, opening the door for sustained feedback and integration between the computational and biological sciences.
Ava Soleimany
Thu 7:40 a.m. – 7:45 a.m.

Q&A (Zoom Q&A)
Ava Soleimany
Thu 7:45 a.m. – 7:55 a.m.

Latent SDEs for Modelling Quasar Variability and Inferring Black Hole Properties (Spotlight presentation)
Active galactic nuclei (AGN) are believed to be powered by the accretion of matter around supermassive black holes at the centers of galaxies. The variability of an AGN's brightness over time can reveal important information about the physical properties of the underlying black hole. The temporal variability is believed to follow a stochastic process, often represented as a damped random walk described by a stochastic differential equation (SDE). With upcoming wide-field surveys set to observe 100 million AGN in multiple bandpass filters, there is a need for efficient and automated modeling techniques that can handle the large volume of data. Latent SDEs are well-suited for modeling AGN time series data, as they can explicitly capture the underlying stochastic dynamics. In this work, we modify latent SDEs to jointly reconstruct the unobserved portions of multivariate AGN light curves and infer physical properties such as the black hole mass. Our model is trained on a realistic physics-based simulation of ten-year AGN light curves, and we demonstrate its ability to fit AGN light curves even in the presence of long seasonal gaps and irregular sampling across different bands, outperforming a multi-output Gaussian process regression baseline.
Joshua Fagin
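The damped random walk mentioned in the abstract is an Ornstein-Uhlenbeck process, dx = -(x/tau) dt + sigma dW. As a sketch of the forward model (ours, not the authors' latent-SDE code), it can be simulated with its exact discrete-time transition, and its stationary variance sigma^2 * tau / 2 provides a simple check.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, sigma, dt, n_steps = 1.0, 0.3, 0.1, 200_000  # illustrative parameters

# Exact AR(1) transition of the OU process over a step of size dt.
a = np.exp(-dt / tau)
noise_scale = sigma * np.sqrt(0.5 * tau * (1.0 - a**2))

x = np.empty(n_steps)
x[0] = 0.0
for i in range(1, n_steps):
    x[i] = a * x[i - 1] + noise_scale * rng.normal()

stationary_var = sigma**2 * tau / 2.0   # analytic value: 0.045
```

A latent SDE replaces these fixed drift and diffusion terms with learned networks; masking segments of `x` mimics the seasonal gaps the paper reconstructs.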
Thu 7:55 a.m. – 8:00 a.m.

Q&A (Zoom Q&A)
Joshua Fagin
Thu 8:00 a.m. – 8:25 a.m.

Scaling laws for deep neural networks: driving theory and understanding through experimental insights (Invited talk)
It has been observed that the performance of deep neural networks often empirically follows a power law as simple scaling variables, such as the amount of training data and the number of model parameters, are changed. We would like to understand the origins of these empirical observations. We take a physicist's approach to investigating this question through the pillars of exactly solvable models, perturbation theory, and empirically motivated assumptions on natural data. By starting from a simple theoretical setting which is controlled, testing our predictions against experiments, and extrapolating to more realistic settings, we can propose a natural classification of scaling regimes that are driven by different underlying mechanisms.
Yasaman Bahri
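As a minimal sketch of the empirical observation the talk starts from (ours, not from the talk): a power law L(N) = c * N^(-alpha) is linear in log-log space, so the scaling exponent can be recovered by linear regression on logarithms of idealized, noise-free losses.

```python
import numpy as np

c, alpha = 2.0, 0.5
N = np.array([1e3, 1e4, 1e5, 1e6, 1e7])  # e.g. dataset sizes (illustrative)
loss = c * N ** (-alpha)                 # idealized power-law losses

# Fit log(loss) = log(c) - alpha * log(N) by least squares.
slope, intercept = np.polyfit(np.log(N), np.log(loss), deg=1)
alpha_hat, c_hat = -slope, np.exp(intercept)
```

With real measurements the fit is noisy and only holds within a scaling regime; identifying where and why such regimes hold is the subject of the talk.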
Thu 8:25 a.m. – 8:30 a.m.

Q&A (Zoom Q&A)
Yasaman Bahri
Thu 8:30 a.m. – 8:45 a.m.

Learning protein family manifolds with smoothed energy-based models (Spotlight presentation)
We resolve difficulties in training and sampling from discrete energy-based models (EBMs) by learning a smoothed energy landscape, sampling the smoothed data manifold with Langevin Markov chain Monte Carlo, and projecting back to the true data manifold with one-step denoising. Our formalism combines the attractive properties of EBMs and the improved sample quality of score-based models, while simplifying training and sampling by requiring only a single noise scale. We demonstrate the robustness of our approach on generative modeling of antibody proteins.
Nathan Frey
Thu 8:45 a.m. – 9:00 a.m.

Closing remarks (Closing remarks)
T. Konstantin Rusch


The END: An Equivariant Neural Decoder for Quantum Error Correction (Poster)
Quantum error correction is a critical component for scaling up quantum computing. Given a quantum code, an optimal decoder maps the measured code violations to the most likely error that occurred, but its cost scales exponentially with the system size. Neural network decoders are an appealing solution since they can learn from data an efficient approximation to such a mapping and can automatically adapt to the noise distribution. In this work, we introduce a data-efficient neural decoder that exploits the symmetries of the problem. We characterize the symmetries of the optimal decoder for the toric code and propose a novel equivariant architecture that achieves state-of-the-art accuracy compared to previous neural decoders.
Evgenii Egorov · Roberto Bondesan · Max Welling


Physics-driven machine learning models coupling PyTorch and Firedrake (Poster)
Partial differential equations (PDEs) are central to describing and modelling complex physical systems that arise in many disciplines across science and engineering. However, in many realistic applications PDE modelling provides an incomplete description of the physics of interest. PDE-based machine learning techniques are designed to address this limitation. In this approach, the PDE is used as an inductive bias, enabling the coupled model to rely on fundamental physical laws while requiring less training data. Deploying high-performance simulations that couple PDEs and machine learning on complex problems necessitates composing the capabilities of machine learning and PDE-based frameworks. We present a simple yet effective coupling between the machine learning framework PyTorch and the PDE system Firedrake that provides researchers, engineers and domain specialists with a highly productive way of specifying coupled models while requiring only trivial changes to existing code.
Nacime Bouziani · David Ham


E(3) Equivariant Graph Neural Networks for Particle-Based Fluid Mechanics (Poster)
We contribute to the rapidly growing field of machine learning for engineering systems by demonstrating that equivariant graph neural networks have the potential to learn more accurate dynamic-interaction models than their non-equivariant counterparts. We benchmark two well-studied fluid flow systems, namely the 3D decaying Taylor-Green vortex and the 3D reverse Poiseuille flow, and compare equivariant graph neural networks to their non-equivariant counterparts on different performance measures, such as kinetic energy or Sinkhorn distance. Such measures are typically used in engineering to validate numerical solvers. Our main finding is that, while being rather slow to train and evaluate, equivariant models learn more physically accurate interactions. This indicates opportunities for future work towards coarse-grained models for turbulent flows, and towards generalization across system dynamics and parameters.
Artur Toshev · Gianluca Galletti · Johannes Brandstetter · Stefan Adami · Nikolaus Adams


$\mathrm{SE}(3)$ Frame Equivariance in Dynamics Modeling and Reinforcement Learning (Poster)
In this paper, we aim to explore the potential of symmetries for improving the understanding of continuous control tasks in 3D environments, such as locomotion. Existing work on symmetry in reinforcement learning focuses on pixel-level symmetries in 2D environments or is limited to value-based planning. Instead, we consider continuous state and action spaces and continuous symmetry groups, focusing on translational and rotational symmetries. We propose a pipeline for using these symmetries in learning dynamics and control, with the goal of exploiting the underlying symmetry structure to improve dynamics modeling and model-based planning.
Linfeng Zhao · Jung Yeon Park · Xupeng Zhu · Robin Walters · Lawson Wong


Learning the Dynamics of Physical Systems with Hamiltonian Graph Neural Networks (Poster)
Inductive biases in the form of conservation laws have been shown to provide superior performance for modeling physical systems. Here, we present the Hamiltonian graph neural network (HGNN), a physics-informed GNN that learns the dynamics directly from trajectories. We evaluate the performance of HGNN on spring, pendulum, and gravitational systems and show that it outperforms other Hamiltonian-based neural networks. We also demonstrate the zero-shot generalizability of HGNN to unseen hybrid spring-pendulum systems and to system sizes two orders of magnitude larger than the training systems. HGNN provides excellent inference in all systems, producing stable trajectories. Altogether, HGNN presents a promising approach to modeling complex physical systems directly from their trajectories.
Suresh Bishnoi · Ravinder Bhattoo · Jayadeva Jayadeva · Sayan Ranu · N. M. Anoop Krishnan
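The mechanism behind Hamiltonian-based networks can be sketched without any learning (our own toy, not HGNN): given a Hamiltonian H(q, p), integrate Hamilton's equations dq/dt = dH/dp, dp/dt = -dH/dq with a symplectic (leapfrog) integrator, which keeps the energy bounded over long rollouts. Here H = p^2/2 + q^2/2 (a unit spring); a learned model would replace the analytic gradients below.

```python
# Leapfrog integration of Hamilton's equations for H(q, p) = p^2/2 + q^2/2.
def dH_dq(q):
    return q   # analytic gradient; a Hamiltonian network would learn this

def dH_dp(p):
    return p

q, p, dt = 1.0, 0.0, 0.01
energy0 = 0.5 * p**2 + 0.5 * q**2

for _ in range(10_000):                # integrate to t = 100
    p -= 0.5 * dt * dH_dq(q)           # half kick
    q += dt * dH_dp(p)                 # drift
    p -= 0.5 * dt * dH_dq(q)           # half kick

energy = 0.5 * p**2 + 0.5 * q**2       # stays within O(dt^2) of energy0
```

The stable, energy-conserving trajectories the abstract reports come from baking exactly this structure into the model rather than regressing accelerations directly.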


Latent Sequence Generation of Steered Molecular Dynamics (Poster)
In this paper, we use an LSTM-VAE model framework to learn latent representations conditioned on potential energy through TorchMD, while autoregressively generating sequences of a deca-alanine system. While previous work has used generative deep learning methods for learning latent representations and predicting the motion of molecules, this paper tackles latent representations for steered molecular dynamics (SMD).
John Kevin Cava · Ankita Shukla · John Vant · Shubhra Kanti Karmaker Santu · Pavan Turaga · Ross Maciejewski · Abhishek Singharoy


Geometric constraints improve inference of sparsely observed stochastic dynamics (Poster)
The dynamics of systems with many degrees of freedom evolving on multiple scales are often modeled in terms of stochastic differential equations. Usually the structural form of these equations is unknown, and the only manifestation of the system's dynamics are observations at discrete points in time. Despite their widespread use, accurately inferring these systems from sparse-in-time observations remains challenging. Conventional inference methods either focus on the temporal structure of observations, neglecting the geometry of the system's invariant density, or use geometric approximations of the invariant density, which are limited to conservative driving forces. To address these limitations, we introduce a novel approach that reconciles these two perspectives. We propose a path augmentation scheme that employs data-driven control to account for the geometry of the system's invariant density. Nonparametric inference on the augmented paths enables efficient identification of the underlying deterministic forces of systems observed at low sampling rates.
Dimitra Maoutsa


Fast computation of permutation equivariant layers with the partition algebra (Poster)
Linear neural network layers that are either equivariant or invariant to permutations of their inputs form core building blocks of modern deep learning architectures. Examples include the layers of DeepSets, as well as linear layers occurring in attention blocks of transformers and some graph neural networks. The space of permutation equivariant linear layers can be identified as the invariant subspace of a certain symmetric group representation, and recent work parameterized this space by exhibiting a basis whose vectors are sums over orbits of standard basis elements with respect to the symmetric group action. A parameterization opens up the possibility of learning the weights of permutation equivariant linear layers via gradient descent. The space of permutation equivariant linear layers is a generalization of the partition algebra, an object first discovered in statistical physics with deep connections to the representation theory of the symmetric group, and the basis described above generalizes the so-called orbit basis of the partition algebra. We exhibit an alternative basis, generalizing the diagram basis of the partition algebra, with computational benefits stemming from the fact that the tensors making up the basis are low rank in the sense that they naturally factorize into Kronecker products. Just as multiplication by a rank-one matrix is far less expensive than multiplication by an arbitrary matrix, multiplication with these low-rank tensors is far less expensive than multiplication with elements of the orbit basis. Finally, we describe an algorithm implementing multiplication with these basis elements.
Charles Godfrey · Michael Rawson · Davis Brown · Henry Kvinge
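The simplest instance of the space this abstract generalizes is the DeepSets result: every linear map R^n -> R^n that commutes with all permutations is a combination of the identity and the all-ones (averaging) operator. A sketch of that two-parameter layer, with an explicit equivariance check (ours, not the paper's code):

```python
import numpy as np

def equivariant_layer(x, lam=1.5, gamma=-0.7):
    """Two-parameter permutation-equivariant linear layer (DeepSets):
    lam * x + gamma * mean(x) * ones."""
    return lam * x + gamma * x.mean() * np.ones_like(x)

rng = np.random.default_rng(0)
x = rng.normal(size=8)
perm = rng.permutation(8)

# Equivariance: permuting then applying the layer equals applying then permuting.
perm_then_out = equivariant_layer(x[perm])
out_then_perm = equivariant_layer(x)[perm]
```

Both summands here are (Kronecker products of) rank-one operators, which is the low-rank structure the paper exploits for fast multiplication in the higher-order case.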


Predicting Fluid Dynamics in Physics-informed Mesh-reduced Space (Poster)
In computational fluid dynamics, there is considerable interest in using neural networks to accelerate simulations. However, these learning-based models suffer from scalability issues when trained on high-dimensional, high-resolution simulation data generated for real-world applications. In this work, we study the problem of improving the accuracy of desired physical properties using graph learning models for learning complex fluid dynamics while operating in a mesh-reduced space. We design several tailored modules to incorporate physics-informed knowledge into a two-stage prediction model, which directs the learning process to focus more on the region of interest (ROI). Predictions are then made in a mesh-reduced space, which helps reduce computational costs while preserving important physical properties. Results on simulated unsteady fluid flow data show that, even in the reduced operational space, our method still achieves desirable accuracy and generalizability for both prediction and physical consistency over regions of interest.
Yeping Hu · Bo Lei · Victor Castillo


Scientific Computing Algorithms to Learn Enhanced Scalable Surrogates for Mesh Physics (Poster)
Data-driven modeling approaches can produce fast surrogates to study large-scale physics problems. Among them, graph neural networks (GNNs) that operate on mesh-based data are desirable because they possess inductive biases that promote physical faithfulness, but hardware limitations have precluded their application to large computational domains. We show that it is possible to train a class of GNN surrogates on 3D meshes. We scale MeshGraphNets (MGN), a subclass of GNNs for mesh-based physics modeling, via our domain-decomposition-based approach to facilitate training that is mathematically equivalent to training on the whole domain under certain conditions. With this, we were able to train MGN on meshes with millions of nodes to generate computational fluid dynamics (CFD) simulations. Furthermore, we show how to enhance MGN via higher-order numerical integration, which can reduce MGN's error and training time. We validated our methods on an accompanying dataset of 3D $\text{CO}_2$-capture CFD simulations on a 3.1M-node mesh. This work presents a practical path to scaling MGN for real-world applications.
Brian Bartoldson · Yeping Hu · Amar Saini · Jose Cadena · Yucheng Fu · Jie Bao · Zhijie Xu · Brenda Ng · Phan Nguyen


Latent Stochastic Differential Equations for Modeling Quasar Variability and Inferring Black Hole Properties (Poster)
Active galactic nuclei (AGN) are believed to be powered by the accretion of matter around supermassive black holes at the centers of galaxies. The variability of an AGN's brightness over time can reveal important information about the physical properties of the underlying black hole. The temporal variability is believed to follow a stochastic process, often represented as a damped random walk described by a stochastic differential equation (SDE). With upcoming wide-field surveys set to observe 100 million AGN in multiple bandpass filters, there is a need for efficient and automated modeling techniques that can handle the large volume of data. Latent SDEs are well-suited for modeling AGN time series data, as they can explicitly capture the underlying stochastic dynamics. In this work, we modify latent SDEs to jointly reconstruct the unobserved portions of multivariate AGN light curves and infer physical properties such as the black hole mass. Our model is trained on a realistic physics-based simulation of ten-year AGN light curves, and we demonstrate its ability to fit AGN light curves even in the presence of long seasonal gaps and irregular sampling across different bands, outperforming a multi-output Gaussian process regression baseline.
Joshua Fagin · Ji Won Park · Henry Best · Matt O'Dowd


Multilevel Approach to Efficient Gradient Calculation in Stochastic Systems (Poster)
Gradient estimation in stochastic differential equations (SDEs) is a critical challenge in fields that require dynamic modeling of stochastic systems. While there have been numerous studies on pathwise gradients, the calculation of expectations over different realizations of the Brownian process in SDEs is often not considered. Multilevel Monte Carlo (MLMC) offers a highly efficient solution to this problem, greatly reducing the computational cost of stochastic modeling and simulation compared to naive Monte Carlo. In this study, we use neural stochastic differential equations as our stochastic system and demonstrate that accurate gradients can be computed effectively through the use of MLMC.
Joohwan Ko · Michael Poli · Stefano Massaroli · Woo Chang Kim
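A sketch of the MLMC idea (ours, not the paper's code), shown on estimating an SDE expectation rather than a gradient; the same telescoping construction applies to pathwise gradients. We estimate E[S_T] for a geometric Brownian motion, coupling each fine Euler-Maruyama path to a coarse path driven by the same Brownian increments so that the level corrections have small variance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, T, s0 = 0.05, 0.2, 1.0, 1.0  # illustrative GBM parameters

def coupled_terminal_values(level, n_samples):
    """Euler-Maruyama terminal values at resolution 2**level, plus a coarse
    path (2**(level-1) steps) driven by the SAME Brownian increments."""
    n = 2 ** level
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_samples, n))
    s = np.full(n_samples, s0)
    for i in range(n):
        s = s + mu * s * dt + sigma * s * dW[:, i]
    if level == 0:
        return s, np.zeros(n_samples)
    dW_coarse = dW[:, 0::2] + dW[:, 1::2]   # pairwise-summed increments
    sc = np.full(n_samples, s0)
    for i in range(n // 2):
        sc = sc + mu * sc * (2 * dt) + sigma * sc * dW_coarse[:, i]
    return s, sc

# Telescoping estimator: E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}],
# spending fewer samples on the expensive fine levels.
estimate = 0.0
for level in range(6):
    fine, coarse = coupled_terminal_values(level, 40_000 // 2**level + 1_000)
    estimate += np.mean(fine - coarse)
# analytic mean: E[S_T] = s0 * exp(mu * T)
```

Because fine and coarse paths share one Brownian motion, `fine - coarse` shrinks with the level, so most of the sampling budget stays on the cheap coarse level.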


Studying Phase Transitions in Contrastive Learning With Physics-Inspired Datasets (Poster)
In recent years contrastive learning has become a state-of-the-art technique in representation learning, but the exact mechanisms by which it trains are not well understood. By focusing on physics-inspired datasets with low intrinsic dimensionality, we are able to visualize and study contrastive training procedures at better resolution. We empirically study the geometric development of contrastively learned embeddings, discovering phase transitions between locally metastable embedding conformations towards an optimal structure. Ultimately, we show a strong experimental link between stronger augmentations and decreased training time for contrastively learning more geometrically meaningful representations.
Ali Cy · Anugrah Chemparathy · Michael Han · Rumen R Dangovski · Peter Lu · Marin Soljacic


Self-Supervised Learning with Lie Symmetries for Partial Differential Equations (Poster)
Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches on invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. Data augmentation is central to SSL: although simple augmentation strategies such as cropping provide satisfactory results, our inclusion of transformations corresponding to the symmetry group of a given PDE significantly improves the quality of the learned representations.
Grégoire Mialon · Quentin Garrido · Hannah Lawrence · Danyal Rehman · Yann LeCun · Bobak Kiani


Lorentz Group Equivariant Autoencoders (Poster)
We develop the Lorentz group autoencoder (LGAE), an autoencoder that is equivariant with respect to the proper, orthochronous Lorentz group $\mathrm{SO}^+(3,1)$, with a latent space living in the representations of the group. We present our architecture and several experimental results on data from the Large Hadron Collider and find that it outperforms a graph neural network baseline model on several compression, reconstruction, and anomaly detection tasks. The PyTorch code for our models is provided in Hao et al. (2022a).
Zichun Hao · Raghav Kansal · Javier Duarte · Nadya Chernyavskaya


Relational Macrostate Theory Guides Artificial Intelligence to Learn Macro and Design Micro (Poster)
A central focus of science is the identification and application of laws, which are often represented as macrostates that capture invariant properties associated with symmetries. However, complex systems can be challenging to study due to their high dimensionality, nonlinearity, and emergent properties. To address this challenge, we propose the relational macrostate theory (RMT), which defines macrostates in terms of symmetries between mutually predictive observations. Additionally, we have developed a machine learning architecture, MacroNet, that can learn these macrostates and invertibly sample from them, allowing for the design of new microstates consistent with conserved properties. Using this framework, we have studied how macrostates can be identified in systems ranging from simple harmonic oscillators to complex spatial patterns known as Turing instabilities. Our results demonstrate how emergent properties can be designed by identifying the unbroken symmetries that give rise to invariants, bypassing Anderson's "more is different" by showing that "more is the same" in complex systems.
Yanbo Zhang · Sara Walker


Emulating Radiation Transport on Cosmological Scales using a Denoising U-Net (Poster)
Semi-numerical simulations are the leading candidates for evolving reionization on cosmological scales. These semi-numerical models are efficient in generating large-scale maps of the 21-cm signal, but they are too slow to enable inference at the field level. We present different strategies for training a U-Net to accelerate these simulations. We derive the ionization field directly from the initial density field, without using the ionizing sources' locations, hence emulating the radiative transfer process. We find that the U-Net achieves higher accuracy in reconstructing the ionization field if the input includes either white noise or a noisy version of the ionization map besides the density field during training. Our model reconstructs the power spectrum well over all scales. This work represents a step towards generating large-scale ionization maps at minimal cost, and hence towards enabling rapid parameter inference at the field level.
Mosima Masipa · Hassan · Mario Santos · Kyunghyun Cho · Gabriella Contardo


Learning to Suggest Breaks: Sustainable Optimization of Long-Term User Engagement (Poster)
Optimizing user engagement is a key goal for modern recommendation systems, but blindly pushing users towards consumption entails risks. To promote digital well-being, most platforms now offer a service that periodically prompts users to take breaks. These, however, must be set up manually, and so may be suboptimal for both users and the system. In this paper, we study the role of breaks in recommendation, and propose a framework for learning optimal breaking policies that promote and sustain long-term engagement. Based on the notion that user-system dynamics incorporate both positive and negative feedback, we cast recommendation as Lotka-Volterra dynamics. We give an efficient learning algorithm, provide theoretical guarantees, and evaluate our approach on semi-synthetic data.
Eden Saig · Nir Rosenfeld
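An illustrative sketch (ours, not the paper's algorithm) of the Lotka-Volterra dynamics the paper casts recommendation into, with "prey" x and "predator" y as the two interacting quantities. The classical invariant V = delta*x - gamma*log(x) + beta*y - alpha*log(y) is conserved along trajectories, so the system oscillates rather than collapsing, the qualitative behavior sustainable engagement policies aim to preserve.

```python
import numpy as np

alpha, beta, gamma, delta = 1.0, 1.0, 1.0, 1.0  # illustrative rates

def f(z):
    """Lotka-Volterra vector field: dx/dt = x(alpha - beta*y),
    dy/dt = y(delta*x - gamma)."""
    x, y = z
    return np.array([x * (alpha - beta * y), y * (delta * x - gamma)])

def invariant(z):
    x, y = z
    return delta * x - gamma * np.log(x) + beta * y - alpha * np.log(y)

z, dt = np.array([2.0, 1.0]), 0.01
v0 = invariant(z)
for _ in range(1_000):                 # integrate to t = 10 with RK4
    k1 = f(z)
    k2 = f(z + 0.5 * dt * k1)
    k3 = f(z + 0.5 * dt * k2)
    k4 = f(z + dt * k3)
    z = z + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

Both populations stay positive and `invariant(z)` drifts only by the RK4 discretization error, confirming the oscillatory, non-collapsing regime.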


Learning protein family manifolds with smoothed energy-based models (Poster)
We resolve difficulties in training and sampling from discrete energy-based models (EBMs) by learning a smoothed energy landscape, sampling the smoothed data manifold with Langevin Markov chain Monte Carlo, and projecting back to the true data manifold with one-step denoising. Our formalism combines the attractive properties of EBMs and the improved sample quality of score-based models, while simplifying training and sampling by requiring only a single noise scale. We demonstrate the robustness of our approach on generative modeling of antibody proteins.
Nathan Frey · Dan Berenberg · Joseph Kleinhenz · Isidro Hotzel · Julien Lafrance-Vanasse · Ryan Kelly · Yan Wu · Arvind Rajpal · Stephen Ra · Richard Bonneau · Kyunghyun Cho · Andreas Loukas · Vladimir Gligorijevic · Saeed Saremi
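The walk-then-jump scheme in the abstract can be sketched on a toy problem (ours, not the authors' model): take the discrete "data manifold" {-1, +1}, smooth it with Gaussian noise of scale sigma so the density and its score are available in closed form, run Langevin MCMC on the smoothed density, then apply the one-step denoiser x_hat = y + sigma^2 * score(y).

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5   # single noise scale

def score(y):
    """grad log p_sigma for p_sigma = 0.5 N(y; 1, s^2) + 0.5 N(y; -1, s^2)."""
    return (np.tanh(y / sigma**2) - y) / sigma**2

# Walk: overdamped Langevin MCMC on the smoothed density,
# ten chains initialized at noisy copies of the data points.
eps = 0.05
y = np.concatenate([-np.ones(5), np.ones(5)]) + sigma * rng.normal(size=10)
for _ in range(2_000):
    y = y + eps * score(y) + np.sqrt(2 * eps) * rng.normal(size=y.shape)

# Jump: one-step empirical-Bayes denoising back toward the data manifold;
# for this mixture it reduces to x_hat = tanh(y / sigma^2).
x_hat = y + sigma**2 * score(y)
```

In the paper the closed-form `score` is replaced by the gradient of a learned smoothed energy; the single noise scale is what keeps training and sampling simple relative to multi-scale score models.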



Nature's Cost Function: Simulating Physics by Minimizing the Action
(
Poster
)
>
link
In physics, there is a scalar function called the action which behaves like a cost function. When minimized, it yields the "path of least action", which represents the path a physical system will take through space and time. This function is crucial in theoretical physics and is usually minimized analytically to obtain equations of motion for various problems. In this paper, we propose a different approach: instead of minimizing the action analytically, we discretize it and then minimize it directly with gradient descent. We use this approach to obtain dynamics for six different physical systems and show that they are nearly identical to ground-truth dynamics. We discuss failure modes such as the unconstrained energy effect and show how to address them. Finally, we use the discretized action to construct a simple but novel quantum simulation. Code: github.com/greydanus/ncf
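The core idea can be sketched in a few lines for the simplest case, a free particle. This is a minimal illustration of discretize-then-descend, not the authors' code; grid size, step size, and iteration count are ours.

```python
import numpy as np

# Minimal sketch for a free particle (V = 0, m = 1): discretize the action
#   S = sum_i 0.5 * ((x[i+1] - x[i]) / dt)**2 * dt
# and minimize it by gradient descent over the interior path points,
# with both endpoints held fixed.
dt, n = 0.1, 21
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, n)
x[1:-1] += rng.normal(0.0, 0.3, n - 2)       # perturb the interior of the path

for _ in range(3000):
    v = np.diff(x) / dt                      # discrete velocities
    grad = v[:-1] - v[1:]                    # dS/dx_i at the interior points
    x[1:-1] -= 0.02 * grad                   # gradient-descent step

# the path of least action for a free particle is a straight line
print(np.allclose(x, np.linspace(0.0, 1.0, n), atol=1e-3))  # True
```

With a nonzero potential one would add the corresponding `-dV/dx * dt` term to the gradient, which is where the paper's physical systems come in.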
Samuel Greydanus · Timothy Strang · Isabella Caruso 🔗 


Symbolic Regression for PDEs using Pruned Differentiable Programs
(
Poster
)
>
link
Physics-informed Neural Networks (PINNs) have been widely used to obtain accurate neural surrogates for a system of Partial Differential Equations (PDEs). One of the major limitations of PINNs is that the neural solutions are challenging to interpret, and are often treated as black-box solvers. While Symbolic Regression (SR) has been studied extensively, very few works exist which generate analytical expressions to directly perform SR for a system of PDEs. In this work, we introduce an end-to-end framework for obtaining mathematical expressions for solutions of PDEs. We use a trained PINN to generate a dataset, upon which we perform SR. We use a Differentiable Program Architecture (DPA) defined using a context-free grammar to describe the space of symbolic expressions. We improve the interpretability by pruning the DPA in a depth-first manner, using the magnitude of weights as our heuristic. On average, we observe a 95.3% reduction in the parameters of the DPA while maintaining accuracy on par with PINNs. Furthermore, on average, pruning improves the accuracy of the DPA by 7.81%. We demonstrate that our framework outperforms existing state-of-the-art SR solvers on systems of complex PDEs such as Navier-Stokes: Kovasznay flow and Taylor-Green vortex flow. Furthermore, we produce analytical expressions for a complex industrial use case of an Air Preheater, without suffering performance loss vis-à-vis PINNs.
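Magnitude-based pruning of this kind can be sketched generically. This is a hypothetical illustration of the weight-magnitude heuristic only; the paper's depth-first traversal of the DPA is not reproduced here.

```python
import numpy as np

# Hypothetical sketch of magnitude-based pruning: zero out the given
# fraction of weights with the smallest absolute value.
def prune_by_magnitude(weights, frac):
    w = weights.copy()
    k = int(frac * w.size)
    idx = np.argsort(np.abs(w.ravel()))[:k]   # indices of smallest-|w| entries
    w.ravel()[idx] = 0.0                      # ravel() is a view of the copy
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
pruned = prune_by_magnitude(w, 0.953)         # the ~95.3% reduction quoted above
print(float((pruned == 0).mean()))
```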
Ritam Majumdar · Vishal Jadhav · Anirudh Deodhar · Shirish Karande · Lovekesh Vig · Venkataramana Runkana 🔗 


Gaussian processes at the Helm(holtz): A more fluid model for ocean currents
(
Poster
)
>
link
Oceanographers are interested in predicting ocean currents and identifying divergences in a current vector field based on sparse observations of buoy velocities. Since we expect current dynamics to be smooth but highly nonlinear, Gaussian processes (GPs) offer an attractive model. But we show that applying a GP with a standard stationary kernel directly to buoy data can struggle at both current prediction and divergence identification, due to some physically unrealistic prior assumptions. To better reflect known physical properties of currents, we propose to instead put a standard stationary kernel on the divergence- and curl-free components of a vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field just via mixed partial derivatives, we can still perform inference given the original data with only a small constant multiple of additional computational expense. We illustrate the benefits of our method on synthetic and real ocean data.
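The decomposition being exploited is easy to verify numerically. A minimal sketch, not the paper's GP model: the divergence-free component of a 2D field comes from a stream function Ψ via mixed partial derivatives, rot(Ψ) = (∂Ψ/∂y, -∂Ψ/∂x), and its discrete divergence vanishes.

```python
import numpy as np

# Hypothetical sketch: build rot(Psi) on a periodic grid with centered
# finite differences and check that its discrete divergence is zero.
n = 64
h = 2 * np.pi / n
xs = np.arange(n) * h
X, Y = np.meshgrid(xs, xs, indexing="ij")
Psi = np.sin(X) * np.cos(Y)                  # stream function

u = (np.roll(Psi, -1, axis=1) - np.roll(Psi, 1, axis=1)) / (2 * h)   # dPsi/dy
v = -(np.roll(Psi, -1, axis=0) - np.roll(Psi, 1, axis=0)) / (2 * h)  # -dPsi/dx

div = ((np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0))
       + (np.roll(v, -1, axis=1) - np.roll(v, 1, axis=1))) / (2 * h)
print(np.abs(div).max() < 1e-10)  # mixed partials commute, so div vanishes
```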
Renato Berlinghieri · Brian Trippe · David Burt · Ryan Giordano · Kaushik Srinivasan · Tamay Özgökmen · Junfei Zia · Tamara Broderick 🔗 


Grounding Graph Network Simulators using Physical Sensor Observations
(
Poster
)
>
link
Physical simulations that accurately model reality are crucial for many engineering disciplines such as mechanical engineering and robotic motion planning. In recent years, learned Graph Network Simulators have produced accurate mesh-based simulations while requiring only a fraction of the computational cost of traditional simulators. As these predictors have to simulate complex physical systems from only an initial state, they exhibit high error accumulation for long-term predictions. In this work, we integrate sensory information to $\textit{ground}$ Graph Network Simulators on real-world observations in the form of point clouds. The resulting model allows for accurate predictions over longer time horizons, even under uncertainties in the simulation, such as unknown material properties.

Jonas Linkerhägner · Niklas Freymuth · Paul Maria Scheikl · Franziska MathisUllrich · Gerhard Neumann 🔗 


MetaPhysiCa: OOD Robustness in Physics-informed Machine Learning
(
Poster
)
>
link
A fundamental challenge in physics-informed machine learning (PIML) is the design of robust PIML methods for out-of-distribution (OOD) forecasting tasks. These OOD tasks require learning-to-learn from observations of the same (ODE) dynamical system with different unknown ODE parameters, and demand accurate forecasts even under out-of-support initial conditions and out-of-support ODE parameters. We propose a solution for such tasks, defined as a meta-learning procedure for causal structure discovery. In 3 different OOD tasks, we show that the proposed approach outperforms existing PIML and deep learning methods.
S Chandra Mouli · Muhammad Alam · Bruno Ribeiro 🔗 


Learning to Initiate and Reason in Event-Driven Cascading Processes
(
Poster
)
>
link
Training agents to control a dynamic environment is a fundamental task in AI. In many environments, the dynamics can be summarized by a small set of events that capture the semantic behavior of the system. Typically, these events form chains or cascades. We often wish to change the system behavior using a single intervention that propagates through the cascade. For instance, one may trigger a biochemical cascade to switch the state of a cell. We introduce a new learning setup called Cascade. An agent observes a system with known dynamics evolving from some initial state. The agent is given a structured semantic instruction and needs to make an intervention that triggers a cascade of events, such that the system reaches an alternative (counterfactual) behavior. We provide a test bed for this problem, consisting of physical objects. We devise an algorithm that learns to efficiently search in exponentially large semantic trees. We demonstrate that our approach learns to follow instructions to intervene in new complex scenes.
Yuval Atzmon · Eli Meirom · Shie Mannor · Gal Chechik 🔗 


Expressive Sign Equivariant Networks for Spectral Geometric Learning
(
Poster
)
>
link
Recent work has shown the utility of developing machine learning models that respect the symmetries of eigenvectors. These works promote sign invariance, since for any eigenvector $v$ the negation $-v$ is also an eigenvector. In this work, we demonstrate that sign equivariance is useful for applications such as building orthogonally equivariant models and link prediction. To obtain these benefits, we develop novel sign equivariant neural network architectures. These models are based on our analytic characterization of the sign equivariant polynomials and thus inherit provable expressiveness properties.
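The sign ambiguity in question is easy to see numerically. A minimal sketch, not the paper's architecture:

```python
import numpy as np

# Eigendecompositions return eigenvectors only up to sign, so features
# fed to a downstream model should be sign-invariant or sign-equivariant.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = A + A.T                                  # symmetric matrix
w, V = np.linalg.eigh(A)
v = V[:, 0]

# both v and -v are eigenvectors for the same eigenvalue
print(np.allclose(A @ v, w[0] * v), np.allclose(A @ -v, w[0] * -v))

# the outer product is a sign-invariant feature of v;
# an odd map of v (e.g. v itself) is sign-equivariant
print(np.allclose(np.outer(v, v), np.outer(-v, -v)))
```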

Derek Lim · Joshua Robinson · Stefanie Jegelka · Yaron Lipman · Haggai Maron 🔗 


Invertible mapping between fields in CAMELS
(
Poster
)
>
link
We build a bijective mapping between different fields from IllustrisTNG in the CAMELS project. In this work, we train a CycleGAN on three different setups: translating dark matter to neutral hydrogen (Mcdm → HI), mapping between dark matter and magnetic field magnitude (Mcdm → B), and finally predicting magnetic field magnitude from neutral hydrogen (HI → B). We assess the performance of the models using various metrics, such as the probability distribution function (PDF) of the pixel values and the 2D power spectrum ($P(k)$). Results suggest that in all setups, the model is capable of predicting the target field from the source field and vice versa, and the predicted maps exhibit statistical properties which are consistent with those of the target maps. This is indicated by the fact that the mean and standard deviation of the PDF of maps from the test set are in good agreement with those of the generated maps. The mean and variance of $P(k)$ of the real maps agree well with those of the generated ones. The consistency tests on the model suggest that the source field can be recovered reasonably well by a forward mapping (source to target) followed by a backward mapping (target to source). This is demonstrated by the agreement between the statistical properties of the source images and those of the recovered ones.
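The azimuthally averaged power-spectrum metric can be sketched with plain NumPy. This is a hypothetical illustration; the binning and normalization choices here are ours, not the authors'.

```python
import numpy as np

# Hypothetical sketch of an azimuthally averaged 2D power spectrum P(k),
# usable as a comparison metric between real and generated maps.
def power_spectrum_2d(field):
    n = field.shape[0]
    power = np.abs(np.fft.fftn(field)) ** 2 / n**2
    kx = np.fft.fftfreq(n) * n                       # integer wavenumbers
    kgrid = np.sqrt(kx[:, None] ** 2 + kx[None, :] ** 2)
    kbins = np.arange(0.5, n // 2, 1.0)
    which = np.digitize(kgrid.ravel(), kbins)
    pk = np.array([power.ravel()[which == i].mean()
                   for i in range(1, len(kbins))])   # mean power per k-shell
    return 0.5 * (kbins[:-1] + kbins[1:]), pk

rng = np.random.default_rng(0)
k, pk = power_spectrum_2d(rng.normal(size=(64, 64)))
print(pk.shape == k.shape)  # one P(k) value per k bin
```

For unit-variance white noise the spectrum is flat near 1 under this normalization, which makes the sketch easy to sanity-check.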

Sambatra Andrianomena · Hassan · Francisco VillaescusaNavarro 🔗 


Swarm Reinforcement Learning for Adaptive Mesh Refinement
(
Poster
)
>
link
Adaptive Mesh Refinement (AMR) is crucial for mesh-based simulations, as it allows for dynamically adjusting the resolution of a mesh to trade off computational cost with simulation accuracy. Yet, existing methods for AMR either use task-dependent heuristics, expensive error estimators, or do not scale well to larger meshes or more complex problems. In this paper, we formalize AMR as a Swarm Reinforcement Learning problem, viewing each element of a mesh as part of a collaborative system of simple and homogeneous agents. We combine this problem formulation with a novel agent-wise reward function and Graph Neural Networks, allowing us to learn reliable and scalable refinement strategies on arbitrary systems of equations. We experimentally demonstrate the effectiveness of our approach in improving the accuracy and efficiency of complex simulations. Our results show that we outperform learned baselines and achieve a refinement quality that is on par with a traditional error-based AMR refinement strategy, without requiring error indicators during inference.
Niklas Freymuth · Philipp Dahlinger · Tobias Würth · Luise Kärger · Gerhard Neumann 🔗 


Stability of implicit neural networks for long-term forecasting in dynamical systems
(
Poster
)
>
link
Forecasting physical signals over long time horizons is among the most challenging tasks in Partial Differential Equation (PDE) research. To circumvent limitations of traditional solvers, many different Deep Learning methods have been proposed. They are all based on autoregressive methods and exhibit stability issues. Drawing inspiration from the stability property of implicit numerical schemes, we introduce a stable autoregressive implicit neural network. We develop a theory based on the stability definition of schemes to ensure the stability of this network in forecasting. It leads us to introduce hard constraints on its weights and to propagate the dynamics in the latent space. Our experimental results validate our stability property, and show improved results at long-term forecasting for two transport PDEs.
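The stability property being borrowed can be seen on the scalar test equation du/dt = -λu. This is the standard numerical-analysis illustration, not the paper's network: with a large timestep, explicit Euler diverges while implicit (backward) Euler remains stable for any dt > 0.

```python
import numpy as np

lam, dt, steps = 50.0, 0.1, 100   # lam*dt = 5, well outside the explicit stability region
u_exp = u_imp = 1.0
for _ in range(steps):
    u_exp = u_exp + dt * (-lam * u_exp)   # explicit: u *= (1 - lam*dt) = -4
    u_imp = u_imp / (1.0 + lam * dt)      # implicit: u /= (1 + lam*dt) = 6

print(abs(u_exp) > 1e10, abs(u_imp) < 1e-10)  # explicit blows up, implicit decays
```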
Léon Migus · Julien Salomon · patrick gallinari 🔗 


Practical implications of equivariant and invariant graph neural networks for fluid flow modeling
(
Poster
)
>
link
Graph neural networks (GNNs) have shown promise in learning unstructured mesh-based simulations of physical systems, including fluid dynamics. In tandem, geometric deep learning principles have informed the development of equivariant architectures. However, the practical implications of rotational equivariance in modeling fluids remain underexplored. We build a multi-scale equivariant GNN to forecast buoyancy-driven shear fluid flow and study the effect of modeling invariant and non-invariant representations of the flow state. Our results show that modeling invariant quantities produces more accurate long-term predictions, and that these invariant quantities may be learned from the velocity field using a data-driven encoder.
Varun Shankar · Shivam Barwey · Romit Maulik · Venkat Viswanathan 🔗 


Model-based Unknown Input Estimation via Partially Observable Markov Decision Processes
(
Poster
)
>
link
In the context of condition monitoring for structures and industrial assets, the estimation of unknown inputs, usually referring to acting loads, is of salient importance for guaranteeing safe and performant engineered systems. In this work, we propose a novel method for estimating unknown inputs from measured outputs, for the case of systems with a known or learned model of the underlying dynamics. The objective is to infer those system inputs that will reproduce the actual measured outputs; this can be reformulated as a Partially Observable Markov Decision Process (POMDP) problem and solved with well-established planning algorithms for POMDPs. The cross-entropy method (CEM) is adopted in this paper for solving the POMDP due to its efficiency and robustness. The proposed method is demonstrated using simulated dynamical systems for structures with known dynamics, as well as a real wind turbine with learned dynamics inferred through Neural Extended Kalman Filters (Neural EKF), a deep-learning-based method for learning stochastic dynamics previously proposed by the authors.
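The cross-entropy method itself is compact enough to sketch on a toy input-estimation problem. This is a hypothetical illustration; the population sizes and the quadratic "system" are ours, not the paper's setup.

```python
import numpy as np

# Hypothetical CEM sketch: sample candidate inputs, keep the elites that
# best reproduce the measured output, refit the sampling distribution.
def cem_minimize(loss, dim, iters=50, pop=200, n_elite=20, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = mu + sigma * rng.normal(size=(pop, dim))
        elites = samples[np.argsort([loss(s) for s in samples])[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu

# toy "system": recover the input that reproduces a measured output
target = np.array([0.3, -1.2, 2.0])
u = cem_minimize(lambda s: np.sum((s - target) ** 2), dim=3)
print(np.allclose(u, target, atol=1e-2))  # True
```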
Wei Liu · Zhilu Lai · Charikleia Stoura · Kiran Bacsa · Eleni Chatzi 🔗 


Probing optimisation in physics-informed neural networks
(
Poster
)
>
link
A novel comparison is presented of the effect of optimiser choice on the accuracy of physics-informed neural networks (PINNs). To give insight into why some optimisers are better, a new approach is proposed that tracks the curvature of the training trajectory and can be evaluated on the fly at low computational cost. The linear advection equation is studied for several advective velocities, and we show that the optimiser choice substantially impacts PINN performance and accuracy. Furthermore, using the curvature measure, we find a negative correlation between the convergence error and the curvature in the optimiser's local reference frame. It is concluded that, in this case, larger local curvature values result in better solutions. Consequently, optimisation of PINNs is made more difficult as minima lie in highly curved regions.
Nayara Fonseca · Veronica Guidetti · Will Trojak 🔗 


How Deep Convolutional Neural Networks lose Spatial Information with training
(
Poster
)
>
link
A central question of machine learning is how deep nets learn tasks in high dimensions. An appealing hypothesis is that they build a representation of the data where information irrelevant to the task is lost. For image datasets, this view is supported by the observation that after (and not before) training, the neural representation becomes less and less sensitive to diffeomorphisms acting on images as the signal propagates through the net. This loss of sensitivity correlates with performance and, surprisingly, correlates with a gain of sensitivity to white noise acquired over training. These facts are unexplained and, as we demonstrate, still hold when white noise is added to the images of the training set. Here we (i) show empirically for various architectures that stability to diffeomorphisms is achieved due to a combination of spatial and channel pooling; (ii) introduce a model scale-detection task which reproduces our empirical observations on spatial pooling; (iii) compute analytically how the sensitivity to diffeomorphisms and noise scales with depth due to spatial pooling. In particular, we find that both trends are caused by a diffusive spreading of the neurons' receptive fields through the layers.
Umberto Tomasini · Leonardo Petrini · Francesco Cagnetta · Matthieu Wyart 🔗 


OPERATOR LEARNING ON FREE-FORM GEOMETRIES
(
Poster
)
>
link
Operator Learning models usually rely on a fixed sampling scheme for training, which might limit their ability to generalize to new situations. We present CORAL, a new method which leverages Coordinate-Based Networks for OpeRAtor Learning without any constraints on the training mesh or input sampling. CORAL is able to solve complex Initial Value Problems such as 2D Navier-Stokes or 3D spherical Shallow-Water, and can perform zero-shot super-resolution to recover a dense grid, even when the training grid is irregular and sparse. It can also be applied to the task of geometric design with structured or point-cloud data, to infer the steady physical state of a system given the characteristics of the domain.
Louis Serrano · JeanNoël Vittaut · patrick gallinari 🔗 


Neural Integral Functionals
(
Poster
)
>
link
Functionals map input functions to output scalars, and are ubiquitous in various scientific fields. In this work, we propose the neural integral functional (NIF), a general functional approximator that suits a large number of scientific problems, including the brachistochrone curve problem in classical physics and density functional theory in quantum physics. One key ingredient that enables NIF on these problems is the functional's explicit dependence on the derivative of the input function. We demonstrate that this is crucial for NIF to outperform neural operators (NOs), despite the fact that NOs are theoretically universal. With NIF, we further propose to jointly train the functional and its functional derivative (FD) to improve generalization and to enable applications that require an accurate FD. We validate these claims with experiments on functional fitting and functional minimization.
Zheyuan Hu · Tianbo Li · Zekun Shi · Kunhao Zheng · Giovanni Vignale · Kenji Kawaguchi · shuicheng YAN · Min Lin 🔗 


Projections of Model Spaces for Latent Graph Inference
(
Poster
)
>
link
Graph Neural Networks leverage the connectivity structure of graphs as an inductive bias. Latent graph inference focuses on learning an adequate graph structure to diffuse information on. In this work, we employ stereographic projections of the hyperbolic and spherical model spaces, as well as products of Riemannian manifolds, for the purpose of latent graph inference. Stereographically projected model spaces achieve comparable performance to their non-projected counterparts, while providing theoretical guarantees that avoid divergence of the spaces when the curvature tends to zero.
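A standard stereographic projection and its inverse illustrate the construction. This is a minimal sketch of the classical sphere-to-plane map; the paper's projected model spaces generalize it to curvature-parametrized families.

```python
import numpy as np

# Stereographic projection from the unit sphere (minus the north pole)
# to the plane, and its inverse.
def to_plane(p):                 # p on S^2 with p[-1] != 1
    return p[:-1] / (1.0 - p[-1])

def to_sphere(x):                # inverse stereographic projection
    s = np.dot(x, x)
    return np.append(2.0 * x, s - 1.0) / (s + 1.0)

p = np.array([0.6, 0.0, 0.8])    # lies on the unit sphere
print(np.allclose(to_sphere(to_plane(p)), p))  # round trip recovers p
```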
Haitz Sáez de Ocáriz Borde · Alvaro Arroyo · Ingmar Posner 🔗 


Quantum Feature Maps for Graph Machine Learning on a Neutral Atom Quantum Processor
(
Poster
)
>
link
Using a quantum processor to embed and process classical data enables the generation of correlations between variables that are inefficient to represent through classical computation. A fundamental question is whether these correlations could be harnessed to enhance learning performances on real datasets. Here, we report the use of a neutral atom quantum processor comprising up to $32$ qubits to implement machine learning tasks on graph-structured data. To that end, we introduce a quantum feature map to encode the information about graphs in the parameters of a tunable Hamiltonian acting on an array of qubits. Using this tool, we first show that interactions in the quantum system can be used to distinguish non-isomorphic graphs that are locally equivalent. We then realize a toxicity screening experiment, consisting of a binary classification protocol on a biochemistry dataset comprising $286$ molecules of sizes ranging from $2$ to $32$ nodes, and obtain results which are comparable to the implementation of the best classical kernels on the same dataset. Using techniques to compare the geometry of the feature spaces associated with kernel methods, we then show evidence that the quantum feature map perceives data in an original way, which is hard to replicate using classical kernels.

Boris Albrecht · Constantin Dalyac · Lucas Leclerc · Luis OrtizGutiérrez · Slimane Thabet · Mauro D'Arcangelo · Vincent Elfving · Lucas Lassablière · Henrique Silvério · Bruno Ximenez · LouisPaul Henry · Adrien Signoles · Loic Henriet



Multi-Scale Message Passing Neural PDE Solvers
(
Poster
)
>
link
We propose a novel multi-scale message passing neural network algorithm for learning the solutions of time-dependent PDEs. Our algorithm possesses both temporal and spatial multi-scale resolution features by incorporating multi-scale sequence models and graph gating modules in the encoder and processor, respectively. Benchmark numerical experiments are presented to demonstrate that the proposed algorithm outperforms baselines, particularly on a PDE with a range of spatial and temporal scales.
Léonard Equer · T. Konstantin Rusch · Siddhartha Mishra 🔗 


Noise Injection as a Probe of Deep Learning Dynamics
(
Poster
)
>
link
We propose a new method to probe the learning mechanism of Deep Neural Networks (DNN) by perturbing the system using Noise Injection Nodes (NINs). These nodes inject uncorrelated noise via additional optimizable weights to existing feedforward network architectures, without changing the optimization algorithm. We find that the system displays distinct phases during training, dictated by the scale of injected noise. We first derive expressions for the dynamics of the network and utilize a simple linear model as a test case. We find that in some cases, the evolution of the noise nodes is similar to that of the unperturbed loss, thus indicating the possibility of using NINs to learn more about the full system in the future. 
Noam Levi · Itay Bloch · Marat Freytsis · Tomer Volansky 🔗 


Discovering drag reduction strategies in wall-bounded turbulent flows using deep reinforcement learning
(
Poster
)
>
link
The control of turbulent fluid flows represents a challenging problem in several engineering applications. The chaotic, high-dimensional, nonlinear nature of turbulence hinders the possibility to design robust and effective control strategies. In this work, we apply deep reinforcement learning to a three-dimensional turbulent open-channel flow, a canonical flow example that is often used as a study case in turbulence, aiming to reduce the friction drag in the flow. By casting the fluid-dynamics problem as a multi-agent reinforcement-learning environment and by training the agents using a location-invariant deep deterministic policy gradient algorithm, we are able to obtain a control strategy that achieves a remarkable 30% drag reduction, improving over previously known strategies by about 10 percentage points.
Luca Guastoni · Jean Rabault · Philipp Schlatter · Ricardo Vinuesa · Hossein Azizpour 🔗 


A Machine Learning Approach to Generate Quantum Light
(
Poster
)
>
link
Spontaneous parametric down-conversion (SPDC) is a key technique in quantum optics used to generate entangled photon pairs. However, generating a desirable $D$-dimensional qudit state in the SPDC process remains a challenge. In this paper, we introduce a physically constrained and differentiable model to overcome this challenge, and demonstrate its effectiveness through the design of shaped pump beams and structured nonlinear photonic crystals. We avoid any restrictions induced by the stochastic nature of our physical process and integrate a set of stochastic dynamical equations governing its evolution under the SPDC Hamiltonian. Our model is capable of learning the relevant interaction parameters and designing nonlinear quantum optical systems that achieve desired quantum states. We show, theoretically and experimentally, how to generate maximally entangled states in the spatial degree of freedom. Additionally, we demonstrate all-optical coherent control of the generated state by reshaping the pump beam. Our work has potential applications in high-dimensional quantum key distribution and quantum information processing.
Eyal Rozenberg · Aviv Karnieli · Ofir Yesharim · Joshua FoleyComer · Sivan TrajtenbergMills · Sarika Mishra · Shashi Prabhakar · Ravindra Singh · Daniel Freedman · Alex Bronstein · Ady Arie



THE RL PERCEPTRON: DYNAMICS OF POLICY LEARNING IN HIGH DIMENSIONS
(
Poster
)
>
link
Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analyses, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here we propose a simple high-dimensional model of RL and derive its typical dynamics as a set of closed-form ODEs. We show that the model exhibits rich behavior, including delayed learning under sparse rewards; a speed-accuracy trade-off depending on reward stringency; and a dependence of the learning regime on reward baselines. These results offer a first step toward understanding policy gradient methods in high-dimensional settings.
Nishil Patel · Sebastian Lee · Stefano Mannelli · Sebastian Goldt · Andrew Saxe 🔗 


Towards an inductive bias for quantum statistics in GANs
(
Poster
)
>
link
Machine learning models that leverage a latent space with a structure similar to the underlying data distribution have been shown to be highly successful. However, when the data is produced by a quantum process, classical computers are expected to struggle to generate a matching latent space. Here, we show that using a quantum processor to produce the latent space used by a generator in a generative adversarial network (GAN) leads to improved performance on a small-scale quantum dataset. We also demonstrate that this approach is scalable to large-scale data. These results constitute a promising first step towards building real-world generative models with an inductive bias for data with quantum statistics.
Hugo Wallner · William Clements 🔗 


Stationary Deep Reinforcement Learning with Quantum K-spin Hamiltonian Regularization
(
Poster
)
>
link
Instability is a major issue of deep reinforcement learning (DRL) algorithms: high variance of performance over multiple runs. It is mainly caused by the existence of many local minima and worsened by the multiple-fixed-points issue of Bellman's equation. As a fix, we propose a quantum K-spin Hamiltonian regularization term (called the H-term) to help a policy network converge to a high-quality local minimum. First, we take a quantum perspective by modeling a policy as a K-spin Ising model and employ a Hamiltonian to measure the energy of a policy. Then, we derive a novel Hamiltonian policy gradient theorem and design a generic actor-critic algorithm that utilizes the H-term to regularize the policy network. Finally, the proposed method reduces the variance of cumulative rewards by 65.2% to 85.6% on six MuJoCo tasks, compared with existing algorithms over 20 runs.
XiaoYang Liu · Zechu Li · Shixun Wu · Xiaodong Wang 🔗 


PDEBENCH: AN EXTENSIVE BENCHMARK FOR SCIENTIFIC MACHINE LEARNING
(
Poster
)
>
link
Despite some impressive progress in machine-learning-based modeling of physical systems, there is still a lack of benchmarks for Scientific ML that are easy to use yet challenging and representative of a wide range of problems. We introduce PDEBENCH, a benchmark suite of time-dependent simulation tasks based on Partial Differential Equations (PDEs). PDEBENCH comprises both code and data to benchmark the performance of novel machine learning models against classical numerical simulations and ML baselines. Our proposed set of benchmark problems contributes the following features: (1) a much wider range of PDEs compared to existing benchmarks, ranging from relatively common examples to more realistic problems; (2) much larger ready-to-use datasets compared to prior work, comprising multiple simulation runs across a large number of initial and boundary conditions and PDE parameters; (3) more extensible source code with user-friendly APIs for data generation and for obtaining baselines of popular machine learning models (FNO, U-Net, PINN, Gradient-Based Inverse Method). PDEBENCH allows users to extend the benchmark freely for their own purposes using a standardized API and to compare the performance of new models to existing baseline methods. We also propose new evaluation metrics in order to provide a more holistic understanding of model performance in the context of Scientific ML.
Makoto Takamoto · Timothy Praditia · Raphael Leiteritz · Dan MacKinlay · Francesco Alesiani · Dirk Pflüger · Mathias Niepert 🔗 


Neural-prior stochastic block model
(
Poster
)
>
link
The stochastic block model (SBM) is widely studied as a benchmark for graph clustering, also known as community detection. In practice, graph data often come with node attributes that bear additional information about the communities. Previous works modelled such data by considering that the node attributes are generated from the node community memberships. In this work, motivated by the recent surge of works in signal processing using deep neural networks as priors, we propose to model the communities as being determined by the node attributes rather than the opposite. We define the corresponding model, which we call the neural-prior SBM. We propose an algorithm, stemming from statistical physics, based on a combination of belief propagation and approximate message passing. We argue it achieves Bayes-optimal performance for the considered setting. The proposed model and algorithm can hence be used as a benchmark for both theory and algorithms. To illustrate this, we compare the optimal performance to that of a simple graph convolution network.
O. Duranthon · Lenka Zdeborova 🔗 


Convolutional Neural Operators
(
Poster
)
>
link
Although very successfully used in machine learning, convolution-based neural network architectures, believed to be inconsistent in function space, have been largely ignored in the context of learning solution operators of PDEs. Here, we adapt convolutional neural networks to demonstrate that they are indeed able to process functions as inputs and outputs. The resulting architecture, termed convolutional neural operators (CNOs), is shown to significantly outperform competing models on benchmark experiments, paving the way for the design of an alternative robust and accurate framework for learning operators.
Bogdan Raonic · Roberto Molinaro · Tobias Rohner · Siddhartha Mishra · Emmanuel de Bézenac 🔗 


Neural Networks Learn Representation Theory: Reverse Engineering how Networks Perform Group Operations
(
Poster
)
>
link
We present a novel algorithm by which neural networks may implement composition for any finite group via mathematical representation theory, through learning several irreducible representations of the group and converting group composition to matrix multiplication. We show small networks consistently learn this algorithm when trained on composition of group elements by reverse engineering model logits and weights, and confirm our understanding using ablations. We use this as an algorithmic test bed for the hypothesis of universality in mechanistic interpretability: that different models learn similar features and circuits when trained on similar tasks. By studying networks trained on various groups and architectures, we find mixed evidence for universality: using our algorithm, we can completely characterize the family of circuits and features that networks learn on this task, but for a given network the precise circuits learned, as well as the order in which they develop, are arbitrary.
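For a cyclic group, the composition-as-matrix-multiplication idea reduces to a familiar fact. A minimal sketch, not a trained network: the 2D irreducible representation of Z_n sends an element to a rotation matrix, composition becomes matmul, and the result is read back off the product.

```python
import numpy as np

n = 12

def rep(a):
    # 2D irreducible representation of Z_n: a -> rotation by 2*pi*a/n
    t = 2.0 * np.pi * a / n
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def compose(a, b):
    R = rep(a) @ rep(b)                        # group composition as matmul
    angle = np.arctan2(R[1, 0], R[0, 0])       # read the angle back off
    return int(round(angle * n / (2.0 * np.pi))) % n

print(all(compose(a, b) == (a + b) % n for a in range(n) for b in range(n)))  # True
```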
Bilal Chughtai · Lawrence Chan · Neel Nanda 🔗 


Semi-Equivariant Conditional Normalizing Flows
(
Poster
)
>
link
We study the problem of learning conditional distributions of the form $p(G \mid \hat{G})$, where $G$ and $\hat{G}$ are two 3D graphs, using continuous normalizing flows. We derive a semi-equivariance condition on the flow which ensures that conditional invariance to rigid motions holds. We demonstrate the effectiveness of the technique in the molecular setting of receptor-aware ligand generation.

Eyal Rozenberg · Daniel Freedman 🔗 


Invariant preservation in machine learned PDE solvers via error correction
(
Poster
)
>
link
Machine learned partial differential equation (PDE) solvers trade the reliability of standard numerical methods for potential gains in accuracy and/or speed. The only way for a solver to guarantee that it outputs the exact solution is to use a convergent method in the limit that the grid spacing $\Delta x$ and timestep $\Delta t$ approach zero. Machine learned solvers, which learn to update the solution at large $\Delta x$ and/or $\Delta t$, can never guarantee perfect accuracy. Some amount of error is inevitable, so the question becomes: how do we constrain machine learned solvers to give us the sorts of errors that we are willing to tolerate? In this abridged version of a full-length paper, we design more reliable machine learned PDE solvers by preserving discrete analogues of the continuous invariants of the underlying PDE. Examples of such invariants include conservation of mass, conservation of energy, the second law of thermodynamics, and/or non-negative density. Our key insight is simple: to preserve invariants, at each timestep apply an error-correcting algorithm to the update rule. Though this strategy is different from how standard solvers preserve invariants, it is necessary to retain the flexibility that allows machine learned solvers to be accurate at large $\Delta x$ and/or $\Delta t$. This strategy can be applied to any autoregressive solver for any time-dependent PDE in arbitrary geometries with arbitrary boundary conditions. Although this strategy is very general, the specific error-correcting algorithms need to be tailored to the invariants of the underlying equations as well as to the solution representation and timestepping scheme of the solver. The error-correcting algorithms we introduce have two key properties. First, by preserving the right invariants they guarantee numerical stability. Second, in closed or periodic systems they do so without degrading the accuracy of an already-accurate solver.

Nick McGreivy · Ammar Hakim 🔗 
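The key insight above, applying an error-correcting step after each learned update, can be sketched for the simplest invariant: conservation of total mass on a periodic 1D grid. This is a heavily simplified illustration rather than the paper's algorithm, and the function names are hypothetical.

```python
def conserve_mass(u_new, u_old, dx):
    """Shift the updated solution uniformly so total mass sum(u)*dx is unchanged."""
    error = (sum(u_new) - sum(u_old)) * dx
    return [v - error / (len(u_new) * dx) for v in u_new]

def step(u, dx, dt, learned_update):
    """One autoregressive step: arbitrary (e.g. neural) update, then correction."""
    return conserve_mass(learned_update(u, dx, dt), u, dx)

# Even a deliberately mass-leaking update yields a mass-conserving step.
leaky = lambda u, dx, dt: [1.1 * v for v in u]
u0 = [1.0, 2.0, 3.0]
u1 = step(u0, dx=0.5, dt=0.1, learned_update=leaky)
assert abs(sum(u1) - sum(u0)) < 1e-9
```

Other invariants (energy, non-negativity, entropy inequalities) need correspondingly tailored projections, as the abstract notes.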


Denoising Diffusion Probabilistic Models to Predict the Number Density of Molecular Clouds in Astronomy
(
Poster
)
>
link
Denoising Diffusion Probabilistic Models (DDPMs) have become a mainstream generative approach in machine learning and computer vision, achieving state-of-the-art performance in synthesizing high-quality images, videos, and audio. In this work, we take DDPMs beyond data generation tasks to a new scientific application in astronomy: inferring the volume or number density of giant molecular clouds (GMCs) from projected mass surface density maps. Specifically, we adopt magnetohydrodynamic (MHD) simulations with different global magnetic field strengths and large-scale dynamics, i.e., non-colliding and colliding GMCs. We train a DDPM on both mass surface density maps and their corresponding mass-weighted number density maps from different viewing angles for all the simulations. We compare our performance with a more traditional empirical two-component and three-component power-law fitting method and with a more traditional neural network machine learning approach (CASItD). Experiments show that DDPMs achieve an order of magnitude improvement in the accuracy of predicting number density compared to the other methods, demonstrating the promising potential of applying DDPMs in astrophysics. 
Duo Xu · Jonathan Tan · ChiaJung Hsu · Ye Zhu 🔗 


PDExplain: Contextual Modeling of PDEs in the Wild
(
Poster
)
>
link
We propose an explainable method for solving partial differential equations using a contextual scheme called PDExplain. During the training phase, our method is fed with data collected from an operator-defined family of PDEs, accompanied by the general form of this family. In the inference phase, a minimal sample collected from a phenomenon is provided, where the sample is related to the PDE family but not necessarily to the set of specific PDEs seen in the training phase. We show how our algorithm can predict the PDE solution for future timesteps. Moreover, our method provides an explainable form of the PDE, a trait that can assist in modelling phenomena based on data in the physical sciences. To verify our method, we conduct extensive experimentation, examining its quality both in terms of prediction error and explainability. 
Ori Linial · Orly Avner · Dotan Di Castro 🔗 


Learning Physical Models that Can Respect Conservation Laws
(
Poster
)
>
link
Recent work in scientific machine learning (SciML) has focused on incorporating partial differential equation (PDE) information into the learning process. Most of this work has focused on relatively "easy" PDE operators (e.g., elliptic and parabolic), with less emphasis on relatively "hard" PDE operators (e.g., hyperbolic). Within numerical PDEs, the latter need to maintain a type of volume-element or conservation constraint for a desired physical quantity, which is known to be challenging. Delivering on the promise of SciML requires seamlessly incorporating both types of problems into the learning process. To address this issue, we propose ProbConserv, a framework for incorporating constraints into a black-box probabilistic deep-learning architecture. To do so, ProbConserv combines the integral form of a conservation law with a Bayesian update. We demonstrate the effectiveness of ProbConserv via a case study of the Generalized Porous Medium Equation (GPME), a parameterized family of equations that includes both easier and harder PDEs. On the challenging Stefan variant of the GPME, we show that ProbConserv seamlessly enforces physical conservation constraints, maintains probabilistic uncertainty quantification (UQ), and deals well with shocks and heteroscedasticity. In addition, it achieves superior predictive performance on downstream tasks. 
Derek Hansen · Danielle Maddix · Shima Alizadeh · Gaurav Gupta · Michael W Mahoney 🔗 
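The combination of an integral conservation law with a Bayesian update can be illustrated, in a heavily simplified form, as conditioning a Gaussian prediction on a linear constraint. The sketch below assumes an isotropic covariance and a total-mass constraint; both are simplifications of the paper's setting, and the function name is hypothetical.

```python
def constrain_total_mass(mu, sigma2, target_mass):
    """Bayesian update of an isotropic Gaussian prediction N(mu, sigma2*I)
    conditioned on the exact linear constraint sum(u) = target_mass.
    Posterior mean: mu + Sigma a (a^T Sigma a)^(-1) (b - a^T mu) with a = ones;
    for isotropic Sigma the per-component gain reduces to 1/n (sigma2 cancels)."""
    n = len(mu)
    residual = target_mass - sum(mu)
    return [m + residual / n for m in mu]

mu = [1.0, 2.0, 3.0]                                   # unconstrained prediction
mu_c = constrain_total_mass(mu, sigma2=0.1, target_mass=9.0)
assert sum(mu_c) == 9.0                                # conservation holds exactly
```

With a general (non-isotropic) covariance, the same update distributes the conservation residual according to the model's uncertainty, which is what allows the constraint and the UQ to coexist.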


Physics-constrained neural differential equations for learning multi-ionic transport
(
Poster
)
>
link
Continuum models for ion transport through polyamide nanopores require solving partial differential equations (PDEs) through complex pore geometries. Resolving spatiotemporal features at this length and timescale can make solving these equations computationally intractable. In addition, mechanistic models frequently require functional relationships between ion interaction parameters under nanoconfinement, which are often too challenging to measure experimentally or know a priori. In this work, we develop the first physics-informed deep learning model to learn ion transport behaviour across polyamide nanopores. The proposed architecture leverages neural differential equations in conjunction with classical closure models as inductive biases directly encoded into the neural framework. The neural differential equations are pre-trained on simulated data from continuum models and fine-tuned on independent experimental data to learn ion rejection behaviour. Gaussian noise augmentations from experimental uncertainty estimates are also introduced into the measured data to improve model generalization. Our approach is compared to other physics-informed deep learning models and shows strong agreement with experimental measurements across all studied datasets. 
Danyal Rehman · John Lienhard 🔗 


Non-equispaced Fourier Neural Solvers for PDEs
(
Poster
)
>
link
Recently proposed neural resolution-invariant models, despite their effectiveness and efficiency, usually require equispaced spatial points of data for solving partial differential equations. However, sampling in the spatial domain is sometimes inevitably non-equispaced in real-world systems, limiting their applicability. In this paper, we propose a Non-equispaced Fourier PDE Solver (\textsc{NFS}) with adaptive interpolation on resampled equispaced points and a variant of Fourier Neural Operators as its components. Experimental results on complex PDEs demonstrate its advantages in accuracy and efficiency. Compared with the spatially equispaced benchmark methods, it achieves superior performance with $42.85\%$ improvements on MAE, and is able to handle non-equispaced data with a tiny loss of accuracy. Besides, \textsc{NFS}, as a model with mesh-invariant inference ability, can successfully model turbulent flows in non-equispaced scenarios, with a minor deviation of the error on unseen spatial points.

Haitao Lin · Lirong Wu · Yongjie Xu · Yufei Huang · Siyuan Li · Guojiang Zhao · Stan Z Li 🔗 
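The resampling idea above, interpolating non-equispaced samples onto an equispaced grid so that FFT-based operators such as FNOs can be applied, can be sketched with plain linear interpolation. NFS learns an adaptive interpolation; the fixed linear version below is only a stand-in, and the function name is hypothetical.

```python
import bisect

def resample_equispaced(xs, ys, m):
    """Linearly interpolate values ys, sampled at sorted non-equispaced
    locations xs, onto m equispaced points spanning [xs[0], xs[-1]]."""
    grid = [xs[0] + i * (xs[-1] - xs[0]) / (m - 1) for i in range(m)]
    out = []
    for x in grid:
        # index of the interval [xs[j], xs[j+1]] containing x
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        out.append((1 - t) * ys[j] + t * ys[j + 1])
    return grid, out

# Non-equispaced samples of f(x) = x are recovered on an equispaced grid.
grid, vals = resample_equispaced([0.0, 0.1, 0.5, 1.0], [0.0, 0.1, 0.5, 1.0], 5)
assert grid == [0.0, 0.25, 0.5, 0.75, 1.0]
assert max(abs(v - g) for v, g in zip(vals, grid)) < 1e-9
```

After this resampling step, an FFT (and hence a Fourier-layer operator) can act on the equispaced values as usual.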


Learning Deformation Trajectories of Boltzmann Densities
(
Poster
)
>
link
We introduce a training objective for continuous normalizing flows that can be used in the absence of samples but in the presence of an energy function. Our method relies on either a prescribed or a learnt interpolation $f_t$ of energy functions between the target energy $f_1$ and the energy function of a generalized Gaussian $f_0(x) = \|x\|^p/\sigma_p^p$. The interpolation of energy functions induces an interpolation of Boltzmann densities $p_t \propto e^{-f_t}$ and we aim to find a time-dependent vector field $V_t$ that transports samples along the family $p_t$ of densities. The condition of transporting samples along the family $p_t$ can be translated to a PDE between $V_t$ and $f_t$, and we optimize $V_t$ and $f_t$ to satisfy this PDE.

Bálint Máté · François Fleuret 🔗 
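A prescribed interpolation of this kind can be written down in a few lines. The sketch below uses a linear interpolation f_t = (1-t) f_0 + t f_1 in one dimension, with a hypothetical double-well standing in for the target energy; it only illustrates the induced family of (unnormalized) Boltzmann densities, not the transport PDE or its optimization.

```python
import math

SIGMA, P = 1.0, 2          # generalized-Gaussian scale and exponent (assumed values)

def f0(x):
    """Energy of a generalized Gaussian: |x|^p / sigma^p."""
    return abs(x) ** P / SIGMA ** P

def f1(x):
    """Hypothetical target energy (a double well), standing in for f_1."""
    return (x ** 2 - 1.0) ** 2

def f_t(x, t):
    """Prescribed linear interpolation between the two energies."""
    return (1.0 - t) * f0(x) + t * f1(x)

def p_t(x, t):
    """Unnormalized Boltzmann density induced by the interpolated energy."""
    return math.exp(-f_t(x, t))

assert f_t(0.5, 0.0) == f0(0.5) and f_t(0.5, 1.0) == f1(0.5)
```

The method then seeks a vector field V_t whose flow transports samples so that their density tracks p_t for all t, a condition expressible as a PDE relating V_t and f_t.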


Physics-inspired Interpretability of Machine Learning Models
(
Poster
)
>
link
The ability to explain decisions made by machine learning models remains one of the most significant hurdles towards widespread adoption of AI in highly sensitive areas such as medicine, cybersecurity or autonomous driving. Great interest exists in understanding which features of the input data prompt model decision making. In this contribution, we propose a novel approach to identify relevant features of the input data, inspired by methods from the energy landscapes field developed in the physical sciences. By identifying conserved weights within groups of minima of the loss function landscape, we can identify the drivers of model decision making. Analogues of this idea exist in the molecular sciences, where coordinate invariants or order parameters are employed to identify critical features of a molecule. However, no such approach exists for machine learning loss function landscapes. We demonstrate the applicability of energy landscape methods to machine learning models and give examples, both synthetic and from the real world, of how these methods can help to make models more interpretable. 
Maximilian Niroomand · David Wales 🔗 