

Poster

RAG-SR: Retrieval-Augmented Generation for Neural Symbolic Regression

Hengzhe Zhang · Qi Chen · Bing XUE · Wolfgang Banzhaf · Mengjie Zhang

Hall 3 + Hall 2B #330
Sat 26 Apr, 12:00 a.m. – 2:30 a.m. PDT

Abstract:

Symbolic regression is a key task in machine learning, aiming to discover mathematical expressions that best describe a dataset. While advances in deep learning have increased interest in using neural networks for symbolic regression, many existing approaches rely on pre-trained models. These models require significant computational resources and struggle with regression tasks involving unseen functions and variables. A pre-training-free paradigm is needed to better integrate with search-based symbolic regression algorithms. To address these limitations, we propose a novel framework for symbolic regression that integrates evolutionary feature construction with a neural network, without the need for pre-training. Our approach adaptively generates symbolic trees that align with the desired semantics in real time, using a language model trained via online supervised learning, thereby providing effective building blocks for feature construction. To mitigate hallucinations from the language model, we design a retrieval-augmented generation mechanism that explicitly leverages symbolic expressions discovered during the search. Additionally, we introduce a scale-invariant data augmentation technique that further improves the robustness and generalization of the model. Experimental results demonstrate that our framework achieves state-of-the-art accuracy against 25 baseline regression algorithms on 120 regression tasks.
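To make the retrieval-augmented generation idea concrete, below is a minimal, hedged sketch (not the authors' code) of a semantic retrieval component for search-based symbolic regression. All names (`SemanticArchive`, `retrieve`) are hypothetical. The assumption illustrated: subtrees found during evolutionary search are archived together with their semantics (output vectors on the training inputs); given a desired semantics such as the current residual, the closest archived subtrees are retrieved and can ground or replace a neural generator's output, which is one way to curb hallucinated expressions. Standardizing semantics before matching loosely mirrors the scale-invariant aspect mentioned in the abstract.

```python
# Hedged sketch of semantics-based retrieval for symbolic regression.
# Hypothetical names; a simplified stand-in for the paper's RAG mechanism.
import numpy as np


class SemanticArchive:
    def __init__(self):
        self.expressions = []   # symbolic subtrees, here just strings
        self.semantics = []     # outputs of each subtree on the training inputs

    def add(self, expression: str, outputs: np.ndarray) -> None:
        self.expressions.append(expression)
        self.semantics.append(np.asarray(outputs, dtype=float))

    @staticmethod
    def _standardize(v: np.ndarray) -> np.ndarray:
        # Zero-mean, unit-variance semantics makes the match scale-invariant.
        std = v.std()
        return (v - v.mean()) / std if std > 0 else v - v.mean()

    def retrieve(self, target: np.ndarray, k: int = 3) -> list:
        """Return the k archived expressions whose semantics best match `target`."""
        t = self._standardize(np.asarray(target, dtype=float))
        scores = []
        for expr, sem in zip(self.expressions, self.semantics):
            s = self._standardize(sem)
            # Absolute cosine similarity: a sign flip can be absorbed by a
            # linear coefficient in the downstream feature-construction step.
            sim = abs(np.dot(t, s) / (np.linalg.norm(t) * np.linalg.norm(s) + 1e-12))
            scores.append((sim, expr))
        scores.sort(key=lambda p: p[0], reverse=True)
        return [expr for _, expr in scores[:k]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, size=50)
    archive = SemanticArchive()
    archive.add("x", x)
    archive.add("x**2", x**2)
    archive.add("sin(x)", np.sin(x))
    # Desired semantics: a scaled and shifted quadratic; the quadratic subtree
    # should be retrieved despite the different scale and offset.
    print(archive.retrieve(3.0 * x**2 + 5.0, k=2))
```

In this toy example the quadratic subtree is retrieved first because its standardized semantics coincide with the standardized target, regardless of the scale and offset applied to it.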
