Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
Abstract
Pretrained transformers readily adapt to new sequence modeling tasks via zero-shot prompting, but relational domains still lack architectures that transfer across datasets and tasks. The core challenge is the diversity of relational data, with heterogeneous schemas, graph structures, and functional dependencies that vary across databases. We propose the Relational Transformer (RT), a cell-level architecture pretrained on diverse relational databases and directly applicable to unseen datasets and tasks, without any task- or dataset-specific fine-tuning or retrieval of in-context examples. RT (i) tokenizes cells with table and column metadata, (ii) is pretrained via masked token prediction, and (iii) uses a novel Relational Attention mechanism over columns, rows, and primary–foreign key links. Pretrained on RelBench datasets spanning tasks such as churn and sales forecasting, RT attains strong zero-shot performance; on binary classification it averages 94% of fully supervised AUROC in a single forward pass, and fine-tuning yields state-of-the-art results with high sample efficiency. Our experiments show that RT’s zero-shot transfer harnesses task-table context, column and feature attention, and schema semantics. Overall, RT provides a practical path toward foundation models for relational data.
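To make the Relational Attention idea concrete, the sketch below builds a cell-level attention mask that permits attention only among cells in the same row, the same column, or rows connected by a primary–foreign key link. This is a minimal illustration, not the paper's implementation: the function name `relational_attention_mask` and the representation of key links as row-index pairs are assumptions introduced here for clarity.

```python
# Hypothetical sketch (not the authors' code): a cell-level attention mask
# restricting attention to same-row, same-column, and key-linked cells.
import torch

def relational_attention_mask(row_ids, col_ids, linked_rows):
    """row_ids, col_ids: LongTensors of shape [num_cells] giving each cell's
    row and column index; linked_rows: iterable of (row_a, row_b) pairs
    connected by a primary-foreign key link (treated as symmetric)."""
    n = row_ids.numel()
    same_row = row_ids.unsqueeze(0) == row_ids.unsqueeze(1)  # [n, n]
    same_col = col_ids.unsqueeze(0) == col_ids.unsqueeze(1)  # [n, n]

    key_link = torch.zeros(n, n, dtype=torch.bool)
    for a, b in linked_rows:
        ra, rb = (row_ids == a), (row_ids == b)
        # Allow attention between all cells of the two linked rows.
        key_link |= ra.unsqueeze(1) & rb.unsqueeze(0)
        key_link |= rb.unsqueeze(1) & ra.unsqueeze(0)

    return same_row | same_col | key_link  # True = attention allowed

# Example: five cells spanning three rows, where row 2 holds a foreign key
# referencing row 0 (toy indices, purely illustrative).
mask = relational_attention_mask(
    row_ids=torch.tensor([0, 0, 1, 1, 2]),
    col_ids=torch.tensor([0, 1, 0, 1, 0]),
    linked_rows=[(0, 2)],
)
```

Such a mask could be passed to a standard transformer attention layer so that each cell attends only to its relational neighborhood rather than to every token in the input.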