Poster in Workshop: 3rd Workshop on Navigating and Addressing Data Problems For Foundation Models (DATA-FM)

[Short] Few-Shot Cross-Table Data Mixture in Tabular In-Context Learning: Benefits, Failure Modes, and Alignment

Jia-Wei Liao ⋅ Kuan-Yu Chen ⋅ Yu-Chen Den ⋅ Tien-Hao Chang

Abstract

Tabular foundation models show promise for structured data prediction, but unlike text and images, tabular datasets have heterogeneous schemas and label semantics. This raises a critical question: does mixing tables during few-shot training improve in-context learning (ICL)? We systematically investigate cross-table training under controlled few-shot protocols, comparing single-table training against training augmented with auxiliary datasets. We find that naive mixing causes severe negative transfer, and we propose two alignment strategies: feature-level matching via optimal transport (OT) and label semantics alignment via pseudo-labeling. Our key finding is an architectural divide: TabPFN-v2 and MITRA fail to benefit from cross-table augmentation, while representation-based models (TabICL) achieve a +1.02% average improvement. This indicates that cross-table learning requires learned embedding spaces in which semantic correspondences can be preserved across heterogeneous schemas.
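
To make the idea of feature-level matching via optimal transport more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each column can be summarized by the quantiles of its z-scored values, builds a cost matrix between target and auxiliary columns from those summaries, and solves an entropy-regularized OT problem with Sinkhorn iterations. The function names (`feature_signature`, `sinkhorn`, `match_features`), the quantile signature, the uniform marginals, and the toy data are all illustrative assumptions.

```python
import numpy as np


def feature_signature(table, n_quantiles=19):
    """Summarize each column by quantiles of its z-scored values (an assumed descriptor)."""
    z = (table - table.mean(axis=0)) / (table.std(axis=0) + 1e-12)
    levels = np.linspace(0.05, 0.95, n_quantiles)
    # np.quantile returns (n_quantiles, n_features); transpose to one row per feature.
    return np.quantile(z, levels, axis=0).T


def sinkhorn(cost, eps=0.05, n_iters=200):
    """Entropy-regularized OT plan between uniform marginals via Sinkhorn iterations."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass over target features
    b = np.full(m, 1.0 / m)  # uniform mass over auxiliary features
    kernel = np.exp(-cost / eps)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def match_features(target, auxiliary):
    """Return the OT plan and, for each target column, its best auxiliary column."""
    sig_t = feature_signature(target)
    sig_a = feature_signature(auxiliary)
    # Mean squared gap between quantile signatures of every pair of columns.
    cost = ((sig_t[:, None, :] - sig_a[None, :, :]) ** 2).mean(axis=2)
    cost = cost / max(cost.max(), 1e-12)  # rescale so eps has a consistent meaning
    plan = sinkhorn(cost)
    return plan, plan.argmax(axis=1)


# Toy data: the auxiliary table holds the same kinds of columns in a different order.
rng = np.random.default_rng(0)
target = np.column_stack([
    rng.normal(0.0, 1.0, 1000),                 # column 0: Gaussian
    rng.exponential(1.0, 1000),                 # column 1: skewed
    rng.integers(0, 2, 1000).astype(float),     # column 2: binary
])
auxiliary = np.column_stack([
    rng.integers(0, 2, 800).astype(float),      # column 0: binary
    rng.normal(0.0, 1.0, 800),                  # column 1: Gaussian
    rng.exponential(1.0, 800),                  # column 2: skewed
])

plan, match = match_features(target, auxiliary)
print(match)  # index of the auxiliary column matched to each target column
```

The z-scoring step makes the signature insensitive to units and scale, which matters when schemas differ. The argmax over the plan is a simple hard assignment; a soft use of the plan, or a cost built from learned embeddings instead of marginal quantiles, would be equally plausible choices.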
