LLATAS: Large LAnguage models as Tabular Auxiliary feature Synthesizer
Abstract
While classical models like Gradient Boosting remain state-of-the-art for tabular data, their performance is often bottlenecked by the limitations of heuristic feature engineering. To address this, we introduce LLATAS, a framework that leverages Large Language Models (LLMs) to synthesize semantic reasoning traces as auxiliary features. Grounded in the Learning Using Privileged Information (LUPI) paradigm, we use these generated signals to train a teacher model, which then guides a lightweight student model operating solely on original inputs. This distillation process allows the student to inherit complex reasoning capabilities without incurring the computational cost of LLMs at inference. Empirical evaluations on disease prediction tasks demonstrate that LLATAS significantly outperforms baselines, reducing test error rates by 17.6% for XGBoost and 22.0% for MLP models.
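The teacher-student setup described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the student's training targets blend hard labels with teacher probabilities using a hypothetical mixing weight `lam`, plain logistic regression stands in for XGBoost or an MLP, and a synthetic numeric column stands in for the LLM-generated auxiliary features. All function names (`fit_logistic`, `distill_student`, `make_toy_data`, `accuracy`) are illustrative only.

```python
import math
import random


def sigmoid(z):
    # Clamp the logit to avoid overflow in math.exp.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, targets, epochs=300, lr=0.5):
    """Batch gradient descent on cross-entropy.

    `targets` may be hard labels (0/1) or soft probabilities in [0, 1].
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, targets):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]


def distill_student(X, X_priv, y, lam=0.5):
    """Train a teacher on original + privileged features, then a student
    on original features only, guided by the teacher's soft labels."""
    # Teacher input: original columns concatenated with privileged columns.
    X_teacher = [x + p for x, p in zip(X, X_priv)]
    teacher = fit_logistic(X_teacher, y)
    soft = predict_proba(teacher, X_teacher)
    # Assumed objective: cross-entropy against a blend of hard and soft
    # targets (equivalent to a weighted sum of the two cross-entropies).
    targets = [(1 - lam) * yi + lam * si for yi, si in zip(y, soft)]
    student = fit_logistic(X, targets)
    return teacher, student


def make_toy_data(n=200, seed=0):
    # Synthetic stand-in data; the privileged column is NOT real LLM output.
    rng = random.Random(seed)
    X, X_priv, y = [], [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        score = x[0] + x[1] + rng.gauss(0, 0.5)
        X.append(x)
        X_priv.append([score + rng.gauss(0, 0.3)])  # available only in training
        y.append(1 if score > 0 else 0)
    return X, X_priv, y


def accuracy(model, X, y):
    preds = [1 if p > 0.5 else 0 for p in predict_proba(model, X)]
    return sum(int(p == t) for p, t in zip(preds, y)) / len(y)


X, X_priv, y = make_toy_data()
teacher, student = distill_student(X, X_priv, y)
# At inference the student needs only the original features.
print("student training accuracy:", accuracy(student, X, y))
```

In this sketch the privileged column is consumed only inside `distill_student`; the returned student has weights for the original inputs alone, which mirrors the claim that no LLM call is needed at inference.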