All in the Head?: A Controlled Study of Component Contributions in Few-Shot NLP
Abstract
Few-shot text classification is usually studied through model scaling or full fine-tuning; much less is known about how the design of the classification head affects performance when the underlying representations are held fixed. We examine this question in a controlled frozen-encoder setting, in which a compact LSTM-based head is trained on top of contextual embeddings while all encoder parameters remain unchanged. We evaluate three design choices (recurrence, attention, and targeted synonym-based augmentation) across multiple few-shot benchmarks using a consistent protocol. Our experiments show that each component contributes measurable gains under tight data constraints, and that a small recurrent head can recover strong accuracy with only a few million trainable parameters. We report consistent improvements over simpler head configurations and competitive performance relative to compact transformer-based alternatives under identical conditions, while maintaining a low optimization footprint. These results provide evidence that head architecture and training choices remain consequential even with fixed contextual encoders, and they offer a simple controlled framework for studying inductive biases in low-shot classification systems.
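To give a sense of scale for "a few million trainable parameters", the sketch below counts the parameters of one plausible head: a single-layer bidirectional LSTM, an additive attention pooling layer, and a linear classifier. Every dimension here (a 768-dimensional encoder output, hidden size 512, attention size 128, 5 classes) is an illustrative assumption and not a configuration reported in this work, and the function names are hypothetical.

```python
def lstm_params(input_dim, hidden_dim, bidirectional=True):
    """Parameter count for one LSTM layer, assuming one bias vector per gate."""
    # Four gates, each with input weights, recurrent weights, and a bias.
    per_direction = 4 * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)
    return per_direction * (2 if bidirectional else 1)


def attention_params(feature_dim, attn_dim):
    """Additive attention pooling: a projection with bias plus a scoring vector."""
    projection = feature_dim * attn_dim + attn_dim
    scoring_vector = attn_dim
    return projection + scoring_vector


def classifier_params(feature_dim, num_classes):
    """Linear output layer with bias."""
    return feature_dim * num_classes + num_classes


def head_params(encoder_dim=768, hidden_dim=512, attn_dim=128, num_classes=5):
    """Total trainable parameters of the head; the frozen encoder adds none."""
    # All default dimensions are assumptions chosen for illustration only.
    feature_dim = 2 * hidden_dim  # forward and backward states concatenated
    return (
        lstm_params(encoder_dim, hidden_dim, bidirectional=True)
        + attention_params(feature_dim, attn_dim)
        + classifier_params(feature_dim, num_classes)
    )


if __name__ == "__main__":
    print(f"LSTM:  {lstm_params(768, 512):,}")
    print(f"Total: {head_params():,}")
```

Under these assumed dimensions the recurrent layer accounts for almost the entire budget (about 5.2 of roughly 5.4 million parameters), so the hidden size is the main lever on head capacity. Frameworks that keep two bias vectors per gate, as PyTorch does, would report a slightly larger count.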