Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
Dichotomy in Compositional Reasoning: Scaling and Limitations of LLMs in Composite Task
Zhuoyan Xu · Zhenmei Shi · Yingyu Liang
Large language models (LLMs) have emerged as a powerful tool for many AI problems and exhibit remarkable in-context learning (ICL) capabilities. Despite the tremendous success of LLMs, how they tackle composite tasks, especially those not encountered during pretraining, remains an open question and is largely not understood. In this study, we investigate the ICL capabilities of LLMs on composite tasks, using only simple tasks as in-context examples. We observe a dichotomy: (1) for certain composite tasks, the models demonstrate decent compositional ability, and scaling up the model enhances this ability; (2) for more complex composite tasks, the models fail, and scaling up typically offers no improvement. We explain this phenomenon with a theoretical analysis in a simplified setting. To validate our findings, we develop a test suite of composite tasks spanning both linguistic and logical challenges, and we substantiate our observations with empirical studies across different LLM families.
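To make the setup concrete, the following is a minimal sketch, not drawn from the authors' actual test suite, of what "only simple tasks as in-context examples" means: the prompt demonstrates two simple string tasks (capitalization and word swapping) separately, then queries their composition. The task names and prompt format here are illustrative assumptions.

```python
# Hypothetical illustration (not the paper's actual benchmark): build an
# in-context prompt whose demonstrations each cover ONE simple task,
# while the final query requires composing both.

def capitalize_task(words):
    # Simple task 1: uppercase every word.
    return [w.upper() for w in words]

def swap_task(words):
    # Simple task 2: swap a two-word sequence.
    return [words[1], words[0]]

def make_prompt(simple_examples, composite_query):
    # Format demonstrations and the unanswered composite query.
    lines = [f"Input: {' '.join(x)} -> Output: {' '.join(y)}"
             for x, y in simple_examples]
    lines.append(f"Input: {' '.join(composite_query)} -> Output:")
    return "\n".join(lines)

# Demonstrations: each shows one simple task in isolation.
demos = [
    (["apple", "pear"], capitalize_task(["apple", "pear"])),  # capitalize only
    (["red", "blue"], swap_task(["red", "blue"])),            # swap only
]

# The query's ground truth composes both tasks: swap, then capitalize.
query = ["cold", "warm"]
expected = capitalize_task(swap_task(query))  # ['WARM', 'COLD']

prompt = make_prompt(demos, query)
print(prompt)
```

A model with compositional ability should answer the final query with the composed output (here, "WARM COLD") even though no demonstration shows both tasks applied together.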