LaPep: Can Language Contribute to Property-Guided Peptide Design?
Abstract
Large language models (LLMs) encode broad chemical heuristics from the scientific literature and are increasingly proposed as tools for therapeutic molecule design. However, their effectiveness for generating therapeutically viable peptides, particularly in the absence of strong labeled predictors, remains unclear. We introduce LaPep, a sampling-time framework that integrates LLMs as token-level proposers within a discrete flow-based peptide generator, while using hard property predictors to guide and evaluate generation. Using open-source LLMs including Qwen3, Kimi K2, and Llama 3, we study two representative design settings: permeability, where a strong predictor exists, and protease stability, where it does not. We show that language guidance can improve permeability when combined with a hard predictor, but provides limited or inconsistent gains for protease stability when used alone, despite leveraging external heuristic scorers. These results highlight that current LLMs are not yet reliable substitutes for quantitative property models in therapeutic peptide design. We position LaPep as a strong diagnostic framework for systematically evaluating the capabilities and limitations of language models in guided molecular generation, and argue that high-quality labeled predictors remain critical for translating language-driven design into therapeutically relevant outcomes.
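The abstract says LLMs act as token-level proposers inside a discrete flow-based generator, with property predictors guiding generation, but it does not state how these signals are combined. The sketch below assumes a log-linear (product-of-experts) mix of three per-residue signals at a single sequence position: base generator logits, LLM log-probabilities, and predictor-derived guidance scores. The names `combine_token_proposals`, `sample_token`, `llm_weight`, and `guidance_weight`, and the toy scores in the usage block, are hypothetical illustrations. They are not LaPep's published method.

```python
import math
import random
from typing import Dict

# 20 canonical amino acids used as the token vocabulary (assumption for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize raw scores into log-probabilities (numerically stable)."""
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {k: s - log_z for k, s in scores.items()}


def combine_token_proposals(
    base_logits: Dict[str, float],
    llm_logprobs: Dict[str, float],
    guidance_scores: Dict[str, float],
    llm_weight: float = 1.0,
    guidance_weight: float = 1.0,
) -> Dict[str, float]:
    """Hypothetical product-of-experts mix at one position.

    Each residue's combined score is the base generator logit plus weighted
    LLM log-probability plus weighted predictor guidance score. All three
    dicts must cover every residue in AMINO_ACIDS.
    """
    combined = {
        aa: base_logits[aa]
        + llm_weight * llm_logprobs[aa]
        + guidance_weight * guidance_scores[aa]
        for aa in AMINO_ACIDS
    }
    return {aa: math.exp(lp) for aa, lp in log_softmax(combined).items()}


def sample_token(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one residue by inverse-CDF sampling over the combined distribution."""
    r = rng.random()
    cumulative = 0.0
    for aa in AMINO_ACIDS:
        cumulative += probs[aa]
        if r < cumulative:
            return aa
    return AMINO_ACIDS[-1]  # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(0)
    base = {aa: 0.0 for aa in AMINO_ACIDS}  # toy: uniform base generator
    # Toy LLM prior favoring hydrophobic residues (illustrative only).
    llm = log_softmax({aa: (1.0 if aa in "LIVFW" else 0.0) for aa in AMINO_ACIDS})
    # Toy predictor guidance nudging toward proline (illustrative only).
    guide = {aa: (0.5 if aa == "P" else 0.0) for aa in AMINO_ACIDS}
    probs = combine_token_proposals(base, llm, guide)
    peptide = "".join(sample_token(probs, rng) for _ in range(10))
    print(peptide)
```

Setting both weights to zero recovers the base generator's distribution, so under this assumed scheme the weights act as knobs for ablating language guidance against predictor guidance.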