RankFlow: Property-aware Transport for Protein Optimization
Abstract
A key step in protein optimization is to accurately model the fitness landscape, which maps sequence and structure to functional assay readouts. Previous methods typically predict the fitness landscape directly from likelihoods or embeddings derived from pretrained protein language models (PLMs), which are property-agnostic. In addition, many predictors assume that individual mutations have independent effects and therefore fail to capture the rich interactions among multiple mutations. In this work, we introduce RankFlow, a conditional flow framework that refines PLM representations into a property-aligned distribution via a tailored energy function. RankFlow captures multi-mutation interactions through learnable embeddings on mutation sets. To align optimization with evaluation protocols, we propose the Rank-Consistent Conditional Flow Loss, a differentiable ranking objective that enforces the correct order of mutants rather than their absolute values, which improves out-of-distribution generalization. Finally, we introduce a Property-guided Steering Gate (PSG) that concentrates learning on positions carrying signal for the target property while suppressing unrelated evolutionary biases. Across the ProteinGym, PEER, and FLIP benchmarks, RankFlow attains state-of-the-art ranking accuracy and stronger generalization to higher-order mutants.
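To make concrete what it means for an objective to enforce the order of mutants rather than their absolute values, the sketch below implements a generic pairwise logistic ranking loss. It is not the Rank-Consistent Conditional Flow Loss, whose exact form the abstract does not specify. The function name `pairwise_ranking_loss` and the toy scores are assumptions made purely for illustration.

```python
import math
from itertools import permutations


def pairwise_ranking_loss(pred, target):
    """Mean logistic loss over ordered pairs (i, j) with target[i] > target[j].

    Illustrative stand-in only: a generic pairwise ranking loss,
    not the loss proposed in the paper.
    """
    losses = []
    for i, j in permutations(range(len(pred)), 2):
        if target[i] > target[j]:
            # Mutant i should score above mutant j; penalize a small or negative margin.
            d = pred[i] - pred[j]
            # softplus(-d) = log(1 + exp(-d)), written in a numerically stable form.
            losses.append(max(-d, 0.0) + math.log1p(math.exp(-abs(d))))
    # With no strictly ordered pairs (for example, all targets tied) the loss is 0.
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical assay readouts for three mutants, and two candidate score vectors.
fitness = [0.9, 0.5, 0.1]
well_ordered = pairwise_ranking_loss([3.0, 2.0, 1.0], fitness)
mis_ordered = pairwise_ranking_loss([1.0, 2.0, 3.0], fitness)
shifted = pairwise_ranking_loss([103.0, 102.0, 101.0], fitness)
```

Because the loss depends only on score differences, `shifted` matches `well_ordered`: adding a constant offset to every prediction leaves the loss unchanged, whereas reversing the order (`mis_ordered`) increases it. A regression loss on absolute values would instead penalize the offset, even though the ranking is identical.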