PathChat-SegR1: Reasoning Segmentation in Pathology via SO-GRPO
Abstract
Segmentation in pathology image requires handling out-of-domain tissue morphologies and new pathologies beyond training distributions, where traditional closed-set segmentation approaches fail to generalize. Reasoning segmentation enables zero-shot generalization via prompting with text queries. However, existing reasoning segmentation models face three barriers when applied to pathology: (1) the vision encoder lack pathology-specific knowledge and robustness to staining variations, (2) the large language model (LLM) backbone for reasoning fails to identify whether it has gathered sufficient semantic context to trigger the segmentation output, and (3) no reasoning segmentation benchmarks and datasets exist for pathology analysis. Consequently, we introduce PathChat-SegR1, a reasoning segmentation model built upon pathology-specific vision encoders trained with a novel stain-invariant self-distillation for robust pathology image representations. Moreover, we propose Segmentation-Optimized GRPO (SO-GRPO), a reinforcement learning method specifically for reasoning segmentation that learns to determine optimal segmentation timing based on accumulated reasoning context. Finally, we construct a pathology-specific reasoning segmentation benchmark of 118,667 triplets of pathology image, ground-truth mask, query, and reasoning chain including both public and private pathology images. Zero-shot evaluation on pathology images with out-of-domain morphologies/pathologies shows 61\% improvement over state-of-the-art segmentation models. Code is available at https://anonymous.4open.science/r/PathChat-Seg-3116.