OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
Abstract
Computed Tomography (CT) is one of the most widely used and diagnostically information-dense imaging modalities, covering critical organs such as the heart, lungs, liver, and colon. Clinical interpretation relies on both \textbf{slice-driven} local features (e.g., sub-centimeter nodules, lesion boundaries) and \textbf{volume-driven} spatial representations (e.g., tumor infiltration, inter-organ anatomical relations). However, existing Large Vision–Language Models (LVLMs) remain split between slice-level and volumetric CT understanding: slice-driven LVLMs generalize well but lack cross-slice spatial consistency, while volume-driven LVLMs explicitly capture volumetric semantics but suffer from coarse granularity and poor compatibility with slice inputs. The absence of a unified modeling paradigm is a major bottleneck for the clinical translation of medical LVLMs. We present \textbf{OmniCT}, a unified slice–volume LVLM for CT scans, which makes three contributions: \textbf{(i) Spatial Consistency Enhancement (SCE):} volumetric slice composition combined with tri-axial positional encoding introduces volumetric consistency, and an MoE hybrid projection enables efficient slice–volume adaptation; \textbf{(ii) Organ-level Semantic Enhancement (OSE):} segmentation and ROI localization explicitly align anatomical regions, emphasizing lesion- and organ-level semantics; \textbf{(iii) MedEval-CT:} the largest slice–volume CT dataset and hybrid benchmark, which integrates multi-level metrics for unified evaluation. OmniCT consistently outperforms existing methods by a substantial margin across diverse clinical tasks, combines micro-level detail sensitivity with macro-level spatial reasoning, and establishes a new paradigm for cross-dimensional medical imaging modeling. Our project is available at \href{https://anonymous.4open.science/r/OmniCT}{link}.