OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Abstract
Prostate cancer is one of the most common and lethal cancers among men, making its early detection critically important. Ultrasound computed tomography (USCT) has emerged as an accessible and cost-effective method that reconstructs quantitative tissue parameters, which can serve as potential biomarkers for malignancy. However, current prostate USCT faces considerable barriers: limited-angle acquisitions due to anatomical constraints, tissue heterogeneity, proximity to organs and bony pelvic structures, and lengthy processing times. The lack of large-scale, anatomically precise datasets significantly hampers the development of high-quality, efficient, and generalizable methods. To address this gap, we introduce OpenPros, the first large-scale benchmark dataset for limited-angle prostate USCT, designed for the systematic evaluation of machine learning algorithms for inverse problems. Our dataset includes over 280,000 paired samples of realistic 2D speed-of-sound (SOS) phantoms and corresponding ultrasound full-waveform data, generated from anatomically accurate 3D digital prostate models derived from 4 real clinical MRI/CT scans and 62 ex vivo prostate specimens with experimental ultrasound measurements, annotated by medical experts. Simulations are conducted under clinically realistic configurations using advanced finite-difference time-domain (FDTD) and Runge-Kutta acoustic wave solvers, both provided as open-source components. Through comprehensive benchmarking, we find that deep learning methods significantly outperform traditional physics-based algorithms in both inference efficiency and reconstruction accuracy. However, our results also reveal that current machine learning methods fail to deliver clinically acceptable, high-resolution reconstructions, underscoring critical gaps in generalization, robustness, and uncertainty quantification.
By publicly releasing OpenPros, we provide the community with a rigorous benchmark that not only enables fair method comparison but also motivates new advances in physics-informed learning, foundation models for scientific imaging, and uncertainty-aware reconstruction, helping to bridge the gap between academic ML research and real-world clinical deployment. The dataset is publicly accessible at https://open-pros.github.io/.