Poster in Workshop: Workshop on AI for Children: Healthcare, Psychology, Education
What About the Children? Evaluating and Mitigating Ageism in Medical QA Benchmarks
Adil Bahaj · Mohamed CHETOUANI · Mounir Ghogho
Keywords: [ pediatrics ] [ benchmark dataset ] [ question answering ]
Despite significant advancements in medical question-answering (QA) systems powered by large language models (LLMs), pediatric medicine remains underrepresented in both research and dataset development. This imbalance stems from fundamental physiological and developmental differences between children and adults, as well as a historical bias favoring adult-centric medical literature. As a result, LLMs trained on existing medical corpora may exhibit age-related biases, leading to suboptimal performance in pediatric contexts. In this work, we systematically assess the extent of pediatric underrepresentation in existing medical QA benchmarks, quantifying both the prevalence and impact of age-related biases. To address these gaps, we introduce a novel evaluation benchmark specifically curated to enhance pediatric medical representation. By incorporating diverse pediatric sources, our dataset provides a more equitable foundation for evaluating LLM performance across different age groups. Our findings highlight the critical need for age-inclusive AI-driven medical tools, aligning with broader efforts in precision medicine and equitable healthcare.
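As a rough illustration of what quantifying the prevalence of pediatric questions in a benchmark could look like, the sketch below tags questions with a simple regex heuristic and reports the pediatric share. This is not the authors' method: the keyword list, the age threshold of 17, and the names `is_pediatric` and `pediatric_share` are all assumptions made for this example, and the sample questions are invented.

```python
import re

# Hypothetical keyword list; the abstract does not specify how questions are tagged.
PEDIATRIC_TERMS = re.compile(
    r"\b(newborn|neonate|infant|toddler|child|children|adolescent|pediatric)\b",
    re.IGNORECASE,
)
# Explicit ages such as "5-year-old" or "5 years old".
AGE_YEARS = re.compile(r"\b(\d{1,3})[- ]years?[- ]old\b", re.IGNORECASE)
# Ages given in days, weeks, or months are treated as pediatric (an assumption).
AGE_SMALL_UNITS = re.compile(r"\b\d{1,3}[- ](day|week|month)s?[- ]old\b", re.IGNORECASE)


def is_pediatric(question: str, max_age: int = 17) -> bool:
    """Heuristically flag a question as pediatric (assumed cutoff: max_age years)."""
    # The first explicit age in years takes priority over keywords, so a
    # question about an adult who mentions a child is counted as adult.
    match = AGE_YEARS.search(question)
    if match:
        return int(match.group(1)) <= max_age
    if AGE_SMALL_UNITS.search(question):
        return True
    return bool(PEDIATRIC_TERMS.search(question))


def pediatric_share(questions: list[str]) -> float:
    """Return the fraction of questions flagged as pediatric."""
    if not questions:
        return 0.0
    return sum(is_pediatric(q) for q in questions) / len(questions)


# Invented sample questions, for illustration only.
sample = [
    "A 5-year-old boy presents with fever.",
    "A 62-year-old man has chest pain.",
    "A 3-week-old infant is feeding poorly.",
    "Which drug inhibits ACE?",
]
print(f"Pediatric share: {pediatric_share(sample):.2f}")
```

A keyword heuristic like this will miss questions that imply age only indirectly and will misclassify some mixed cases, so any real audit would likely need manual review or a stronger classifier on top of it.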