

Poster in Workshop: Workshop on AI for Children: Healthcare, Psychology, Education

Prediction of Item Difficulty for Reading Comprehension Items by Creation of Annotated Item Repository

Radhika Kapoor · Sang Truong · Nick Haber · Benjamin Domingue · Maria Araceli Ruiz-Primo

Keywords: [ Item difficulty modeling ] [ Reading comprehension ] [ Assessment ]


Abstract:

Can the difficulty of an item measuring reading comprehension be predicted from its content? To address this question, we model item difficulty using a repository of reading passages and student data from US standardized assessments administered in New York and Texas for grades 3-8. The repository is annotated with metadata on (1) linguistic features of the reading items, (2) assessment characteristics of the passage (such as the availability of reading aids and vocabulary definitions), and (3) context information, such as the respondent's state, grade, and year. We supplement these human-interpretable characteristics with embeddings from LLMs (ModernBERT, BERT, and LLaMA) to predict item difficulty. We find that a simple penalized regression model using all features predicts item difficulty with an RMSE of 0.52 and a correlation of 0.75 between true and predicted difficulty. LLM embeddings only marginally improve prediction over human-interpretable features alone. Models using only item linguistic features or only LLM embeddings perform similarly, suggesting that either feature class alone could suffice. The prediction model can support the automatic generation of reading items and will be made publicly available for use by other education stakeholders.
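A minimal sketch of the penalized-regression setup described above, assuming a scikit-learn ridge pipeline over concatenated feature blocks. The feature names, dimensions, and toy data here are illustrative assumptions, not the paper's actual pipeline or repository; in practice the embedding block would come from ModernBERT, BERT, or LLaMA, and the outcome would be an empirically estimated item difficulty.

```python
# Hypothetical sketch: penalized regression predicting item difficulty from
# (1) linguistic features, (2) assessment characteristics, (3) context
# features, plus LLM embeddings. All data below is synthetic.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_items = 500

# (1) Linguistic features of the item (e.g., word count, sentence length).
X_linguistic = rng.normal(size=(n_items, 10))
# (2) Assessment characteristics (e.g., reading aids available: 0/1).
X_assessment = rng.integers(0, 2, size=(n_items, 3)).astype(float)
# (3) Context information (e.g., encoded state, grade, year).
X_context = rng.normal(size=(n_items, 5))
# LLM embeddings of the item text (random stand-ins for real embeddings).
X_embed = rng.normal(size=(n_items, 768))

X = np.hstack([X_linguistic, X_assessment, X_context, X_embed])
y = rng.normal(size=n_items)  # item difficulty (e.g., an IRT b-parameter)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Penalized (ridge) regression with the penalty chosen by cross-validation;
# standardization puts all feature blocks on a comparable scale.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
rmse = mean_squared_error(y_te, pred) ** 0.5
r = np.corrcoef(y_te, pred)[0, 1]
print(f"RMSE={rmse:.2f}, correlation={r:.2f}")
```

Dropping a feature block from `np.hstack` gives the single-block comparisons mentioned in the abstract (linguistic features only vs. embeddings only).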
