Doctoral Thesis

Dealing with Sparse Rater Scoring of Constructed Responses within a Framework of a Latent Class Signal Detection Model

Kim, Sunhee

In many assessment situations that use constructed-response (CR) items, an examinee's response is evaluated by only one rater; this is called a single-rater design. For example, in classroom assessment, only one teacher grades each student's performance. Although single-rater designs are the most cost-effective of all rater designs, the lack of a second rater makes it difficult to decide how the scores should be used and evaluated. For example, one cannot assess rater reliability or rater effects when there is only one rater. The present study explores possible solutions to the issues that arise in sparse rater designs within the context of a latent class version of signal detection theory (LC-SDT) that has previously been used for rater scoring. This approach provides a model of rater cognition in CR scoring (DeCarlo, 2005, 2008, 2010) and offers measures of rater reliability and various rater effects. The following potential solutions to rater sparseness were examined: 1) the use of parameter restrictions to yield an identified model, 2) the use of informative priors in a Bayesian approach, and 3) the use of back readings (e.g., partially available second-rater observations), which are available in some large-scale assessments. Simulations and analyses of real-world data were conducted to examine the performance of these approaches. Simulation results showed that using parameter constraints allows one to detect various rater effects that are of concern in practice. The Bayesian approach also gave useful results, although estimation of some of the parameters was poor and the standard deviations of the parameter posteriors were large, except when the sample size was large. Using back-reading scores gave an identified model, and simulations showed that parameter estimation was generally acceptable except for small sample sizes.
The study also examines the utility of these approaches as applied to the PIRLS USA reliability data. The results show some similarities and differences between parameter estimates obtained with posterior mode estimation and with Bayesian estimation. Sensitivity analyses revealed that rater parameter estimates are sensitive to the specification of the priors, as was also found in the simulation results with smaller sample sizes.



More About This Work

Academic Units: Measurement and Evaluation
Thesis Advisors: DeCarlo, Lawrence
Degree: Ph.D., Columbia University
Published Here: May 23, 2013