Academic Commons Search Results
https://academiccommons.columbia.edu/catalog?action=index&controller=catalog&f%5Bdepartment_facet%5D%5B%5D=Measurement+and+Evaluation&format=rss&fq%5B%5D=has_model_ssim%3A%22info%3Afedora%2Fldpd%3AContentAggregator%22&q=&rows=500&sort=record_creation_date+desc

Developing an approach to determine generalizability: A review of efficacy and effectiveness trials funded by the Institute of Education Sciences
https://academiccommons.columbia.edu/catalog/ac:206699
Fellers, Lauren Ashley | DOI: 10.7916/D86D5ZN1 | Date: Thu, 15 Jun 2017 16:06:42 +0000

Since its establishment, the Institute of Education Sciences (IES) has created opportunities and driven standards to generate education research that is high quality, rigorous, and relevant. This dissertation analyzes current practices in Goal III and Goal IV studies in order to (1) better understand the types of schools that agree to take part in these studies, and (2) assess how representative these schools are of important policy-relevant populations. It focuses on a subset of studies funded from 2005 to 2014 by the Department of Education, IES, under the NCER grant-funding arm. Studies included were those whose interventions were aimed at elementary students across core curriculum and ELL program areas. Study schools were compared to two main populations, the U.S. population of elementary schools and Title I elementary schools, as well as to these populations at the state level. The B-index, proposed by Tipton (2014), was the main measure used to assess the compositional similarity, or generalizability, of study schools to these identified inference populations. The findings show that across all studies included in this analysis, participating schools were representative of the U.S. population of schools (B-index = 0.9). Comparisons were also made between this collection of schools and the respective populations at the state level; these schools were not representative of any individual state (no B-index value was greater than 0.90). Across all included studies, schools that agreed to participate were more often located in urban areas, had higher rates of free or reduced-price lunch (FRL) students, had more minority students enrolled, and had larger total enrollments, in both district and school, than schools in the U.S. population. Education research is clearly moving toward relevance for a larger audience, and this study shows that, across studies, some representativeness is being achieved in IES-funded research. However, the finer comparisons, of study samples to individual states and of individual studies to these populations, show limited similarity between study schools and the populations of interest to policy makers who would use these findings to make decisions about their schools.

Subjects: Education--Research, Educational evaluation, Educational statistics, Education--Statistics, Statistics, Institute of Education Sciences (U.S.) | Author UNI: laf2156 | Department: Measurement and Evaluation | Type: Theses
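
The abstract does not spell out how the B-index is computed; one common formulation of Tipton's (2014) generalizability index is a Bhattacharyya coefficient over binned sampling propensity scores. A minimal sketch under that assumption (all variable names and the synthetic example are illustrative, not the dissertation's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def b_index(sample_X, population_X, n_bins=10):
    """Bhattacharyya-style similarity between a study sample and a target
    population, computed on estimated sampling propensity scores.
    Values near 1 indicate high compositional similarity (generalizability)."""
    X = np.vstack([sample_X, population_X])
    z = np.concatenate([np.ones(len(sample_X)), np.zeros(len(population_X))])

    # Propensity of selection into the study sample given school covariates.
    p = LogisticRegression(max_iter=1000).fit(X, z).predict_proba(X)[:, 1]

    # Common bin edges over the pooled propensity scores.
    edges = np.quantile(p, np.linspace(0, 1, n_bins + 1))
    s_prop, _ = np.histogram(p[z == 1], bins=edges)
    g_prop, _ = np.histogram(p[z == 0], bins=edges)
    s_prop = s_prop / s_prop.sum()
    g_prop = g_prop / g_prop.sum()

    # Bhattacharyya coefficient: sum of square roots of matched bin proportions.
    return float(np.sum(np.sqrt(s_prop * g_prop)))

# Example (synthetic): identical covariate distributions give a B-index near 1.
rng = np.random.default_rng(0)
sample = rng.normal(size=(200, 3))
population = rng.normal(size=(5000, 3))
print(round(b_index(sample, population), 2))
```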

Estimation of Q-matrix for DINA Model Using the Constrained Generalized DINA Framework
https://academiccommons.columbia.edu/catalog/ac:198654
Li, Huacheng | DOI: 10.7916/D88W3DB2 | Date: Thu, 15 Jun 2017 15:07:07 +0000

Research on cognitive diagnostic models (CDMs) is becoming an important field of psychometrics. Instead of assigning a single score, CDMs provide attribute profiles that indicate an examinee's mastery status on specific concepts or skills, which makes the test result more informative. The implementation of many CDMs relies on a known item-to-attribute relationship, meaning that we need to know the concepts or skills each item requires. These relationships between items and attributes are summarized in the Q-matrix, and misspecification of the Q-matrix leads to incorrect attribute profiles. The Q-matrix can be designed by expert judgment, but such a practice may be subjective, and there is prior research on Q-matrix estimation. This study proposes an estimation method for one of the most parsimonious CDMs, the DINA model. The method estimates the Q-matrix for the DINA model by setting constraints on the generalized DINA model. In the simulation study, the results showed that the estimated Q-matrix fit the empirical fraction-subtraction data better than the expert-designed Q-matrix. We also show that the proposed method may still be applicable when the constraints are relaxed.

Subjects: Psychometrics, Educational tests and measurements, Statistics | Author UNI: hl2536 | Department: Measurement and Evaluation | Type: Theses
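
For readers unfamiliar with the DINA model referenced above, its item response function depends only on whether an examinee masters every attribute an item requires. A minimal sketch (not the author's code; the Q-matrix, guess, and slip values are illustrative):

```python
import numpy as np

def dina_prob(alpha, q, guess, slip):
    """P(correct) under the DINA model for one examinee and all items.

    alpha : (K,) binary mastery profile
    q     : (J, K) binary Q-matrix (item-by-attribute requirements)
    guess, slip : (J,) item guessing and slipping parameters
    """
    # eta_j = 1 only if the examinee masters every attribute item j requires.
    eta = np.all(alpha >= q, axis=1).astype(float)
    return (1 - slip) ** eta * guess ** (1 - eta)

# Example: 3 attributes, 2 items.
q = np.array([[1, 1, 0],
              [0, 0, 1]])
alpha = np.array([1, 1, 0])              # masters attributes 1 and 2 only
guess = np.array([0.2, 0.2])
slip = np.array([0.1, 0.1])
print(dina_prob(alpha, q, guess, slip))  # -> [0.9 0.2]
```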

Posterior Predictive Model Checks in Cognitive Diagnostic Models
https://academiccommons.columbia.edu/catalog/ac:208180
Park, Jung Yeon | DOI: 10.7916/D8SQ8ZV7 | Date: Wed, 14 Jun 2017 19:55:52 +0000

Cognitive diagnostic models (CDMs; DiBello, Roussos, & Stout, 2007) have received increasing attention in educational measurement for the purpose of diagnosing strengths and weaknesses in examinees' latent attributes. Yet despite the current popularity of a number of diagnostic models, research assessing model-data fit has been limited. The current study applies a Bayesian model-checking method, the posterior predictive model check (PPMC; Rubin, 1984), to the investigation of model misfit. We employed the technique to assess model-data misfit for various diagnostic models, using real data and two simulation studies. An important issue in applying PPMC is the choice of discrepancy measure. This study examines the performance of three discrepancy measures that target different aspects of model misfit: the observed total-score distribution, the association between item pairs, and the correlation between attribute pairs.

Subjects: Educational tests and measurements | Author UNI: jyp2111 | Department: Measurement and Evaluation | Type: Theses
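
As context for the posterior predictive checks described above, the generic recipe is: for each posterior draw, simulate a replicated data set, compute the same discrepancy on observed and replicated data, and report the proportion of draws in which the replicated discrepancy exceeds the observed one (the posterior predictive p-value). A schematic sketch, assuming user-supplied `simulate_data` and `discrepancy` functions (both hypothetical placeholders, not part of the dissertation):

```python
import numpy as np

def ppmc_pvalue(observed, posterior_draws, simulate_data, discrepancy, rng=None):
    """Posterior predictive p-value for one discrepancy measure.

    observed        : observed response matrix
    posterior_draws : list of posterior parameter draws
    simulate_data   : function(params, rng) -> replicated response matrix
    discrepancy     : function(data) -> scalar summary (e.g., a total-score
                      distribution statistic or an item-pair association)
    """
    rng = rng or np.random.default_rng()
    d_obs = discrepancy(observed)
    exceed = 0
    for params in posterior_draws:
        replicated = simulate_data(params, rng)
        if discrepancy(replicated) >= d_obs:
            exceed += 1
    # Extreme values (near 0 or 1) flag misfit for this aspect of the model.
    return exceed / len(posterior_draws)
```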

Analyzing Hierarchical Data with the DINA-HC Approach
https://academiccommons.columbia.edu/catalog/ac:208151
Zhang, Jianzhou | DOI: 10.7916/D8PG1R67 | Date: Wed, 14 Jun 2017 19:55:23 +0000

Cognitive diagnostic models (CDMs) are a class of models developed to diagnose the cognitive attributes of examinees. They have received increasing attention in recent years because of the need for more specific attribute- and item-related information. A particular cognitive diagnostic model, the hierarchical deterministic, input, noisy 'and' gate model with convergent attribute hierarchy (DINA-HC), is proposed to handle situations in which the attributes have a convergent hierarchy. Su (2013) first introduced the model as the deterministic, input, noisy 'and' gate with hierarchy (DINA-H) and retrofitted Trends in International Mathematics and Science Study (TIMSS) data using this model with linear and unstructured hierarchies. Leighton, Gierl, and Hunka (1999) and Kuhn (2001) introduced four forms of hierarchical structure (linear, convergent, divergent, and unstructured) based on the interrelated competencies of the cognitive skills. The convergent hierarchy is one of these four hierarchies (Leighton, Gierl, & Hunka, 2004) and describes attributes that have a convergent structure. One feature of this model is that it can incorporate the hierarchical structure of the cognitive skills in the model estimation process (Su, 2013). The advantage of the DINA-HC over the deterministic, input, noisy 'and' gate (DINA) model (Junker & Sijtsma, 2001) is that it reduces the number of parameters as well as the number of latent classes by imposing the attribute hierarchy. The model follows the specification of the DINA except that it pre-specifies the attribute profiles using the convergent attribute hierarchy, so only certain attribute patterns are allowed, depending on the particular hierarchy. Properties of the DINA-HC and DINA are examined and compared through simulation and an empirical study. Specifically, attribute profile classification accuracy and model and item fit are compared between the DINA-HC and DINA under different conditions when the attributes have convergent hierarchies. This study indicates that the DINA-HC provides better model fit, less biased parameter estimates, and higher attribute profile classification accuracy than the DINA when the attributes have a convergent hierarchy. Sample size, the number of attributes, and test length were shown to affect the parameter estimates. The DINA model has better model fit than the DINA-HC when the attributes are not dependent on each other.

Subjects: Educational tests and measurements, Statistics | Author UNI: jz2369 | Department: Measurement and Evaluation | Type: Theses
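
The parameter reduction described above comes from discarding attribute profiles that violate the hierarchy. A small sketch, assuming the hierarchy is encoded as prerequisite pairs (the example hierarchy is illustrative, not taken from the dissertation):

```python
from itertools import product

def permissible_profiles(n_attrs, prerequisites):
    """Enumerate attribute profiles consistent with a hierarchy.

    prerequisites : list of (a, b) pairs meaning attribute a must be
                    mastered before attribute b can be mastered.
    """
    profiles = []
    for alpha in product([0, 1], repeat=n_attrs):
        if all(alpha[a] >= alpha[b] for a, b in prerequisites):
            profiles.append(alpha)
    return profiles

# Illustrative hierarchy with 4 attributes: A1 is a prerequisite of A2 and A3,
# and the two branches converge on A4 (A4 requires both A2 and A3).
# Only 6 of the 2**4 = 16 profiles survive the hierarchy.
hierarchy = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(len(permissible_profiles(4, hierarchy)))  # -> 6
```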

Exploring Skill Condensation Rules for Cognitive Diagnostic Models in a Bayesian Framework
https://academiccommons.columbia.edu/catalog/ac:193918
Luna Bazaldua, Diego A. | DOI: 10.7916/D8NP247C | Date: Wed, 14 Jun 2017 19:52:28 +0000

Diagnostic paradigms are becoming an alternative to normative approaches in educational assessment. One of the principal objectives of diagnostic assessment is to determine skill proficiency for tasks that demand the use of specific cognitive processes. Ideally, diagnostic assessments should include accurate information about the skills required to correctly answer each item in a test, as well as any additional evidence about the interaction between those cognitive constructs. Nevertheless, little research in the field has focused on the types of interactions (i.e., the condensation rules) among skills in models for cognitive diagnosis.
The present study introduces a Bayesian approach to determine the underlying interaction among the skills measured by a given item when comparing models with conjunctive, disjunctive, and compensatory condensation rules. Following the reparameterization framework proposed by DeCarlo (2011), the present study includes transformations for disjunctive and compensatory models. Next, a methodology for comparing pairs of models with different condensation rules is presented; the model parameters and their distributions were defined in light of earlier Bayesian approaches proposed in the literature.
Simulation and empirical studies were performed to test the capacity of the model to identify the underlying condensation rule. Overall, results from the simulation study showed that the correct condensation rule was identified across conditions, although identification accuracy depended on the item parameter values used to generate the data and on the use of informative prior distributions for the model parameters. The latent class size parameters for the skills and their respective hyperparameters also showed good recovery in the simulation study. Recovery of the item parameters showed limitations, so guidelines for improving their estimation are presented in the results and discussion sections.
The empirical studies highlighted the usefulness of this approach in determining the interaction among skills, using real items from a mathematics test and a language test. Despite the differences in their content areas and Q-matrix structures, results indicated that both tests are composed largely of conjunctive items that demand mastery of all required skills.

Subjects: Educational tests and measurements, Education--Mathematical models, Educational evaluation, Psychometrics | Author UNI: dal2159 | Department: Measurement and Evaluation | Type: Theses
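
To make the three condensation rules named above concrete, the sketch below contrasts how a required-skill vector combines with a mastery profile under conjunctive, disjunctive, and one simple form of compensatory combination; the compensatory averaging shown is illustrative, not the dissertation's parameterization:

```python
import numpy as np

def condense(alpha, q, rule):
    """Latent 'success' term for one item under different condensation rules.

    alpha : (K,) binary skill-mastery profile
    q     : (K,) binary vector of skills the item requires
    """
    required = alpha[q == 1]
    if rule == "conjunctive":      # all required skills must be mastered
        return float(required.all())
    if rule == "disjunctive":      # any one required skill suffices
        return float(required.any())
    if rule == "compensatory":     # skills add up; more mastery, higher term
        return float(required.mean())
    raise ValueError(rule)

alpha = np.array([1, 0, 1])
q = np.array([1, 1, 0])            # item requires skills 1 and 2
for rule in ("conjunctive", "disjunctive", "compensatory"):
    print(rule, condense(alpha, q, rule))   # 0.0, 1.0, 0.5
```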

Estimating the Q-matrix for Cognitive Diagnosis Models in a Bayesian Framework
https://academiccommons.columbia.edu/catalog/ac:176107
Chung, Meng-ta | DOI: 10.7916/D857195B | Date: Thu, 08 Jun 2017 20:11:25 +0000

This research develops an MCMC algorithm for estimating the Q-matrix in a Bayesian framework. A saturated multinomial model was used to estimate correlated attributes in the DINA model and the rRUM. Closed-form posteriors for the guess and slip parameters were derived for the DINA model, and the random-walk Metropolis-Hastings algorithm was applied to parameter estimation in the rRUM. An algorithm for reducing potential label switching was incorporated into the estimation procedure, and a method for simulating data with correlated attributes for the DINA model and rRUM was also provided.
Three simulation studies were conducted to evaluate the Bayesian estimation algorithm. In simulation study 1, twenty data sets were generated from independent attributes for the DINA model and rRUM. In simulation study 2, one hundred data sets were generated from correlated attributes for the DINA model and rRUM, with the guess and slip parameters set to 0.2. Simulation study 3 analyzed data sets simulated from the DINA model with guess and slip parameters drawn from Uniform(0.1, 0.4). Results from the simulation studies showed that the Q-matrix recovery rate was satisfactory. Using the fraction-subtraction data, an empirical study was conducted for the DINA model and rRUM, and the estimated Q-matrices from the two models were compared with the expert-designed Q-matrix.

Subjects: Psychometrics, Statistics, Educational tests and measurements | Department: Measurement and Evaluation | Type: Theses
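
The abstract mentions simulating data with correlated attributes but does not describe the mechanism; one standard device is to threshold draws from a correlated multivariate normal. A sketch under that assumption (the correlation and prevalence values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def simulate_correlated_attributes(n_examinees, n_attrs, rho, prevalence=0.5, seed=0):
    """Draw binary attribute profiles whose latent correlations equal rho.

    A multivariate normal with an equicorrelation matrix is thresholded so
    that each attribute is mastered with probability `prevalence`.
    """
    rng = np.random.default_rng(seed)
    cov = np.full((n_attrs, n_attrs), rho)
    np.fill_diagonal(cov, 1.0)
    z = rng.multivariate_normal(np.zeros(n_attrs), cov, size=n_examinees)
    threshold = norm.ppf(1 - prevalence)
    return (z > threshold).astype(int)

alphas = simulate_correlated_attributes(1000, 4, rho=0.5)
print(alphas.mean(axis=0))             # mastery rates, roughly `prevalence`
print(np.corrcoef(alphas.T).round(2))  # positive pairwise correlations
```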

Factors Affecting Probability Matching Behavior
https://academiccommons.columbia.edu/catalog/ac:164326
Gao, Jie | DOI: 10.7916/D8SF33J6 | Date: Thu, 08 Jun 2017 16:12:57 +0000

In life, people commonly face repeated decisions under risk or uncertainty. While normative economic models assume that people tend to make choices that maximize their expected utility, suboptimal behavior - in particular, probability matching - is frequently observed in research on repeated decisions. Probability matching is the tendency to match prediction probabilities for each outcome to the observed outcome probabilities in a random binary prediction task. For example, when people are faced with making a sequence of predictions, such as repeatedly predicting the outcome of rolling a die with four sides colored green and two sides colored red, most people allocate about two-thirds of their predictions to green and one-third to red. The optimal strategy, referred to as maximizing, is to choose the outcome with the higher probability on every trial of the prediction task. Various causes for probability matching have been proposed over the past several decades. Here it is proposed that implicit adoption of a perfect-prediction goal by decision makers might tend to elicit probability matching behavior. Thus, one factor that might affect the prevalence of probability matching (investigated in Studies 1 and 2) is the type of performance goal. The manipulation in Study 1 contrasted single-trial prediction with prediction of four-trial sequences, which, it is hypothesized, might create an implicit perfect-prediction goal for the sequence. In Study 2, three levels of goal were explicitly manipulated for each sequence: a perfect-prediction goal, an 80% correct goal, and a 60% correct goal. In both studies it was predicted that more matching behavior would be observed for those with a perfect-prediction goal than for those with a more reasonable (lower) goal. The results of both studies, conducted in an online worker marketplace, supported the goal-level hypothesis. The second factor proposed to affect the prevalence of probability matching is the type of conceptual schema describing the events to be predicted: independent events or complementary events. Study 3 investigated the effects of schema type and abstraction level of context on matching or maximizing behavior. Three abstraction levels of stories were included: abstract, concrete random devices, and real-world stories. The main hypothesis was that when the two options to be predicted are independent events, less matching and more maximizing behavior should be observed. Data from Study 3 supported the hypothesis that independent events tend to elicit more maximizing behavior. No effects of abstraction level were observed.

Subjects: Cognitive psychology, Psychometrics | Author UNI: jg2499 | Department: Measurement and Evaluation | Type: Theses
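
A quick calculation shows why matching is suboptimal in the die example above: matching predicts green with probability 2/3, so expected accuracy is (2/3)(2/3) + (1/3)(1/3) = 5/9, about 0.56, versus 2/3, about 0.67, for always predicting green. A small sketch of that arithmetic:

```python
def expected_accuracy(p_outcome, p_predict):
    """Expected proportion correct when outcome and prediction are independent
    Bernoulli processes with the given probabilities for the majority option."""
    return p_outcome * p_predict + (1 - p_outcome) * (1 - p_predict)

p_green = 4 / 6                                        # 4 green and 2 red sides
print(round(expected_accuracy(p_green, p_green), 3))   # matching   -> 0.556
print(round(expected_accuracy(p_green, 1.0), 3))       # maximizing -> 0.667
```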

Bayesian Multidimensional Scaling Model for Ordinal Preference Data
https://academiccommons.columbia.edu/catalog/ac:161114
Matlosz, Kerry McCloskey | DOI: 10.7916/D8DJ5NV7 | Date: Thu, 08 Jun 2017 13:55:53 +0000

The model in the present study incorporated Bayesian multidimensional scaling and Markov chain Monte Carlo methods to represent individual preferences and threshold parameters as they relate to the popularity of survey items and their interrelationships. The model was used to interpret two independent samples of ordinal consumer preference data related to purchasing behavior. The objective of the procedure was to provide an understanding and visual depiction of consumers' likelihood of having a strong affinity toward one of the survey choices, and of how the other survey choices relate to it. The study also aimed to derive the joint spatial representation of the subjects and products represented by the dissimilarity preference data matrix within a reduced dimensionality. This depiction aims to enable interpretation of the preference structure underlying the data and the potential demand for each product. Model simulations were created both by sampling from the normal distribution and by incorporating lambda values from the two data sets, and were analyzed separately. Posterior checks were used to determine dimensionality, which was also confirmed within the simulation procedures. The statistical properties generated from the simulated data confirmed that the true parameter values (loadings, utilities, and latitudes) were recovered. Model effectiveness was contrasted and evaluated within both the real data samples and a simulated data set. The two data sets analyzed were confirmed to have differences in their underlying preference structures, resulting in differences in the optimal dimensionality in which the data should be represented. The biases and MSEs of the lambdas and alphas provide further understanding of the data composition, and analysis of variance (ANOVA) confirmed that the differences in MSEs related to changes in dimensionality were statistically significant.

Subjects: Statistics | Author UNI: kmm2159 | Department: Measurement and Evaluation | Type: Theses

Penalized Joint Maximum Likelihood Estimation Applied to Two Parameter Logistic Item Response Models
https://academiccommons.columbia.edu/catalog/ac:161745
Paolino, Jon-Paul Noel | DOI: 10.7916/D88W3MHS | Date: Thu, 08 Jun 2017 13:55:53 +0000

Item response theory (IRT) models are a conventional tool for analyzing both small-scale and large-scale educational data sets, and they are also used for the development of high-stakes tests such as the Scholastic Aptitude Test (SAT) and the Graduate Record Examination (GRE). When estimating these models it is imperative that the data set include many more examinees than items, a requirement similar to that in regression modeling, where many more observations than variables are needed. If this requirement is not met, the analysis will yield meaningless results. Recently, penalized estimation methods have been developed to analyze data sets that may include more variables than observations. The main focus of this study was to apply LASSO and ridge penalization techniques to IRT models in order to better estimate model parameters. The results of our simulations showed that this new estimation procedure, called penalized joint maximum likelihood estimation, provided meaningful estimates when IRT data sets included more items than examinees, a setting in which traditional Bayesian estimation and marginal maximum likelihood methods are not appropriate. However, when the IRT data sets contained more examinees than items, Bayesian estimation clearly outperformed both penalized joint maximum likelihood estimation and marginal maximum likelihood.

Subjects: Statistics | Author UNI: jnp2111 | Department: Measurement and Evaluation | Type: Theses
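
The abstract does not give the exact objective function; a generic penalized joint likelihood for the two-parameter logistic (2PL) model adds a ridge (or LASSO) penalty on the person and item parameters to the usual joint log-likelihood. A minimal sketch under that assumption (the penalty weight, shrinkage targets, and starting values are illustrative, not the dissertation's specification):

```python
import numpy as np
from scipy.optimize import minimize

def penalized_2pl_fit(X, lam=0.1):
    """Ridge-penalized joint maximum likelihood for the 2PL model.

    X : (N, J) binary response matrix. Returns (theta, a, b) estimates.
    """
    N, J = X.shape

    def unpack(par):
        return par[:N], par[N:N + J], par[N + J:]

    def neg_penalized_loglik(par):
        theta, a, b = unpack(par)
        logit = a * (theta[:, None] - b)            # (N, J) linear predictor
        loglik = np.sum(X * logit - np.log1p(np.exp(logit)))
        # Shrink abilities/difficulties toward 0 and discriminations toward 1.
        penalty = lam * (np.sum(theta**2) + np.sum((a - 1)**2) + np.sum(b**2))
        return -loglik + penalty

    start = np.concatenate([np.zeros(N), np.ones(J), np.zeros(J)])
    res = minimize(neg_penalized_loglik, start, method="L-BFGS-B")
    return unpack(res.x)

# Tiny example: more items (8) than examinees (5), where unpenalized JML fails.
rng = np.random.default_rng(1)
X = (rng.random((5, 8)) < 0.6).astype(float)
theta, a, b = penalized_2pl_fit(X)
print(theta.round(2))
```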

An Item Response Theory Approach to Causal Inference in the Presence of a Pre-intervention Assessment
https://academiccommons.columbia.edu/catalog/ac:188469
Marini, Jessica | DOI: 10.7916/D8WM1CR3 | Date: Thu, 08 Jun 2017 13:55:45 +0000

This research develops a form of causal inference based on item response theory (IRT) to combat bias that occurs when existing causal inference methods are used under certain scenarios. When a pre-test is administered prior to a treatment decision, bias can occur in causal inferences about the decision's effect on the outcome. The new IRT-based method uses item-level information, treatment placement, and the outcome to produce estimates of each subject's ability in the chosen domain. Examining a causal inference research question in an IRT model-based framework provides a model-based way to match subjects on estimates of their true ability, and this model-based matching allows inferences to be made about a subject's performance as if they had been in the opposite treatment group. The IRT method is developed to address the downfalls of existing methods, such as reliance on conditional independence between pre-test scores and outcomes. Using simulation, the IRT method is compared to existing methods under two different model scenarios in terms of Type I and Type II errors; the method's parameter recovery is then analyzed, followed by the accuracy of treatment effect evaluation. The IRT method is shown to outperform existing methods in an ability-based scenario. Finally, the IRT method is applied to real data assessing the impact of advanced STEM coursework in high school on a student's choice of major, and compared to existing alternative approaches.

Subjects: Educational tests and measurements, Statistics | Author UNI: jpm2120 | Department: Measurement and Evaluation | Type: Theses

A Bayesian Multidimensional Scaling Model for Partial Rank Preference Data
https://academiccommons.columbia.edu/catalog/ac:160395
Tanaka, Kyoko | DOI: 10.7916/D8XK8NSG | Date: Thu, 08 Jun 2017 13:55:44 +0000

There has been great advancement in research on preferential choice in the field of marketing. When we look at preferential choice data, there are two components to consider: the individuals and the items. Coombs (1950; 1964) introduced the unfolding technique for preferential choice data, and in 1960 Bennett and Hays created a multidimensional unfolding model. Hojo (1997; 1998) showed that rank data could be used in multidimensional scaling, but he did not implement a Bayesian technique. In 2010, Fong, DeSarbo, Park, and Scott proposed a new Bayesian vector multidimensional scaling (MDS) model, which was applied to data from a five-point Likert-scale survey. This paper focuses on a Bayesian multidimensional-space model of choice behavior for the analysis of partially ranked data (rank the top 3 of J items), providing a joint space of individuals and products via an MCMC procedure. The procedure is similar to that of Fong, DeSarbo, Park, and Scott (2010), but this study used partial rank data instead of Likert-scale data. The goal of this study was to create a probability-based model that calculates the average product utility, which indicates how popular the product is. The lambdas, or item loadings, give the directions of the products, and the thetas give the directions of the individuals. In addition, this study dealt with rotational invariance by calculating the optimal lambda values for each iteration and each dimension, flipping the sign so that it approaches the average value. To determine the number of dimensions of the data sets, the sums of squared loadings were calculated. We applied the MCMC procedure to simulated data, in which the loadings were sampled from the normal distribution as well as taken from the real data sets. In addition, we applied the MCMC procedure to the real data set and created a multidimensional space for the products.

Subjects: Psychometrics | Author UNI: kjt2007 | Department: Measurement and Evaluation | Type: Theses
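
The sign-flipping device mentioned above addresses a generic identifiability problem in MDS and factor-type MCMC output: each dimension's loadings are identified only up to reflection. A minimal post-processing sketch, assuming the draws are stored as an (iterations x items x dimensions) array; the alignment rule here is a simple one chosen for illustration, not necessarily the dissertation's exact rule:

```python
import numpy as np

def align_reflections(lambda_draws):
    """Resolve reflection (sign) invariance in MCMC draws of loadings.

    lambda_draws : array of shape (n_iter, n_items, n_dims)
    Each dimension of each draw is flipped, if necessary, so that it points
    in the same direction as the running mean of previously aligned draws.
    """
    draws = lambda_draws.copy()
    reference = draws[0]
    for t in range(1, draws.shape[0]):
        for d in range(draws.shape[2]):
            # Reflect this dimension if it is negatively aligned with the
            # current reference direction.
            if np.dot(draws[t, :, d], reference[:, d]) < 0:
                draws[t, :, d] *= -1
        # Update the running mean used as the alignment reference.
        reference = draws[: t + 1].mean(axis=0)
    return draws
```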

Nonlinear penalized estimation of true Q-matrix in cognitive diagnostic models
https://academiccommons.columbia.edu/catalog/ac:160812
Xiang, Rui | DOI: 10.7916/D8J96DKZ | Date: Thu, 08 Jun 2017 13:55:41 +0000

A key issue for cognitive diagnostic models (CDMs) is correct identification of the Q-matrix, which indicates the relationship between attributes and test items. Previous CDM work typically assumed a known Q-matrix provided by domain experts, such as those who developed the questions; however, misspecifications of the Q-matrix have been discovered in past studies. The primary purpose of this research is to set up a mathematical framework to estimate the true Q-matrix based on item response data. The model treats all Q-matrix elements as parameters and estimates them through an EM algorithm. Two simulation designs are conducted to evaluate the feasibility and performance of the model, and an empirical study compares the estimated Q-matrix with the one designed by experts. The results show that the model performs well and is able to identify 60% to 90% of the correct elements of the Q-matrix. The model also indicates possible misspecifications of the expert-designed Q-matrix in the fraction subtraction test.

Subjects: Statistics, Education, Psychology | Author UNI: rx2107 | Department: Measurement and Evaluation | Type: Theses

Examining Uncertainty and Misspecification of Attributes in Cognitive Diagnostic Models
https://academiccommons.columbia.edu/catalog/ac:174822
Chen, Chen-Miao Carol | DOI: 10.7916/D8PC38K3 | Date: Thu, 08 Jun 2017 13:54:26 +0000

In recent years, cognitive diagnostic models (CDMs) have been widely used in educational assessment to provide a diagnostic profile (mastery/non-mastery) analysis for examinees, which gives insights into learning and teaching. However, there is often uncertainty about the specification of the Q-matrix that is required for CDMs, given that it is based on expert judgment. The current study uses a Bayesian approach to examine recovery of Q-matrix elements in the presence of uncertainty about some elements. The first simulation examined the situation where there is complete uncertainty about whether or not an attribute is required, when in fact it is required. The simulation results showed that recovery was generally excellent. However, recovery broke down when other elements of the Q-matrix were misspecified. Further simulations showed that, if one has some information about the attributes for a few items, then recovery improves considerably, but this also depends on how many other elements are misspecified. A second set of simulations examined the situation where uncertain Q-matrix elements were scattered throughout the Q-matrix. Recovery was generally excellent, even when some other elements were misspecified. A third set of simulations showed that using more informative priors did not uniformly improve recovery. An application of the approach to data from TIMSS (2007) suggested some alternative Q-matrices.

Subjects: Psychometrics | Author UNI: cc2410 | Department: Measurement and Evaluation | Type: Theses

Examining the Impact of Examinee-Selected Constructed Response Items in the Context of a Hierarchical Rater Signal Detection Model
https://academiccommons.columbia.edu/catalog/ac:186227
Patterson, Brian Francis | DOI: 10.7916/D8X929DC | Date: Thu, 08 Jun 2017 13:54:26 +0000

Research into the relatively rarely used examinee-selected item assessment designs has revealed certain challenges. This study aims to re-examine more comprehensively the key issues around examinee-selected items under a modern model for constructed-response scoring. Specifically, data were simulated under the hierarchical rater model with signal detection theory rater components (HRM-SDT; DeCarlo, Kim, and Johnson, 2011), and a variety of examinee item-selection mechanisms were considered. These conditions varied from a hypothetical baseline condition, where examinees choose randomly and with equal frequency from a pair of item prompts, to the perhaps more realistic and certainly more troublesome condition where examinees select items based on the very subject-area proficiency that the instrument intends to measure. While good examinee, item, and rater parameter recovery was apparent in the former condition for the HRM-SDT, serious issues with item and rater parameter estimation were apparent in the latter. Additional conditions were considered, as well as competing psychometric models for the estimation of examinee proficiency. Finally, practical implications of using examinee-selected item designs are given, as well as future directions for research.

Subjects: Educational tests and measurements | Author UNI: bfp2103 | Department: Measurement and Evaluation | Type: Theses

Dealing with Sparse Rater Scoring of Constructed Responses within a Framework of a Latent Class Signal Detection Model
https://academiccommons.columbia.edu/catalog/ac:161491
Kim, Sunhee | DOI: 10.7916/D8T4419T | Date: Thu, 08 Jun 2017 13:54:25 +0000

In many assessment situations that use a constructed-response (CR) item, an examinee's response is evaluated by only one rater, which is called a single-rater design; for example, in classroom assessment practice only one teacher grades each student's performance. While single-rater designs are the most cost-effective of all rater designs, the lack of a second rater causes difficulties with respect to how the scores should be used and evaluated. For example, one cannot assess rater reliability or rater effects when there is only one rater. The present study explores possible solutions to the issues that arise in sparse rater designs within the context of a latent class version of signal detection theory (LC-SDT) that has previously been used for rater scoring. This approach provides a model for rater cognition in CR scoring (DeCarlo, 2005; 2008; 2010) and offers measures of rater reliability and various rater effects. The following potential solutions to rater sparseness were examined: (1) the use of parameter restrictions to yield an identified model, (2) the use of informative priors in a Bayesian approach, and (3) the use of back readings (i.e., partially available second-rater observations), which are available in some large-scale assessments. Simulations and analyses of real-world data were conducted to examine the performance of these approaches. Simulation results showed that using parameter constraints allows one to detect various rater effects that are of concern in practice. The Bayesian approach also gave useful results, although estimation of some of the parameters was poor and the standard deviations of the parameter posteriors were large, except when the sample size was large. Using back-reading scores gave an identified model, and simulations showed that the results were generally acceptable in terms of parameter estimation, except for small sample sizes. The paper also examines the utility of the approaches as applied to the PIRLS USA reliability data. The results show some similarities and differences between parameter estimates obtained with posterior mode estimation and with Bayesian estimation. Sensitivity analyses revealed that rater parameter estimates are sensitive to the specification of the priors, as was also found in the simulation results with smaller sample sizes.

Subjects: Educational tests and measurements | Author UNI: shk2125 | Department: Measurement and Evaluation | Type: Theses
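
In the signal detection framing of rater scoring referenced above, a rater's ordinal score arises from comparing a noisy perception of the latent essay quality with a set of criteria. The sketch below is a generic version of that idea (logistic noise; the discrimination and criterion values are illustrative, and this is not claimed to be DeCarlo's exact parameterization):

```python
import numpy as np

def rater_score_probs(d, criteria, n_classes):
    """P(observed score | latent class) for an SDT-style rater model.

    d        : rater discrimination (spacing of latent class perceptions)
    criteria : increasing cutpoints the rater uses to map perception to scores
    Returns an array of shape (n_classes, len(criteria) + 1).
    """
    logistic = lambda x: 1.0 / (1.0 + np.exp(-x))
    probs = np.zeros((n_classes, len(criteria) + 1))
    for c in range(n_classes):
        # Cumulative probabilities of scoring at or below each criterion.
        cum = logistic(np.asarray(criteria, dtype=float) - d * c)
        cum = np.concatenate([cum, [1.0]])
        probs[c] = np.diff(np.concatenate([[0.0], cum]))
    return probs

# A rater with discrimination 2 and three criteria, scoring 4 latent classes
# on a 4-point scale: higher classes concentrate on higher scores.
print(rater_score_probs(d=2.0, criteria=[1.0, 3.0, 5.0], n_classes=4).round(2))
```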

Schematic Effects on Probability Problem Solving
https://academiccommons.columbia.edu/catalog/ac:174540
Gugga, Saranda Sonia | DOI: 10.7916/D89W0NPM | Date: Thu, 08 Jun 2017 13:49:30 +0000

Three studies examined context effects on solving probability problems. Variants of word problems were written with cover stories that differed with respect to social or temporal schemas, while maintaining formal problem structure and solution procedure. The first of these studies showed that problems depicting schemas in which randomness was inappropriate or unexpected for the social situation were solved less often than problems depicting schemas in which randomness was appropriate. Another set of two studies examined temporal and causal schemas, in which the convention is that events are considered in the forward direction. Pairs of conditional probability (CP) problems were written depicting events E1 and E2 such that E1 either occurs before E2 or causes E2. Problems were defined with respect to the order of events expressed in the CPs, so that P(E2|E1) represents the CP in schema-consistent, intact order, considering the occurrence of E1 before E2, while P(E1|E2) represents the CP in schema-inconsistent, inverted order. Introductory statistics students had greater difficulty encoding the CP of events in schema-inconsistent order than the CP of events in the conventional deterministic order. The differential effects of schematic context on solving probability problems identify specific conditions and sources of bias in human reasoning under uncertainty. In addition, these biases may be influential when evaluating empirical findings, in a manner similar to that demonstrated experimentally in this paper, and may have implications for how social scientists are trained in research methodology.

Subjects: Cognitive psychology, Psychometrics | Author UNI: ssg34 | Department: Measurement and Evaluation | Type: Theses
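
The inverted conditional probability P(E1|E2) discussed above is related to the intact-order P(E2|E1) through Bayes' rule, which is part of what makes schema-inconsistent problems harder to encode. A small worked example with illustrative numbers (not taken from the dissertation's items):

```python
def invert_conditional(p_e2_given_e1, p_e1, p_e2_given_not_e1):
    """Bayes' rule: turn the intact-order CP P(E2|E1) into the inverted
    P(E1|E2), given the base rate of E1 and P(E2|not E1)."""
    p_e2 = p_e2_given_e1 * p_e1 + p_e2_given_not_e1 * (1 - p_e1)
    return p_e2_given_e1 * p_e1 / p_e2

# Illustrative values: E1 happens 30% of the time and strongly raises E2's chance.
print(round(invert_conditional(p_e2_given_e1=0.8, p_e1=0.3,
                               p_e2_given_not_e1=0.2), 3))   # -> 0.632
```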

On the Use of Covariates in a Latent Class Signal Detection Model, with Applications to Constructed Response Scoring
https://academiccommons.columbia.edu/catalog/ac:146692
Wang, Zijian Gerald | DOI: 10.7916/D8DB87ZP | Date: Wed, 07 Jun 2017 17:02:08 +0000

A latent class signal detection (SDT) model was recently introduced as an alternative to traditional item response theory (IRT) methods in the analysis of constructed-response data. This class of models can be represented as restricted latent class models and differs from the IRT approach in how the latent construct is conceptualized. One appeal of the signal detection approach is that it provides an intuitive framework from which the psychological processes governing rater behavior can be better understood. The present study developed an extension of the latent class SDT model to include covariates and examined the performance of the resulting model. Covariates can be incorporated into the latent class SDT model in three ways: (1) affecting latent class membership, (2) affecting conditional response probabilities, and (3) affecting both latent class membership and conditional response probabilities. In each case, simulations were conducted to investigate both parameter recovery and classification accuracy of the extended model under two competing rater designs; in addition, the implications of ignoring covariate effects and of covariate misspecification were explored. The ability of information criteria, namely the AIC, the small-sample-adjusted AIC, and the BIC, to recover the true model with respect to how covariates are introduced was also examined. Results indicate that parameters were generally well recovered in fully crossed designs; to obtain similar levels of estimation precision in incomplete designs, sample size requirements were comparatively higher and depended on the number of indicators used. When covariate effects were not accounted for or were misspecified, parameter estimates tended to be severely biased, which in turn reduced classification accuracy. With respect to model recovery, the BIC performed the most consistently among the information criteria considered. In light of these findings, recommendations were made regarding sample size requirements and model-building strategies when implementing the extended latent class SDT model.

Subjects: Educational tests and measurements | Author UNI: zgw2 | Department: Measurement and Evaluation | Type: Theses

The Relation between Uncertainty in Latent Class Membership and Outcomes in a Latent Class Signal Detection Model
https://academiccommons.columbia.edu/catalog/ac:146637
Cheng, Zhifen | DOI: 10.7916/D8ZP4D6S | Date: Wed, 07 Jun 2017 16:57:00 +0000

Latent class variables are often used to predict outcomes. The conventional practice is to first assign observations to one of the latent classes based on the maximum posterior probability; the assigned class membership is then treated as an observed variable and used to predict the outcomes. This widely used classify-analyze strategy ignores the uncertainty of each observation's membership in a latent class. Once an observation is classified to the latent class with the highest posterior probability, its probability of being in the assigned class is treated as one, and all observations classified to a class become equally representative of it. Finally, standard errors are underestimated because the residual uncertainty about latent class membership is ignored. This dissertation used simulation studies and an analysis of a real-world data set to compare five commonly adopted approaches (most likely class regression, probability regression, probability-weighted regression, pseudo-class regression, and the simultaneous approach) for measuring the association between a latent class variable and outcome variables, to see which can better account for the uncertainty in latent class membership. The model considered in the study was a latent class extension of the signal detection model (LC-SDT) by DeCarlo, which has been shown to address certain measurement issues in education, specifically rater issues involved in essay grading such as rater effects and rater reliability. The LC-SDT model has the potential for wide application in education as well as other areas, so it is important to explore how to account for uncertainty in latent class membership within this framework. Three ordinal outcome variables having a negative, weak, and strong association with the latent class variable were considered in the simulations. Results of the simulations showed that the simultaneous approach performed best in obtaining unbiased parameter estimates. It also yielded larger standard errors than the other approaches, which previous research has found to underestimate standard errors. Even though the simultaneous approach has its advantages, including outcome variables in a latent class model can affect the parameters of the response variables, so caution is needed when using this approach. The analysis of the real-world data set confirmed the trends observed in the simulation studies.

Subjects: Psychometrics, Statistics | Author UNI: zc2133 | Department: Measurement and Evaluation | Type: Theses
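
The contrast between modal assignment and approaches that retain posterior uncertainty can be made concrete with a toy posterior-probability matrix. The sketch below shows modal ("most likely class") assignment versus posterior-probability weighting when estimating a class-specific outcome mean; the numbers are illustrative and not from the dissertation:

```python
import numpy as np

# Posterior class-membership probabilities for 4 observations and 2 latent
# classes, plus an outcome measured on each observation (illustrative values).
posterior = np.array([[0.9, 0.1],
                      [0.6, 0.4],
                      [0.4, 0.6],
                      [0.2, 0.8]])
outcome = np.array([10.0, 8.0, 5.0, 3.0])

# Classify-analyze: assign each observation to its most likely class, then
# average the outcome within each assigned class (membership treated as known).
modal = posterior.argmax(axis=1)
modal_means = [outcome[modal == c].mean() for c in range(2)]

# Probability-weighted analysis: weight every observation by its posterior
# probability of belonging to each class, preserving membership uncertainty.
weighted_means = (posterior * outcome[:, None]).sum(axis=0) / posterior.sum(axis=0)

print(modal_means)              # [9.0, 4.0]
print(weighted_means.round(2))  # [7.81, 5.05], a less extreme class contrast
```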

Rater Drift in Constructed Response Scoring via Latent Class Signal Detection Theory and Item Response Theory
https://academiccommons.columbia.edu/catalog/ac:132272
Park, Yoon Soo | DOI: 10.7916/D8445TGR | Date: Wed, 07 Jun 2017 02:43:38 +0000

The use of constructed-response (CR) items or performance tasks to assess test takers' ability has grown tremendously over the past decade. Examples of CR items in psychological and educational measurement range from essays and works of art to admissions interviews. Unlike multiple-choice (MC) items, which have predetermined options, CR items require test takers to construct their own answers. As such, they require the judgment of multiple raters, who are subject to differences in perception and in prior knowledge of the material being evaluated. As with any scoring procedure, the scores assigned by raters must be comparable over time and over different test administrations and forms; in other words, scores must be reliable and valid for all test takers, regardless of when an individual takes the test. This study examines how longitudinal patterns or changes in rater behavior affect model-based classification accuracy. Rater drift refers to changes in rater behavior across test administrations, and prior research has found evidence of such drift. Rater behavior in CR scoring is examined using two measurement models: latent class signal detection theory (SDT) and item response theory (IRT) models. Rater effects (e.g., leniency and strictness) are partly examined with simulations, where the ability of different models to capture changes in rater behavior is studied. Drift is also examined in two real-world large-scale tests, a teacher certification test and a high school writing test; these tests use the same set of raters for long periods of time, and each rater's scoring is examined on a monthly basis. Results from the empirical analysis showed that the rater models were effective in detecting changes in rater behavior over testing administrations in real-world data, although there were differences in rater discrimination between the latent class SDT and IRT models. Simulations were used to examine the effect of rater drift on classification accuracy and on differences between the latent class SDT and IRT models. Changes in rater severity had only a minimal effect on classification, whereas rater discrimination had a greater effect on classification accuracy. This study also found that IRT models detected changes in rater severity and in rater discrimination even when data were generated from the latent class SDT model. However, when data were non-normal, IRT models underestimated rater discrimination, which may lead to incorrect inferences about the precision of raters. These findings provide new and important insights into CR scoring and issues that emerge in practice, including methods to improve rater training.

Subjects: Psychometrics, Educational tests and measurements, Statistics | Author UNI: ysp2102 | Department: Measurement and Evaluation | Type: Theses