2013 Theses Doctoral

# Statistical Inference for Diagnostic Classification Models

Diagnostic classification models (DCM) are an important recent development in educational and psychological testing. Instead of an overall test score, a diagnostic test provides each subject with a profile detailing the concepts and skills (often called "attributes") that he/she has mastered. Central to many DCMs is the so-called Q-matrix, an incidence matrix specifying the item-attribute relationship. It is common practice for the Q-matrix to be specified by experts when items are written, rather than through data-driven calibration. Such a non-empirical approach may lead to misspecification of the Q-matrix and substantial lack of model fit, resulting in erroneous interpretation of testing results. This motivates our study and we consider the identifiability, estimation, and hypothesis testing of the Q-matrix. In addition, we study the identifiability of diagnostic model parameters under a known Q-matrix. The first part of this thesis is concerned with estimation of the Q-matrix. In particular, we present definitive answers to the learnability of the Q-matrix for one of the most commonly used models, the DINA model, by specifying a set of sufficient conditions under which the Q-matrix is identifiable up to an explicitly defined equivalence class. We also present the corresponding data-driven construction of the Q-matrix. The results and analysis strategies are general in the sense that they can be further extended to other diagnostic models. The second part of the thesis focuses on statistical validation of the Q-matrix. The purpose of this study is to provide a statistical procedure to help decide whether to accept the Q-matrix provided by the experts. Statistically, this problem can be formulated as a pure significance testing problem with null hypothesis H0 : Q = Q0, where Q0 is the candidate Q-matrix. We propose a test statistic that measures the consistency of observed data with the proposed Q-matrix. Theoretical properties of the test statistic are studied. In addition, we conduct simulation studies to show the performance of the proposed procedure. The third part of this thesis is concerned with the identifiability of the diagnostic model parameters when the Q-matrix is correctly specified. Identifiability is a prerequisite for statistical inference, such as parameter estimation and hypothesis testing. We present sufficient and necessary conditions under which the model parameters are identifiable from the response data.

## Files

- Xu_columbia_0054D_11271.pdf application/pdf 573 KB Download File

## More About This Work

- Academic Units
- Statistics
- Thesis Advisors
- Ying, Zhiliang
- Liu, Jingchen
- Degree
- Ph.D., Columbia University
- Published Here
- April 30, 2013