Theses Doctoral

Statistical Inference for Diagnostic Classification Models

Xu, Gongjun

Diagnostic classification models (DCMs) are an important recent development in educational and psychological testing. Instead of an overall test score, a diagnostic test provides each subject with a profile detailing the concepts and skills (often called "attributes") that the subject has mastered. Central to many DCMs is the so-called Q-matrix, an incidence matrix specifying the item-attribute relationship. It is common practice for the Q-matrix to be specified by experts when items are written, rather than through data-driven calibration. Such a non-empirical approach may lead to misspecification of the Q-matrix and substantial lack of model fit, resulting in erroneous interpretation of testing results. This motivates our study of the identifiability, estimation, and hypothesis testing of the Q-matrix. In addition, we study the identifiability of diagnostic model parameters under a known Q-matrix.

The first part of this thesis is concerned with estimation of the Q-matrix. In particular, we present definitive answers to the learnability of the Q-matrix for one of the most commonly used models, the DINA model, by specifying a set of sufficient conditions under which the Q-matrix is identifiable up to an explicitly defined equivalence class. We also present the corresponding data-driven construction of the Q-matrix. The results and analysis strategies are general in the sense that they can be further extended to other diagnostic models.

The second part of the thesis focuses on statistical validation of the Q-matrix. The purpose of this study is to provide a statistical procedure to help decide whether to accept the Q-matrix provided by the experts. Statistically, this problem can be formulated as a pure significance testing problem with null hypothesis H0: Q = Q0, where Q0 is the candidate Q-matrix. We propose a test statistic that measures the consistency of the observed data with the proposed Q-matrix. Theoretical properties of the test statistic are studied. In addition, we conduct simulation studies to show the performance of the proposed procedure.

The third part of this thesis is concerned with the identifiability of the diagnostic model parameters when the Q-matrix is correctly specified. Identifiability is a prerequisite for statistical inference, such as parameter estimation and hypothesis testing. We present necessary and sufficient conditions under which the model parameters are identifiable from the response data.
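To make the role of the Q-matrix concrete, the following is a minimal sketch of the DINA response rule in its standard textbook form, not code taken from the thesis. The Q-matrix, the slipping and guessing values, and the function names `ideal_response` and `dina_prob` are all hypothetical and chosen only for illustration. Under DINA, a subject is expected to answer an item correctly only when the subject has mastered every attribute the item requires; the slipping and guessing parameters then perturb that ideal response.

```python
# Hypothetical Q-matrix: 3 items (rows) by 2 attributes (columns).
# Entry 1 means the item requires the attribute.
Q = [
    [1, 0],  # item 1 requires attribute 1 only
    [0, 1],  # item 2 requires attribute 2 only
    [1, 1],  # item 3 requires both attributes
]


def ideal_response(alpha, q_matrix):
    """Return 1 for each item whose required attributes are all mastered in alpha, else 0."""
    return [int(all(a >= q for a, q in zip(alpha, row))) for row in q_matrix]


def dina_prob(alpha, q_matrix, slip, guess):
    """Probability of a correct answer per item under the standard DINA form.

    If the ideal response is 1 the probability is 1 - slip;
    otherwise it is the guessing probability.
    """
    xi = ideal_response(alpha, q_matrix)
    return [1.0 - s if x == 1 else g for x, s, g in zip(xi, slip, guess)]


# Illustrative (made-up) item parameters.
slip = [0.1, 0.2, 0.1]
guess = [0.2, 0.1, 0.25]

# A subject who has mastered attribute 1 but not attribute 2.
probs = dina_prob([1, 0], Q, slip, guess)
```

In this sketch, changing a single entry of `Q` changes which attribute profiles receive the high success probability on that item, which is why a misspecified Q-matrix can distort the interpretation of test results.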



More About This Work

Academic Units
Thesis Advisors
Ying, Zhiliang
Liu, Jingchen
Degree
Ph.D., Columbia University
Published Here
April 30, 2013