Academic Commons

Theses Doctoral

Statistical Learning for Process Data

Wang, Zhi

Computer-based tests facilitate the collection of problem-solving processes, also known as process data. Response processes recorded in computer log files provide a new venue for investigating and understanding human behaviors. This thesis focuses on the development of statistical learning methods for process data and considers the following three problems.

The first problem is feature extraction. Response processes are noisy and come in nonstandard formats. To exploit the information in process data, we propose two generic methods that summarize response processes as vectors, so that standard statistical tools such as regression models can be applied. In Chapter 2, features are extracted by applying multidimensional scaling to a pairwise dissimilarity measure between response processes. Chapter 3 uses an autoencoder and recurrent neural networks to explore the latent structure of process data. For both methods, empirical studies show that the extracted features preserve a substantial amount of the information in the observed processes and, for many variables, have greater predictive power than traditional item responses.
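
The dissimilarity-plus-MDS idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of Chapter 2. The dissimilarity used here is a hypothetical length-normalized edit distance between action sequences. The embedding is classical (Torgerson) MDS, computed with a plain Jacobi eigensolver so that only the standard library is needed. The thesis's own dissimilarity measure and MDS formulation may differ. The action names in the example are made up.

```python
import math

def edit_distance(a, b):
    """Levenshtein distance between two action sequences (lists of strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def dissimilarity(a, b):
    """Hypothetical normalized dissimilarity in [0, 1]; NOT the thesis's measure."""
    return edit_distance(a, b) / max(len(a), len(b), 1)

def jacobi_eigh(a, sweeps=50):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix via Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (rotate columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rotate rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def classical_mds(d, k):
    """Embed an n x n dissimilarity matrix into k-dimensional feature vectors."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    vals, vecs = jacobi_eigh(b)
    top = sorted(range(n), key=lambda c: vals[c], reverse=True)[:k]
    return [[vecs[i][c] * math.sqrt(max(vals[c], 0.0)) for c in top] for i in range(n)]

# Usage with made-up action sequences
seqs = [
    ["start", "a", "b", "submit"],
    ["start", "a", "b", "submit"],
    ["start", "a", "c", "submit"],
    ["start", "x", "y", "z", "q", "submit"],
]
feats = classical_mds([[dissimilarity(s, t) for t in seqs] for s in seqs], k=2)
for s, f in zip(seqs, feats):
    print(s, [round(x, 3) for x in f])
```

Each row of `feats` is a fixed-length vector that can be fed to an ordinary regression model, which is the point of the feature-extraction step.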

The second problem is assessment based on process data. In Chapter 4, we present a statistical procedure that incorporates process information to improve latent trait estimation in item response theory models. The procedure is data-driven and can be easily implemented with regression models, and a theoretical guarantee of mean squared error reduction is established. When applied to a real dataset, this new process-data-based estimator achieves higher reliability than the traditional estimator based on item response theory.
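
The general intuition behind a regression-based refinement can be shown with a toy simulation. This is a hedged sketch and not the estimator of Chapter 4. It assumes a single hypothetical process feature, replaces each noisy IRT-style score with its in-sample least-squares prediction from that feature, and uses no sample splitting or cross-validation. All distributions are invented for illustration.

```python
import random

def ols_fit(x, y):
    """Least-squares intercept and slope for y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def process_based_estimate(feature, theta_hat):
    """Replace each noisy trait estimate with its regression prediction from a process feature."""
    a, b = ols_fit(feature, theta_hat)
    return [a + b * f for f in feature]

def mse(estimates, truth):
    """Mean squared error against the (simulated) true trait values."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)

# Toy simulation; every distribution below is a hypothetical assumption
random.seed(0)
n = 2000
theta = [random.gauss(0, 1) for _ in range(n)]           # true latent trait
theta_hat = [t + random.gauss(0, 1) for t in theta]      # noisy item-response-based estimate
feature = [t + random.gauss(0, 0.3) for t in theta]      # informative process feature

refined = process_based_estimate(feature, theta_hat)
print(f"IRT-only MSE: {mse(theta_hat, theta):.3f}")
print(f"process-based MSE: {mse(refined, theta):.3f}")
```

The regression target is the noisy score itself, so the procedure never needs the true trait. The fitted values discard the part of the noise that the process feature cannot predict.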

The third problem is the identification of problem-solving strategies for exploratory analysis. The approach presented in Chapter 5 uses action predictability to segment each individual process into a sequence of more homogeneous subprocesses. Each subprocess is associated with a subtask, so a long, complex response process can be transformed into a shorter and more interpretable sequence of subtasks. With this approach, problem-solving strategies can be visualized and compared across groups of respondents, and process information can be decomposed for further analysis.
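
A rough sketch of predictability-based segmentation follows. It swaps in a simple first-order (bigram) model of the next action, which may differ from the predictive model used in Chapter 5. It starts a new subprocess whenever the observed next action has an estimated probability below a hypothetical `threshold`. The action names and the threshold value are illustrative assumptions.

```python
from collections import Counter, defaultdict

def fit_bigram(sequences):
    """Count transitions prev -> next across all response processes."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def next_prob(counts, prev, nxt):
    """Estimated P(next action | previous action); 0.0 for an unseen previous action."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

def segment(seq, counts, threshold=0.5):
    """Split a process before every action that is hard to predict (hypothetical rule)."""
    if not seq:
        return []
    segments = [[seq[0]]]
    for prev, nxt in zip(seq, seq[1:]):
        if next_prob(counts, prev, nxt) < threshold:
            segments.append([nxt])      # low predictability: start a new subprocess
        else:
            segments[-1].append(nxt)    # predictable: continue the current subprocess
    return segments

# Made-up logs: within-subtask actions are predictable, subtask order varies
logs = [
    ["open_mail", "read", "close", "open_folder", "move", "close"],
    ["open_folder", "move", "close", "open_mail", "read", "close"],
    ["open_mail", "read", "close", "search", "filter", "close"],
]
model = fit_bigram(logs)
for log in logs:
    print(segment(log, model))
```

Each resulting segment could then be labeled as a subtask, which turns the raw log into a shorter subtask sequence.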



More About This Work

Thesis Advisors
Liu, Jingchen
Ying, Zhiliang
Ph.D., Columbia University
Published Here
June 29, 2021