Theses Doctoral

Statistical Methods for Structured Data: Analyses of Discrete Time Series and Networks

Palmer, William Reed

This dissertation addresses three problems in applied statistics involving discrete time series and network data: (1) finding and analyzing community structure in directed networks, (2) capturing changes in dynamic count-valued time series of COVID-19 daily deaths, and (3) inferring the edges of an implicit network given noisy observations of a multivariate point process on its nodes. We use tools from spectral clustering, state-space models, Bayesian hierarchical modeling, and variational inference to address these problems. Each chapter presents and discusses statistical methods for the given problem. We apply the methods to simulated and real data both to validate them and to demonstrate their limitations.

In Chapter 1, we consider a directed spectral method for community detection that uses a graph Laplacian defined for non-symmetric adjacency matrices. We give the theoretical motivation behind this directed graph Laplacian and demonstrate its connection to an objective function that reflects a notion of how communities of nodes in directed networks should behave. Applying the method to directed networks, we compare the results to an approach using a symmetrized version of the adjacency matrices. A simulation study with a directed stochastic block model shows that directed spectral clustering can succeed where the symmetrized approach fails. We also find interesting and informative differences between the two approaches in the application to Congressional cosponsorship data.
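The abstract does not say which directed graph Laplacian Chapter 1 uses. As an illustration only, the sketch below builds one widely used definition (Chung's Laplacian for directed graphs, based on the stationary distribution of a random walk on the network) and extracts a spectral embedding from it. The function names, the `teleport` smoothing parameter, and the handling of nodes without out-edges are assumptions made for this sketch, not details taken from the dissertation.

```python
import numpy as np

def directed_laplacian(A, teleport=0.05):
    """Chung-style Laplacian for a non-symmetric adjacency matrix A.

    `teleport` is an assumed smoothing parameter that mixes in uniform
    jumps so the random walk has a unique stationary distribution.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; nodes with no out-edges jump uniformly.
    safe_deg = np.where(out_deg > 0, out_deg, 1.0)
    P = np.where(out_deg > 0, A / safe_deg, 1.0 / n)
    P = (1.0 - teleport) * P + teleport / n
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    phi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    phi = phi / phi.sum()
    s = np.sqrt(phi)
    M = (s[:, None] * P) / s[None, :]  # Phi^{1/2} P Phi^{-1/2}
    # Symmetric by construction, even though A is not.
    return np.eye(n) - 0.5 * (M + M.T)

def spectral_embedding(L, k):
    """Eigenvectors for the k smallest eigenvalues of a symmetric Laplacian."""
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    return vecs[:, :k]
```

The rows of `spectral_embedding(directed_laplacian(A), k)` could then be passed to any clustering routine such as k-means. Running the same embedding on a symmetrized matrix such as `A + A.T` gives a simple version of the symmetrized baseline mentioned above.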

In Chapter 2, we propose a generalized non-linear state-space model for count-valued time series of COVID-19 fatalities. To capture the dynamic changes in daily COVID-19 death counts, we specify a latent state process that involves second-order differencing and an AR(1)-ARCH(1) model. These modeling choices are motivated by the application and validated by model assessment. We consider and fit a progression of Bayesian hierarchical models under this general framework. Using COVID-19 daily death counts from New York City's five boroughs, we evaluate and compare the considered models through predictive model assessment. Our findings justify the elements included in the proposed model. The proposed model is further applied to time series of COVID-19 deaths from the four most populous counties in Texas. These model fits illuminate the dynamics of multiple phases and show the applicability of the framework to localities beyond New York City.
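To make the ingredients of the latent state process concrete, the sketch below simulates counts from one possible specification: the second difference of a latent log-intensity follows an AR(1) process whose innovations have ARCH(1) variance, and observations are Poisson. The Poisson observation model, the log link, the parameter values, and all names are assumptions for illustration. The abstract does not give the exact form of the dissertation's model.

```python
import numpy as np

def simulate_counts(T=100, phi=0.5, alpha0=1e-5, alpha1=0.3,
                    x0=np.log(20.0), seed=0):
    """Simulate counts from a hypothetical state-space model.

    x[t]   : latent log-intensity
    d[t]   : second difference of x, following an AR(1) process
    eps[t] : AR innovations with ARCH(1) conditional variance
    """
    rng = np.random.default_rng(seed)
    x = np.full(T, x0)
    d = np.zeros(T)
    eps = np.zeros(T)
    for t in range(2, T):
        # ARCH(1): the variance depends on the previous squared innovation.
        sigma2 = alpha0 + alpha1 * eps[t - 1] ** 2
        eps[t] = rng.normal(0.0, np.sqrt(sigma2))
        # AR(1) dynamics on the second difference of the latent state.
        d[t] = phi * d[t - 1] + eps[t]
        # Invert the second difference: x[t] - 2 x[t-1] + x[t-2] = d[t].
        x[t] = 2.0 * x[t - 1] - x[t - 2] + d[t]
    # Assumed observation model: Poisson counts with a log link.
    y = rng.poisson(np.exp(x))
    return y, x, d
```

Calling `y, x, d = simulate_counts()` returns the simulated counts with the latent path and its second differences. In this sketch, second-order differencing lets the latent trend change slope smoothly, while the ARCH term allows bursts of volatility.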

In Chapter 3, we consider the task of inferring the connections between nodes of a network from noisy observations of events. In our model-based approach, we consider a generative process incorporating latent dynamics that are directed by past events and the unobserved network structure. This process is based on a leaky integrate-and-fire (LIF) model from neuroscience for aggregating input and triggering events (spikes) in neural populations. Given observation data, we estimate the model parameters with a novel variational Bayesian approach, specifying a highly structured and parsimonious approximation for the conditional posterior distribution of the process's latent dynamics. This approach allows for fully interpretable inference of both the model parameters of interest and the variational parameters. Moreover, it is computationally efficient in scenarios where the observed event times are not too sparse.

We apply our methods in a simulation study and to recorded neural activity in the dorsomedial frontal cortex (DMFC) of a rhesus macaque. We assess our results based on ground truth, model diagnostics, and spike prediction for held-out nodes.
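The leaky integrate-and-fire mechanism behind the Chapter 3 generative process can be illustrated with a minimal discrete-time simulation on a weighted directed network. This is a generic textbook-style LIF sketch, not the dissertation's model: the update rule, the constant `drive` term, the reset-to-zero rule, and all parameter names and values are assumptions.

```python
import numpy as np

def simulate_lif(W, T=200, leak=0.1, drive=0.15, noise_sd=0.1,
                 threshold=1.0, seed=0):
    """Simulate spikes from a simple discrete-time LIF network.

    W[j, i] is the (assumed) weight of the directed edge from node j to
    node i. Returns a (T, n) array of 0/1 spike indicators.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    v = np.zeros(n)  # latent membrane potentials
    spikes = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        # Leak, constant drive, input from last step's spikes, and noise.
        v = ((1.0 - leak) * v + drive + spikes[t - 1] @ W
             + rng.normal(0.0, noise_sd, n))
        fired = v >= threshold
        spikes[t] = fired
        v[fired] = 0.0  # reset the potential after a spike
    return spikes
```

For example, `simulate_lif(0.2 * np.ones((3, 3)))` produces spike trains for three mutually excitatory nodes. The inference task described above runs in the opposite direction: recovering a weight matrix like `W` from observed spike times alone.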



More About This Work

Academic Units
Thesis Advisors
Zheng, Tian
Degree
Ph.D., Columbia University
Published Here
July 5, 2023