2020 Theses Doctoral
Community Detection in Social Networks: Multilayer Networks and Pairwise Covariates
Community detection is one of the most fundamental problems in network study. The stochastic block model (SBM) is arguably the most studied model for network data with different estimation methods developed with their community detection consistency results unveiled. Due to its stringent assumptions, SBM may not be suitable for many real-world problems. In this thesis, we present two approaches that incorporate extra information compared with vanilla SBM to help improve community detection performance and be suitable for applications.
One approach is to stack multilayer networks that are composed of multiple single-layer networks with common community structure. Numerous methods have been proposed based on spectral clustering, but most rely on optimizing an objective function while the associated theoretical properties remain to be largely unexplored. We focus on the `early fusion' method, of which the target is to minimize the spectral clustering error of the weighted adjacency matrix (WAM). We derive the optimal weights by studying the asymptotic behavior of eigenvalues and eigenvectors of the WAM. We show that the eigenvector of WAM converges to a normal distribution, and the clustering error is monotonically decreasing with the eigenvalue gap. This fact reveals the intrinsic link between eigenvalues and eigenvectors, and thus the algorithm will minimize the clustering error by maximizing the eigenvalue gap. The numerical study shows that our algorithm outperforms other state-of-art methods significantly, especially when signal-to-noise ratios of layers vary widely. Our algorithm also yields higher accuracy result for S&P 1500 stocks dataset than competing models.
The other approach we propose is to consider heterogeneous connection probabilities to remove the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. We introduce a pairwise covariates-adjusted stochastic block model (PCABM), a generalization of SBM that incorporates pairwise covariates information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments. It is shown that both the coefficient estimates of the covariates and the community assignments are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to fit PCABM efficiently. Under certain conditions, we derive the error bound of community estimation under SCWA and show that it is community detection consistent. PCABM compares favorably with the SBM or degree-corrected stochastic block model under a wide range of simulated and real networks when covariate information is accessible.
- Huang_columbia_0054D_15925.pdf application/pdf 1.48 MB Download File
More About This Work
- Academic Units
- Thesis Advisors
- Ying, Zhiliang
- Ph.D., Columbia University
- Published Here
- June 8, 2020