Theses Doctoral

A Graphon-based Framework for Modeling Large Networks

He, Ran

This thesis focuses on a new graphon-based approach for fitting models to large networks and establishes a general framework for incorporating nodal attributes into the modeling. The scale of today's network data renders classical network modeling and inference inappropriate, so novel modeling strategies and estimation methods are required.
Depending on whether the model structure is specified a priori or determined solely from data, existing models for networks can be classified as parametric or non-parametric. Compared to the former, a non-parametric model often allows for an easier and more straightforward estimation procedure of the network structure. On the other hand, the connectivities and dynamics of networks fitted by non-parametric models can be considerably harder to interpret than those fitted by parametric models.
In this thesis, we first propose a computational estimation procedure for a class of parametric models that are among the most widely used models for networks, built upon tools from non-parametric models with practical innovations that make it efficient and capable of scaling to large networks.
Extensions of this base method are then considered in two directions. Inspired by a popular network sampling method, we further propose an estimation algorithm using sampled data, in order to circumvent the practical obstacle that data on the entire network are often hard to obtain and analyze. The base algorithm is also generalized to the case of complex network structure where nodal attributes are involved. Two general frameworks for a non-parametric model are proposed in order to incorporate nodal impact, one with a hierarchical structure and the other employing similarity measures.
Several simulation studies are carried out to illustrate the improved performance of our proposed methods over existing algorithms. The proposed methods are also applied to several real data sets, including Slashdot online social networks and in-school friendship networks from the National Longitudinal Study of Adolescent to Adult Health (AddHealth Study). An array of graphical visualizations and quantitative diagnostic tools, specifically designed for evaluating the goodness of fit of network models, is developed and illustrated with these data sets. Some observations from using these tools with our algorithms are also examined and discussed.



More About This Work

Academic Units
Thesis Advisors
Zheng, Tian
Ph.D., Columbia University
Published Here
May 11, 2015