Theses Doctoral

Ranking Algorithms on Directed Configuration Networks

Chen, Ningyuan

In recent decades, complex real-world networks, such as social networks, the World Wide Web, and financial networks, have become a popular subject for both researchers and practitioners. This is largely due to advances in computing power and big-data analytics. A key issue in analyzing these networks is the centrality of nodes, and ranking algorithms such as Google's PageRank are designed to measure it. We analyze the asymptotic distribution of the rank of a randomly chosen node, computed on a random graph by a family of ranking algorithms that includes PageRank, as the size of the network grows to infinity.
We propose a configuration model that generates the structure of a directed graph from given in- and out-degree distributions of the nodes. The algorithm guarantees that the generated graph is simple (without self-loops or multiple edges in the same direction) for a broad spectrum of degree distributions, including power-law distributions. A power-law degree distribution is referred to as the scale-free property and is observed in many real-world networks. On the random graph G_n=(V_n,E_n) generated by the configuration model, we study the distribution of the ranks, which solve
R_i=∑ _{j: (j,i) ∈ E_n} (C_jR_j +Q_i)
for every node i, with weights C_i and personalization values Q_i.
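As a rough illustration of the setup above, the following Python sketch pairs half-edges at random and then iterates the rank equation. It is not the thesis's algorithm: erasing self-loops and repeated edges is just one simple way to end up with a simple graph and is assumed here for illustration only, the function names are hypothetical, and the personalization term Q_i is read as being added once per node, as in standard PageRank.

```python
import random


def erased_configuration_graph(in_deg, out_deg, seed=0):
    """Pair out-stubs with in-stubs uniformly at random, then erase
    self-loops and repeated edges so the result is a simple digraph.

    Assumption: this "erased" pairing is an illustrative stand-in, not
    necessarily the construction used in the thesis.
    """
    if sum(in_deg) != sum(out_deg):
        raise ValueError("in- and out-degree sums must match")
    rng = random.Random(seed)
    # One stub per unit of degree, labelled by its node.
    out_stubs = [v for v, d in enumerate(out_deg) for _ in range(d)]
    in_stubs = [v for v, d in enumerate(in_deg) for _ in range(d)]
    rng.shuffle(in_stubs)
    # The set drops repeated (j, i) pairs; the filter drops self-loops.
    return {(j, i) for j, i in zip(out_stubs, in_stubs) if j != i}


def rank_iteration(n, edges, C, Q, iterations=50):
    """Iterate R_i <- sum over edges (j, i) of C_j * R_j, plus Q_i,
    starting from R = 0.

    Assumption: Q_i is added once per node (standard PageRank reading).
    """
    R = [0.0] * n
    for _ in range(iterations):
        new_R = list(Q)  # start each update from the personalization term
        for j, i in edges:
            new_R[i] += C[j] * R[j]
        R = new_R
    return R


# Example usage with hypothetical degree sequences and constant weights.
edges = erased_configuration_graph([1, 2, 0, 2, 1], [2, 1, 1, 0, 2], seed=1)
ranks = rank_iteration(5, edges, C=[0.3] * 5, Q=[1.0] * 5)
```

Choosing C_j = c / (out-degree of j) and Q_i = 1 - c for a damping factor c in (0, 1) turns `rank_iteration` into a scaled PageRank power iteration, dangling nodes aside.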
We show that as the size of the graph n → ∞, the rank of a randomly chosen node converges weakly to the endogenous solution of the stochastic fixed-point equation
R =^D ∑ _{i=1}^N (C_iR_i + Q),
where (Q, N, {C_i}) is a random vector and {R_i} are i.i.d. copies of R, independent of (Q, N, {C_i}). The proof of this main result is divided into three steps. First, we show that the rank of a randomly chosen node can be approximated by applying the ranking algorithm on the graph for a finite number of iterations. Second, by coupling the graph to a branching tree that is governed by the empirical size-biased distribution, we approximate the finite iteration of the ranking algorithm by the rank of the root node of the branching tree. Finally, we prove that the rank of the root of the branching tree converges to that of a limiting weighted branching process, which is independent of n and solves the stochastic fixed-point equation. Our result formalizes the well-known heuristic that a network often has a locally tree-like structure. A numerical example shows that the approximation is very accurate for the network of English Wikipedia pages (over 5 million).
To draw a sample from the endogenous solution of the stochastic fixed-point equation, one can run linear branching recursions on a weighted branching process. We provide an iterative simulation algorithm based on the bootstrap. Compared to naive Monte Carlo, our algorithm reduces the complexity from exponential to linear in the number of recursions. We show that as the bootstrap sample size tends to infinity, the sample drawn according to our algorithm converges to the target distribution in the Kantorovich-Rubinstein distance and the estimator is consistent.
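The idea of replacing a full tree simulation with resampling from a pool can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the names `draw_generic` and `bootstrap_pool` are hypothetical, the distribution of (Q, N, {C_i}) is a placeholder, and Q is read as being added once per node.

```python
import random


def draw_generic(rng):
    """Return one draw (Q, [C_1, ..., C_N]) of the generic branching vector.

    Assumption: this distribution is a placeholder chosen for illustration.
    """
    n_children = rng.randint(0, 3)
    weights = [rng.uniform(0.0, 0.3) for _ in range(n_children)]
    q = rng.uniform(0.5, 1.5)
    return q, weights


def bootstrap_pool(draw, pool_size, recursions, seed=0):
    """Build a pool approximating the law of the rank after `recursions`
    levels of the linear recursion.

    Each level costs `pool_size` draws, so the total work grows linearly
    in `recursions` instead of with the size of a full branching tree.
    """
    rng = random.Random(seed)
    # Level 0: a node with no descendants contributes only its own Q.
    pool = [draw(rng)[0] for _ in range(pool_size)]
    for _ in range(recursions):
        new_pool = []
        for _ in range(pool_size):
            q, weights = draw(rng)
            # Children's ranks are resampled with replacement from the
            # previous level's pool rather than simulated from scratch.
            children = [rng.choice(pool) for _ in weights]
            new_pool.append(q + sum(c * r for c, r in zip(weights, children)))
        pool = new_pool
    return pool


# Example usage: 1000 approximate samples after 10 recursions.
samples = bootstrap_pool(draw_generic, pool_size=1000, recursions=10)
```

The elements of the returned pool are not independent, because they share resampled ancestors; the source's convergence statement concerns the regime where the pool size tends to infinity.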

Files

  • Chen_columbia_0054D_12963.pdf (1.02 MB)

More About This Work

Academic Units
Industrial Engineering and Operations Research
Thesis Advisors
Olvera-Cravioto, Mariana
Degree
Ph.D., Columbia University
Published Here
September 28, 2015