Theses Doctoral

# Essays on Network Theory: Diffusion, Link Analysis, and Hypergraph Learning

Wang, Shatian

This thesis contributes to the methodology and application of network theory, the study of graphs as a representation of real systems. In particular, we present four essays on problems related to social network analysis, link analysis, and biological network analysis.

Chapters 1 and 2 present two pieces of work on social network analysis, where we model and optimize product diffusion through Word-of-Mouth on social networks. Specifically, we use a directed graph and a limiting case of the Erdős–Rényi random graph to respectively model exact and approximate social network structures. We then build mathematical models to describe how information diffuses among connected individuals in these networks. Using our network-based diffusion models, we design algorithms to optimally control product diffusion and maximize revenue from influencer marketing and referral marketing.

Chapter 3 explores link analysis of crowd-sourced data on user-item ratings. We represent these ratings with a bipartite network containing user vertices and item vertices. Such a network representation encodes crucial relationship information among users and items that are not apparent from isolated ratings. We propose network-based algorithms to extract useful information from the structure of the bipartite network to predict award outcomes. In using movie ratings data to predict Academy Award nominees and winners, our proposed algorithms significantly outperform other rating-based baselines and state-of-the-art algorithms. Our algorithms can also predict award outcomes and future item popularity in other domains such as books, music, and dramas where user-item ratings are available, without task-specific feature engineering.

Chapter 4 is inspired by an application of biological network analysis: learning effective drug combinations, which can be cast as the problem of learning a hidden hypergraph with n vertices and m hyperedges, where a vertex corresponds to a drug and a hyperedge represents a minimal set of drugs that are an effective treatment. We can learn the hidden hyperedges using membership queries: each query corresponds to a test evaluating whether a subset of the drugs is effective. If the query result is positive, then it means that the tested subset contains at least one hyperedge. We propose the first algorithms with poly(n, m) query complexity for learning non-trivial families of hypergraphs that have a super-constant number of edges of super-constant size.