Academic Commons

Theses Doctoral

Social Network Extraction from Text

Agarwal, Apoorv

In the pre-digital age, when electronically stored information was non-existent, the only ways of creating representations of social networks were by hand through surveys, interviews, and observations. In this digital age of the internet, numerous indications of social interactions and associations are available electronically in an easy-to-access manner as structured meta-data. This lessens our dependence on manual surveys and interviews for creating and studying social networks. However, there are sources of networks that remain untouched simply because they are not associated with any meta-data. Primary examples of such sources include the vast amounts of literary texts, news articles, content of emails, and other forms of unstructured and semi-structured texts.
The main contribution of this thesis is the introduction of natural language processing and applied machine learning techniques for uncovering social networks in such sources of unstructured and semi-structured texts. Specifically, we propose three novel techniques for mining social networks from three types of texts: unstructured texts (such as literary texts), emails, and movie screenplays. For each of these types of texts, we demonstrate the utility of the extracted networks on three applications (one for each type of text).

Files

  • Agarwal_columbia_0054D_13614.pdf (binary/octet-stream, 2.96 MB)

More About This Work

Academic Units: Computer Science
Thesis Advisors: Rambow, Owen C.
Degree: Ph.D., Columbia University
Published Here: October 14, 2016