Theses Doctoral

Text Classification: Exploiting the Social Network

Alkhereyf, Sakhar Badr M

Within the context of social networks, existing methods for document classification typically capture only textual semantics and ignore the text’s metadata, e.g., the users who exchange emails and the communication networks they form. However, prior work has shown that incorporating social network information alongside linguistic information is useful for various NLP applications, including sentiment analysis, inferring user attributes, and predicting interpersonal relations.

In this thesis, we present empirical studies of incorporating social network information from the underlying communication graphs into various text classification tasks. We present different graph representations for different problems, introduce social network features extracted from these graphs, and use and extend graph embedding models for text classification.

Our contributions are as follows. First, we have annotated large datasets of emails with fine-grained business and personal labels. Second, we propose graph representations for the social networks induced from documents and users and apply them to different text classification tasks. Third, we propose social network features extracted from these structures for documents and users. Fourth, we exploit different methods for modeling the social network of communication for four tasks: classification of emails into business and personal, detection of overt displays of power in emails, detection of hierarchical power in emails, and Reddit post classification.
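
As a rough illustration of what a graph induced from emails and simple social network features could look like, the following minimal Python sketch builds a directed user graph from sender and recipient fields and derives degree-based features per email. The email records, the field names (`sender`, `recipients`), the function names, and the choice of degree features are all assumptions made for this example; they are not the representations or features defined in the thesis.

```python
from collections import defaultdict


def build_user_graph(emails):
    """Build a directed, weighted user graph (hypothetical representation).

    graph[sender][recipient] holds the number of emails sent from
    sender to recipient.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for email in emails:
        for recipient in email["recipients"]:
            graph[email["sender"]][recipient] += 1
    return graph


def degree_features(graph):
    """Count distinct out-neighbors and in-neighbors for every user."""
    out_degree = {user: len(neighbors) for user, neighbors in graph.items()}
    in_degree = defaultdict(int)
    for neighbors in graph.values():
        for recipient in neighbors:
            in_degree[recipient] += 1
    return out_degree, dict(in_degree)


def email_features(email, out_degree, in_degree):
    """Attach simple network features of the sender to one email."""
    return {
        "sender_out_degree": out_degree.get(email["sender"], 0),
        "sender_in_degree": in_degree.get(email["sender"], 0),
        "num_recipients": len(email["recipients"]),
    }


# Toy data, invented for illustration only.
emails = [
    {"sender": "alice", "recipients": ["bob", "carol"]},
    {"sender": "bob", "recipients": ["alice"]},
    {"sender": "alice", "recipients": ["bob"]},
]

graph = build_user_graph(emails)
out_degree, in_degree = degree_features(graph)
features = [email_features(e, out_degree, in_degree) for e in emails]
```

Features of this kind could then be concatenated with a text representation of each email before training a classifier; how the thesis actually combines the two is described in the full text, not here.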

Our main findings are as follows. Incorporating social network information using our proposed methods improves classification performance on all four tasks, and we outperform the state-of-the-art graph-embedding-based model on the three email tasks. For the fourth task (Reddit post classification), we argue that simple methods with a representation suited to the task can outperform a state-of-the-art generic model.



More About This Work

Academic Units: Computer Science
Thesis Advisors: Rambow, Owen C.
Degree: Ph.D., Columbia University
Published Here: December 14, 2020