2020 Theses Doctoral
Content Selection for Effective Counter-Argument Generation
The information ecosystem of social media has resulted in an abundance of opinions on political topics and current events. In order to encourage better discussions, it is important to promote high-quality responses and relegate low-quality ones.
We thus focus on automatically analyzing and generating counter-arguments in response to posts on social media with the goal of providing effective responses.
This thesis is composed of three parts. In the first part, we conduct an analysis of arguments. Specifically, we first annotate discussions from Reddit for aspects of arguments and then analyze them for their persuasive impact. Then we present approaches to identify the argumentative structure of these discussions and predict the persuasiveness of an argument. We evaluate each component independently using automatic or manual evaluations and show significant improvement in each.
In the second part, we leverage the findings from our analysis to generate counter-arguments. We develop two approaches in the retrieve-and-edit framework, where we obtain content using methods created during our analysis of arguments, among others, and then modify the content using techniques from natural language generation. In the first approach, we retrieve counter-arguments by annotating a dataset for stance and building models for stance prediction. We then use the methods from our analysis of arguments to extract persuasive argumentative content before modifying non-content phrases for coherence. In contrast, in the second approach we create a dataset and models for modifying content -- making semantic edits to a claim so that it takes a contrasting stance. We evaluate our approaches using intrinsic automatic evaluation of our predictive models and an overall human evaluation of our generated output.
In the third part, we discuss the semantic challenges of argumentation that we need to solve in order to make progress in the understanding of arguments. Specifically, we develop new methods for identifying two types of semantic relations -- causality and veracity. For causality, we build a distant-labeled dataset of causal relations using lexical indicators and then leverage features from those indicators to build predictive models. For veracity, we build new models to retrieve evidence given a claim and predict whether the claim is supported by that evidence. We also develop a new dataset for veracity to illuminate the areas that need progress. We evaluate these approaches using automated and manual techniques and obtain significant improvement over strong baselines.
Finally, we apply these techniques to claims in the domain of household electricity consumption, mining claims using our methods for causal relations and then verifying their truthfulness.
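As a rough illustration of the distant-labeling idea for causal relations mentioned above, the sketch below marks a sentence as causal when it contains a lexical indicator and splits it into cause and effect spans. The indicator list, the `CausalExample` type, and the `distant_label` function are hypothetical choices made for this example only; the indicators and features actually used in the thesis are not given in this abstract.

```python
import re
from typing import NamedTuple, Optional


class CausalExample(NamedTuple):
    cause: str
    effect: str
    indicator: str


# Hypothetical indicator list (an assumption, not the thesis's inventory).
# The boolean says whether the text BEFORE the indicator is the cause.
INDICATORS = {
    "because": False,      # "EFFECT because CAUSE"
    "therefore": True,     # "CAUSE; therefore EFFECT"
    "as a result": True,   # "CAUSE. As a result, EFFECT"
}


def distant_label(sentence: str) -> Optional[CausalExample]:
    """Return a noisy causal label if an indicator splits the sentence in two."""
    for marker, cause_first in INDICATORS.items():
        pattern = re.compile(r"\b" + re.escape(marker) + r"\b", re.IGNORECASE)
        match = pattern.search(sentence)
        if match is None:
            continue
        # Text on each side of the indicator, with stray punctuation trimmed.
        left = sentence[:match.start()].strip(" ,.;")
        right = sentence[match.end():].strip(" ,.;")
        if not left or not right:
            continue  # the indicator must separate two non-empty spans
        cause, effect = (left, right) if cause_first else (right, left)
        return CausalExample(cause, effect, marker)
    return None  # no indicator found: the sentence receives no causal label
```

Labels obtained this way are noisy by construction, which is why such a dataset is typically used to train a predictive model rather than treated as ground truth.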
- Hidey_columbia_0054D_15916.pdf (application/pdf, 4.06 MB)
More About This Work
- Academic Units: Computer Science
- Thesis Advisors: McKeown, Kathleen
- Degree: Ph.D., Columbia University
- Published Here: June 22, 2020