Doctoral Thesis, 2025
Towards Trustworthy AI: Detecting, Understanding, and Mitigating Information Disorder
The spread of misinformation, propaganda, and manipulative narratives, collectively referred to as information disorder, has emerged as a major threat to the trustworthiness of online discourse. This challenge is intensified by the rise of large language models (LLMs), which, while offering powerful generative capabilities, also create new risks of abuse and misalignment. In this dissertation, I investigate how information disorder manifests and evolves across both human and AI-mediated communication channels, and I explore key challenges and solutions at each stage of the landscape, from detection and intent analysis to mitigation strategies, through the lens of trustworthy AI.
This dissertation is structured around three core stages, each addressing a distinct facet of combating information disorder. First, I develop content-based detection frameworks that integrate textual, visual, and propagation-based features to identify untrustworthy and manipulative posts on social media. Second, I examine the anatomy of information disorder by developing models that (1) capture the intent and manipulation strategies of malicious actors, such as propaganda techniques and emotional appeals; (2) analyze audience perception, including how user traits shape susceptibility to radicalizing content; and (3) explore the emergent threat of LLMs acting as malicious agents capable of generating deceptive or coercive communication. Third, I present system-level mitigation strategies that address these threats at scale: defending against LLM-driven social engineering, improving factual control in generative outputs, curating high-quality and novel training and retrieval content through document-level distinctiveness scoring, and designing steerable, human-in-the-loop research agents that support principled alignment and intervention.
Across these stages, this dissertation advances modular, interpretable, and context-aware systems that address the evolving threat landscape of information disorder. The findings lay the foundation for future research on LLM-based trust infrastructure, agent-level safety, and human-in-the-loop alignment in complex information ecosystems.
Files
- Ai_columbia_0054D_19536.pdf (4.5 MB)
More About This Work
- Academic Units
- Computer Science
- Thesis Advisors
- Hirschberg, Julia Bell
- Degree
- Ph.D., Columbia University
- Published Here
- October 22, 2025