Academic Commons

Theses Doctoral

Deception in Spoken Dialogue: Classification and Individual Differences

Levitan, Sarah Ita

Automatic deception detection is an important problem with far-reaching implications in many areas, including law enforcement, military and intelligence agencies, social services, and politics. Despite extensive efforts to develop automated deception detection technologies, there have been few objective successes. This is likely due to the many challenges involved, including the lack of large, cleanly recorded corpora; the difficulty of acquiring ground truth labels; and major differences in incentives for lying in the laboratory vs. lying in real life. Another well-recognized issue is that there are individual and cultural differences in deception production and detection, although little has been done to identify them. Human performance at deception detection is at the level of chance, making it an uncommon problem where machines can potentially outperform humans.
This thesis addresses these challenges associated with research of deceptive speech. We created the Columbia X-Cultural Deception (CXD) Corpus, a large-scale collection of deceptive and non-deceptive dialogues between native speakers of Standard American English and Mandarin Chinese. This corpus enabled a comprehensive study of deceptive speech on a large scale.
In the first part of the thesis, we introduce the CXD corpus and present an empirical analysis of acoustic-prosodic and linguistic cues to deception. We also describe machine learning classification experiments to automatically identify deceptive speech using those features. Our best classifier achieves classification accuracy of almost 70%, well above human performance.
The second part of this thesis addresses individual differences in deceptive speech. We present a comprehensive analysis of individual differences in verbal cues to deception, and several methods for leveraging these speaker differences to improve automatic deception classification. We identify many differences in cues to deception across gender, native language, and personality. Our comparison of approaches for leveraging these differences shows that speaker-dependent features that capture a speaker's deviation from their natural speaking style can improve deception classification performance. We also develop neural network models that accurately model speaker-specific patterns of deceptive speech.
The contributions of this work add substantially to our scientific understanding of deceptive speech, and have practical implications for human practitioners and automatic deception detection.


  • thumnail for Levitan_columbia_0054D_15072.pdf Levitan_columbia_0054D_15072.pdf application/pdf 2.92 MB Download File

More About This Work

Academic Units
Computer Science
Thesis Advisors
Hirschberg, Julia Bell
Ph.D., Columbia University
Published Here
February 1, 2019