Theses Doctoral

Essays on the use of computational linguistics in marketing

Lemaire, Alain Philippe

This thesis explores the use of unstructured data, and specifically textual data, in providing consumer insights and improving business decisions. The thesis consists of two essays. In essay I, I examine how the linguistic similarity between the language used by reviewers of a product and a prospective customer’s own writing style can be leveraged to assess the match between customers and products. Applying tools from machine learning, Bayesian statistics, and computational linguistics to a large-scale dataset from Yelp, I find that the closer the writing style of a restaurant’s past reviews is to a prospective customer’s writing style, the more likely that customer is to write a review for that restaurant. This effect holds across restaurant types and is driven by the linguistic similarity between the customer’s own reviews and positive past reviews for the restaurant. Further, I find that similarity with respect to words related to leisure (e.g., family, wine, beer, weekend) and biology (e.g., eat, life, love), as well as swear words, is most influential in creating a match between customers and restaurants.
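As a rough illustration of what "linguistic similarity" over word categories can look like, the sketch below compares a customer's reviews with two restaurants' past reviews using cosine similarity over category shares. It is not the thesis's actual model: the tiny lexicon, the sample reviews, and the function names `category_profile` and `cosine_similarity` are all assumptions made for this example (the leisure and biology words are the ones quoted in the abstract).

```python
import math
import re
from collections import Counter

# Toy category lexicon (an assumption for illustration only). The leisure
# and biology entries are the example words quoted in the abstract.
LEXICON = {
    "leisure": {"family", "wine", "beer", "weekend"},
    "biology": {"eat", "life", "love"},
    "swear": {"damn", "hell"},
}


def category_profile(texts):
    """Return the share of tokens in each lexicon category, pooled over texts."""
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    counts = Counter()
    for tok in tokens:
        for cat, words in LEXICON.items():
            if tok in words:
                counts[cat] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[cat] / total for cat in LEXICON]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical reviews, invented for this example.
customer_reviews = ["We love wine and beer on the weekend."]
restaurant_a_reviews = ["Great wine list, perfect weekend spot, love it."]
restaurant_b_reviews = ["Damn slow service, hell of a wait."]

customer = category_profile(customer_reviews)
sim_a = cosine_similarity(customer, category_profile(restaurant_a_reviews))
sim_b = cosine_similarity(customer, category_profile(restaurant_b_reviews))
```

Under these assumptions, `sim_a` exceeds `sim_b` because the customer and restaurant A share leisure and biology vocabulary, while restaurant B's reviews only hit the swear category.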

In essay II, I examine whether borrowers, consciously or not, leave traces of their intentions, circumstances, and personality traits in the text they write when applying for a loan. I find that this textual information has a substantial and significant ability to predict whether borrowers will pay back the loan, above and beyond the financial and demographic variables commonly used in models predicting default. Using text-mining and machine-learning tools to automatically process and analyze the raw text in over 120,000 loan requests from an online crowdfunding platform, I find that including the textual information in the loan request significantly helps predict loan default and can have substantial financial implications. I find that loan requests written by defaulting borrowers are more likely to include words related to their family, mentions of God, the borrower’s financial and general hardship, pleas to lenders for help, and short-term focused words. I further observe that defaulting loan requests are written in a manner consistent with the writing style of extroverts and liars.



More About This Work

Academic Units
Thesis Advisors: Netzer, Oded
Degree: Ph.D., Columbia University
Published Here: September 8, 2020