Theses Doctoral

Essays in History and Spatial Economics with Big Data

Lee, Sun Kyoung

This dissertation contains three essays in History and Spatial Economics with Big Data. As a part of my dissertation, I develop a modern machine-learning based approach to connect large datasets. Merging several massive databases and matching the records within them presents challenges — some straightforward and others more complex. I employ artificial intelligence and machine learning technologies to link and then analyze massive amounts of historical US federal census, Department of Labor, and Bureau of Labor Statistics data.

The transformation of the US economy during this time period was remarkable, from a rural economy at the beginning of the 19th century to an industrial nation by the end. More strikingly, after lagging behind the technological frontier for most of the nineteenth century, the United States entered the twenty-first century as the global technology leader and the richest nation in the world. Results from this dissertation reveal how people lived and how the business operated. It tells us the past that led us to where we are now in terms of people, geography, prices and wages, wealth, revenue, output, capital, numbers, and types of workers, urbanization, migration, and industrialization.

As a part of this endeavor, the first chapter studies how the benefits of improving urban mass transit infrastructures in cities are shared across workers with different skills. It exploits a unique historical setting to estimate the impact of urban transportation infrastructure: the introduction of mass-public transit infrastructure in the late nineteenth and twentieth-century New York City. I linked individual-level US census data to investigate how urban transit infrastructure differentially affects the welfare of workers with heterogenous skill. My second chapter measures immigrants' role in the US rise as an economic power. Especially, this chapter focuses on a potential mechanism by which immigrants might have spurred economic prosperity: the transfer of new knowledge. This is the first project to use advances in quantitative spatial theory along with advanced big-data techniques to understand the contribution of immigrants to the process of U.S. economic growth. The key benefit of this approach is to link modern theory with massive amounts of microeconomic data about individual immigrants—their locations and occupations—to address questions that are extremely difficult to assess otherwise. Specifically, the dataset will help the researchers understand the extent to which the novel ideas and expertise immigrants brought to U.S. shores drove the nation’s emergence as an industrial and technological powerhouse.

My third chapter exploits advances in data digitization and machine learning to study intergenerational mobility in the United States before World War II. Using machine learning techniques, I construct a massive database for multiple generations of fathers and sons. This allows us to identify “land of opportunities": locations and times in American history where kids had chances to move up in the income ladder. I find that intergenerational mobility elasticities were relatively stable during 1880-1940; there are regional disparities in terms of giving kids opportunities to move up, and the geographic disparities of intergenerational mobility have evolved over time.

Geographic Areas


  • thumnail for LEE_columbia_0054D_15500.pdf LEE_columbia_0054D_15500.pdf application/pdf 11.5 MB Download File

More About This Work

Academic Units
Thesis Advisors
Davis, Donald R.
Ph.D., Columbia University
Published Here
October 21, 2019