Theses Doctoral

Predicting and Understanding the Presence of Water through Remote Sensing, Machine Learning, and Uncertainty Quantification

Harrington, Matthew R.

In this dissertation I study the benefits that machine learning can bring to problems of Sustainable Development in the field of hydrology.

Specifically, in Chapter 1 I investigate how predictable groundwater depletion is across India and to what extent the model’s predictions can teach us about the underlying drivers. In Chapter 2, I describe my entry in a competition to predict the amount of water in snow in the western United States using satellite imagery and convolutional neural networks. Lastly, in Chapter 3 I examine how cloud cover affects the machine learning model’s predictions and explore how cloudiness affects the successes and limitations of the popular uncertainty quantification method known as Monte Carlo dropout.

Food production in many parts of the world relies on groundwater resources. In many regions, groundwater levels are declining due to a combination of anthropogenic abstraction, localized meteorological and geological characteristics, and climate change. Groundwater in India is characteristic of this global trend, with an agricultural sector that is highly dependent on groundwater and increasingly threatened by abstraction far in excess of recharge. The complexity of inputs makes groundwater depletion highly heterogeneous across space and time, and modeling this heterogeneity has thus far proven difficult. In Chapter 1, using random forest models and high-resolution feature importance methods, we demonstrate a recent shift in the predictors of groundwater depletion in India and show an improved ability to make predictions at the district level across seasons. We find that, as groundwater depletion begins to accelerate across India, deep-well irrigation use becomes 250% more important between 1996 and 2014, becoming the most important predictor of depletion in the majority of districts in northern and central India.

At the same time, even many of the districts that show gains in groundwater levels show an increasing importance of deep irrigation. Our analysis shows widespread decreases in crop yields per unit of irrigation over the study period, suggesting decreasing marginal returns for the largely increasing quantities of groundwater irrigation used. Because anthropogenic and natural drivers of groundwater recharge are highly localized, understanding the relationship between multiple variables across space and time is inferentially challenging, yet extremely important. Our granular, district-focused models of groundwater depletion rates can inform decision-making under diverse hydrological conditions and water use needs across space, time, and groups of constituents.

In Chapter 2 I reflect on competing in the U.S. Bureau of Reclamation’s snow water equivalent prediction competition (Snowcast Showdown). This project was a joint effort with Isabella Smythe, and we finished the competition ranked roughly 45th out of over 1000 teams on the public leaderboard. In this chapter I outline our approach, discuss the competition format and model building, and examine alternative approaches taken by other competitors. I also consider the successes and limitations of our own satellite-based approach and possible future improvements to our model.

In Chapter 3 I study the black-box deep learning model, built on MODIS imagery to estimate snow water equivalent (SWE), that was made for the competition discussed in Chapter 2. Specifically, I investigate a major component of uncertainty in my remotely sensed images: cloud cover, which completely disrupts viewing of the surface in the visible spectrum. To understand the impact of cloud-driven missingness, I document how and where clouds occur in the dataset. I then use Monte Carlo dropout, a popular method of quantifying uncertainty in deep learning models, to learn how well the method captures the aleatoric errors unique to remote sensing with cloud cover. Next, I use the guided backpropagation technique to visualize the underlying filters of the convolutional neural network and draw conclusions about which image features the model used to make its predictions. Lastly, I investigate which forms of validation best estimated the true generalization error in Chapter 2, using ordinary least squares (OLS) and the elastic-net technique.
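
For readers unfamiliar with Monte Carlo dropout, the idea can be illustrated with a minimal sketch. It is not the dissertation's model, which is a convolutional network on MODIS imagery. The tiny one-hidden-layer network, its weights (`PARAMS`), and the function names below are all hypothetical, chosen only to show the mechanism: dropout stays switched on at prediction time, the same input is passed through the network many times, and the spread of the outputs is read as an uncertainty estimate.

```python
import random
import statistics


def stochastic_forward(x, w1, b1, w2, b2, drop_prob, rng):
    """One forward pass of a toy ReLU network with dropout left ON."""
    keep = 1.0 - drop_prob
    out = b2
    for w1_j, b1_j, w2_j in zip(w1, b1, w2):
        hidden = max(0.0, w1_j * x + b1_j)  # ReLU activation
        if rng.random() < keep:  # the unit survives with probability `keep`
            out += w2_j * hidden / keep  # inverted-dropout rescaling
    return out


def mc_dropout_predict(x, params, drop_prob=0.5, n_samples=200, seed=0):
    """Return (mean, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    draws = [
        stochastic_forward(x, *params, drop_prob, rng)
        for _ in range(n_samples)
    ]
    return statistics.fmean(draws), statistics.stdev(draws)


# Hypothetical weights for illustration only: (w1, b1, w2, b2).
PARAMS = (
    [0.5, -1.0, 2.0, 1.5],
    [0.1, 0.3, -0.2, 0.0],
    [1.0, -0.5, 0.8, 0.3],
    0.2,
)

if __name__ == "__main__":
    mean, spread = mc_dropout_predict(1.0, PARAMS)
    print(f"prediction: {mean:.3f}, spread across passes: {spread:.3f}")
```

With `drop_prob=0.0` every pass is identical and the spread collapses to zero, which is a quick sanity check that the uncertainty estimate comes entirely from the dropout masks.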

These three chapters show that machine learning has an important place in the future of hydrology; however, the tools that it brings are still difficult to interpret, and further work is needed to bring these predictive advancements up to scientific standards of understanding. That said, the gains in accuracy brought by the new techniques can already make a difference to the lives of people who will face greater water scarcity as climate change accelerates.


More About This Work

Academic Units
Sustainable Development
Thesis Advisors
Lall, Upmanu
Ph.D., Columbia University
Published Here
August 10, 2022