Accuracy of Empirical Satellite Algorithms for Mapping Phytoplankton Diagnostic Pigments in the Open Ocean: A Supervised Learning Perspective

Stock, Andy; Subramaniam, Ajit

Monitoring phytoplankton community composition from space is an important challenge in ocean remote sensing. Researchers have proposed several algorithms for this purpose. However, the in situ data used to train and validate such algorithms at the global scale are often clustered along ship cruise tracks and in some well-studied locations, whereas many large marine regions have no in situ data at all. Furthermore, oceanographic variables are typically spatially auto-correlated. In this situation, the common practice of validating algorithms with randomly chosen held-out observations can underestimate errors. Based on a global database of in situ HPLC data, we applied supervised learning methods to train and test empirical algorithms predicting the relative concentrations of eight diagnostic pigments that serve as biomarkers for different phytoplankton types. For each pigment, we trained three types of satellite algorithms distinguished by their input data: abundance-based (using only chlorophyll a as input), spectral (using remote sensing reflectance), and ecological algorithms (combining reflectance and environmental variables). The algorithms were implemented as statistical models (smoothing splines, polynomials, random forests, and boosted regression trees). To address clustering of data and spatial auto-correlation, we tested the algorithms by means of spatial block cross-validation. This provided a less confident picture of the potential for global mapping of diagnostic pigments and hence the associated phytoplankton types using existing satellite data than suggested by some previous research and a fivefold cross-validation conducted for comparison. Of the eight diagnostic pigments, only two (fucoxanthin and zeaxanthin) could be predicted in marine regions that the algorithms were not trained in with considerably lower errors than a constant null model. 
Thus, global-scale algorithms based on existing, multispectral satellite data and commonly available environmental variables can estimate relative diagnostic pigment concentrations and hence distinguish phytoplankton types in some broad classes, but are likely inaccurate for some classes and in some marine regions. Overall, the ecological algorithms had the lowest prediction errors, suggesting that environmental variables contain information about the global spatial distribution of phytoplankton groups that is not captured in multispectral remote sensing reflectance and satellite-derived Chl a concentrations. Weighting training observations inversely to the degree of spatial clustering improved predictions. Finally, our results suggest that more discussion of the best approaches for training and validating empirical satellite algorithms is needed if the in situ data are unevenly distributed in the study region and spatially clustered.
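
As a rough illustration of the two ideas named in the abstract (spatial block cross-validation and weighting observations inversely to spatial clustering), the following Python sketch may help. It is not the authors' implementation: the regular latitude/longitude grid, the 10-degree block size, the round-robin fold assignment, and the function names `block_id`, `spatial_block_folds`, and `inverse_cluster_weights` are all assumptions made for this example.

```python
import math
from collections import Counter


def block_id(lat, lon, block_deg=10.0):
    """Return the (row, col) index of the grid cell containing a coordinate.

    Assumption: blocks are cells of a regular lat/lon grid; the abstract
    does not say how the spatial blocks were actually defined.
    """
    return (math.floor((lat + 90.0) / block_deg),
            math.floor((lon + 180.0) / block_deg))


def spatial_block_folds(lats, lons, block_deg=10.0, n_folds=5):
    """Assign a fold to every observation so that each block lies in one fold.

    Neighbouring, auto-correlated observations are then held out together
    rather than being split between training and test sets.
    """
    blocks = [block_id(la, lo, block_deg) for la, lo in zip(lats, lons)]
    # Deterministic round-robin over sorted blocks (hypothetical choice;
    # a real study might assign blocks to folds at random).
    fold_of_block = {b: i % n_folds for i, b in enumerate(sorted(set(blocks)))}
    return [fold_of_block[b] for b in blocks]


def inverse_cluster_weights(lats, lons, block_deg=10.0):
    """Weight each observation by 1 / (number of observations in its block)."""
    blocks = [block_id(la, lo, block_deg) for la, lo in zip(lats, lons)]
    counts = Counter(blocks)
    return [1.0 / counts[b] for b in blocks]


# Three clustered observations plus two isolated ones (made-up coordinates).
lats = [10.1, 10.2, 10.3, -45.0, 60.5]
lons = [20.1, 20.2, 20.3, 100.0, -30.0]
folds = spatial_block_folds(lats, lons, block_deg=10.0, n_folds=3)
weights = inverse_cluster_weights(lats, lons, block_deg=10.0)
```

In this toy input the three clustered points share one fold and each receives a third of the weight of an isolated point, whereas a random fivefold split could place near-duplicates of a test point in the training set.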



Also Published In

Frontiers in Marine Science

More About This Work

Academic Units
Lamont-Doherty Earth Observatory
Biology and Paleo Environment
Published Here
January 10, 2022