On the Origin of the Standardization Sensitivity in RegEM Climate Field Reconstructions
The regularized expectation maximization (RegEM) method has been used in recent studies to derive climate field reconstructions of Northern Hemisphere temperatures during the last millennium. Original pseudoproxy experiments that tested RegEM [with ridge regression regularization (RegEM-Ridge)] standardized the input data in a way that improved the performance of the reconstruction method, but included data from the reconstruction interval for estimates of the mean and standard deviation of the climate field—information that is not available in real-world reconstruction problems. When standardizations are confined to the calibration interval only, pseudoproxy reconstructions performed with RegEM-Ridge suffer from warm biases and variance losses. Only cursory explanations of this so-called standardization sensitivity of RegEM-Ridge have been published, but they have suggested that the selection of the regularization (ridge) parameter by means of minimizing the generalized cross validation (GCV) function is the source of the effect. The origin of the standardization sensitivity is more thoroughly investigated herein and is shown not to be associated with the selection of the ridge parameter; sets of derived reconstructions reveal that GCV-selected ridge parameters are minimally different for reconstructions standardized either over both the reconstruction and calibration interval or over the calibration interval only. While GCV may select ridge parameters that are different from those that precisely minimize the error in pseudoproxy reconstructions, RegEM reconstructions performed with truly optimized ridge parameters are not significantly different from those that use GCV-selected ridge parameters. The true source of the standardization sensitivity is attributable to the inclusion or exclusion of additional information provided by the reconstruction interval, namely, the mean and standard deviation fields computed for the complete modeled dataset. 
These fields differ significantly from those for the calibration period alone because typical paleoreconstruction problems violate a standard EM assumption, namely that missing values are missing at random; climate data are predominantly missing in the preinstrumental period, when the mean climate was significantly colder than during the instrumental period. The origin of the standardization sensitivity therefore is not associated specifically with RegEM-Ridge, and more recent attempts to regularize the EM algorithm using truncated total least squares could theoretically also be susceptible to the problems affecting RegEM-Ridge. Nevertheless, the principal failure of RegEM-Ridge arises from a poor initial estimate of the mean field, which leaves open the possibility that alternative methods may perform better.
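The role of the mean field can be illustrated with a toy calculation. The sketch below is not the paper's pseudoproxy experiment and does not implement RegEM; it only assumes a hypothetical one-dimensional temperature series whose reconstruction interval is colder than its calibration interval. The offset, noise level, interval lengths, and all function names are illustrative assumptions. It relies on one general property of ridge regression: as the ridge parameter grows, estimates of missing values shrink toward the mean used for standardization.

```python
import random
import statistics


def make_series(n_recon=850, n_calib=150, cold_offset=-0.5, noise_sd=0.2, seed=0):
    """Hypothetical temperature anomalies (all values are assumptions):
    a colder reconstruction interval followed by a warmer calibration interval."""
    rng = random.Random(seed)
    recon = [cold_offset + rng.gauss(0.0, noise_sd) for _ in range(n_recon)]
    calib = [rng.gauss(0.0, noise_sd) for _ in range(n_calib)]
    return recon, calib


def mean_sd(values):
    """Mean and population standard deviation used for standardization."""
    return statistics.fmean(values), statistics.pstdev(values)


recon, calib = make_series()

# Standardization over both intervals (unavailable in real-world problems)
full_mean, full_sd = mean_sd(recon + calib)

# Standardization over the calibration interval only (the realistic case)
calib_mean, calib_sd = mean_sd(calib)

# In the heavy-regularization limit, missing values are estimated near the
# standardization mean, so the bias relative to the true (cold) climate is:
true_recon_mean = statistics.fmean(recon)
bias_calib_only = calib_mean - true_recon_mean
bias_full_period = full_mean - true_recon_mean

print(f"full-period mean/sd:      {full_mean:.3f} / {full_sd:.3f}")
print(f"calibration-only mean/sd: {calib_mean:.3f} / {calib_sd:.3f}")
print(f"warm bias, calibration-only standardization: {bias_calib_only:.3f}")
print(f"warm bias, full-period standardization:      {bias_full_period:.3f}")
```

Because values are missing only in the colder interval, the calibration-only mean is warmer and the calibration-only standard deviation is smaller than their full-period counterparts. Under these assumptions, that reproduces the direction of the warm bias and variance loss described above.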
Also Published In
- Journal of Climate