Understanding forecast verification statistics

Mason, Simon J.

Although there are numerous reasons for performing a verification analysis, two general questions are usually of interest: are the forecasts good, and can we be confident that the estimate of forecast quality is not misleading? When calculating a verification score, it is not usually obvious how the score can answer either of these questions. Some procedures for attempting to answer the questions are reviewed, with particular focus on p-values and confidence intervals. P-values are shown to be rather unhelpful in answering either question, especially when applied to probabilistic verification scores, and confidence intervals are to be preferred. However, confidence intervals cannot reveal biases in the value of a score that arise from an experimental design that fails to test the forecasts on truly out-of-sample observations. Some specific problems with cross-validation are highlighted. Finally, in the interests of increasing insight into forecast strengths and weaknesses, and of pointing towards methods for improving forecast quality, a plea is made for a more discriminating selection of verification procedures than has been adopted to date.
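As a rough illustration of attaching a confidence interval to a verification score, the sketch below computes a percentile bootstrap interval for the Brier score of a set of probabilistic forecasts. It is not taken from the paper: the choice of the Brier score, the percentile bootstrap, the function names brier_score and bootstrap_ci, and the sample data are all assumptions made for illustration. Resampling forecast-observation pairs also assumes that the cases are independent, which is often not true of real forecast series.

```python
import random


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and binary outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def bootstrap_ci(probs, outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the Brier score (illustrative helper).

    Forecast-observation pairs are resampled together, with replacement.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(probs)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(
            brier_score([probs[i] for i in idx], [outcomes[i] for i in idx])
        )
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical data: forecast probabilities and whether the event occurred.
probs = [0.9, 0.7, 0.2, 0.1, 0.8, 0.3, 0.6, 0.4, 0.95, 0.05]
outcomes = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]

score = brier_score(probs, outcomes)
lower, upper = bootstrap_ci(probs, outcomes)
print(f"Brier score: {score:.3f}, 95% interval: ({lower:.3f}, {upper:.3f})")
```

An interval of this kind reflects sampling uncertainty only. In keeping with the caution in the abstract, it cannot reveal a bias in the score if the forecasts were not verified against truly out-of-sample observations.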



Also Published In

Meteorological Applications

More About This Work

Academic Units
International Research Institute for Climate and Society
Published Here
March 13, 2020