Each year, verification scores for the operational implementations of the Hirlam system are collected and presented in a composite diagram. The rms errors in sea-level pressure for the first three months of this year, for five operational centres, are displayed in the attached figure (bottom panel; for comparison, last year's scores are shown in the top panel). This comparison serves several purposes: it gives a crude measure of the relative performance of the different implementations; it roughly illustrates changes from year to year; it indicates possible problems in particular model configurations. Of course, it is not a completely clean comparison. The integration areas are all different; there are some differences in the verification techniques; one of the models uses a different analysis system. Thus, the results must be interpreted cautiously. Nonetheless, the comparison is considered to be useful. Indeed, this year's results show up a surprising feature, discussed below.
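For readers unfamiliar with the score, the following is a minimal sketch of how an rms error against observations can be computed. The function name `rms_error` and the pressure values are illustrative assumptions; they are not taken from the figure or from the reference verification package.

```python
import math

def rms_error(forecasts, observations):
    """Root-mean-square difference between paired forecast and observed values."""
    if len(forecasts) != len(observations) or len(forecasts) == 0:
        raise ValueError("need equal-length, non-empty sequences")
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical sea-level pressure values in hPa (not data from the figure).
forecast_pmsl = [1012.0, 1008.5, 1001.0, 995.5]
observed_pmsl = [1010.0, 1009.5, 1003.0, 994.5]

score = rms_error(forecast_pmsl, observed_pmsl)
print(f"rms error: {score:.2f} hPa")
```

In practice the pairs would be forecast values interpolated to station locations and the corresponding observations, pooled over the verification period for each forecast range.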
The 1997 scores were discussed in Newsletter 27. We remark here only that the apparently superior performance of the SMHI model is believed to be due to differences in the thresholds allowed for observation usage. The scores for 1998 are broadly similar to those for last year. The relatively large errors in the 48-hour forecasts have been noted previously: the Hirlam model scores poorly for PMSL, better for other parameters. Of concern is the character of the error curves: they are approximately linear, or even of increasing slope during the later part of the period. This is contrary to the commonly observed flattening out of the rms curves as they asymptote to the climatological level. The figure shows that the scores for the Spanish (INM) implementation are exceptional in that they do exhibit this flattening. Also noteworthy is the significantly better performance of the INM model at longer forecast ranges, compared with all the others. The explanation proposed for the superior scores of the SMHI model in 1997 does not apply here: the Spanish scores were calculated using the reference verification package.
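The distinction between a flattening error curve and a linear or steepening one can be made concrete by looking at the slope between successive forecast ranges. The sketch below is only an illustration: the function names and the two sets of scores are hypothetical and do not reproduce any curve in the figure.

```python
def segment_slopes(ranges_h, rms):
    """Slope of the rms curve (hPa per hour) between successive forecast ranges."""
    return [
        (rms[i + 1] - rms[i]) / (ranges_h[i + 1] - ranges_h[i])
        for i in range(len(rms) - 1)
    ]

def is_flattening(ranges_h, rms, tol=1e-9):
    """True if every segment is less steep than the one before it."""
    s = segment_slopes(ranges_h, rms)
    return all(later < earlier - tol for earlier, later in zip(s, s[1:]))

ranges_h = [0, 12, 24, 36, 48]            # forecast range in hours
saturating = [1.0, 2.0, 2.7, 3.1, 3.3]    # hypothetical: growth slows with range
steepening = [1.0, 1.5, 2.2, 3.1, 4.2]    # hypothetical: growth speeds up

print(is_flattening(ranges_h, saturating))
print(is_flattening(ranges_h, steepening))
```

A curve that passes this check is approaching some limiting error level; one that fails it is still growing at an undiminished or increasing rate at the end of the period.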
What could account for the singular character of the Spanish results? All the implementations except that of INM use a rotated latitude/longitude grid: the poles are displaced from their geographic locations in order to achieve greater uniformity of resolution. One consequence is that the 'effective central latitude' is greater. This, combined with the more southern area used for the INM operations, could contribute to the discrepancy. However, we note that verification is against observations, not fields, so a question mark remains. The issue is the subject of high-priority investigation.
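The effect of displacing the pole can be illustrated with the standard spherical relation for the latitude of a point measured relative to a displaced north pole. The sketch below is a minimal illustration under stated assumptions: the function names, the test point and the pole position are hypothetical and do not correspond to any operational configuration.

```python
import math

def rotated_latitude(lat_deg, lon_deg, pole_lat_deg, pole_lon_deg):
    """Latitude of a geographic point measured relative to a displaced north pole."""
    lat = math.radians(lat_deg)
    pole_lat = math.radians(pole_lat_deg)
    dlon = math.radians(lon_deg - pole_lon_deg)
    s = (math.sin(lat) * math.sin(pole_lat)
         + math.cos(lat) * math.cos(pole_lat) * math.cos(dlon))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))

def east_west_scale(lat_deg):
    """Ratio of east-west grid length at this latitude to that at the equator."""
    return math.cos(math.radians(lat_deg))

# Hypothetical point at 60N, 10E; hypothetical pole at 30N, 170W,
# chosen so that the point falls on the equator of the rotated grid.
lat_rot = rotated_latitude(60.0, 10.0, 30.0, -170.0)

print(f"rotated latitude: {lat_rot:.6f} deg")
print(f"east-west scale, geographic grid: {east_west_scale(60.0):.3f}")
print(f"east-west scale, rotated grid:    {east_west_scale(lat_rot):.3f}")
```

On the geographic grid the east-west grid length at this point is half its equatorial value, whereas on the rotated grid it is undistorted, which is the sense in which rotation gives greater uniformity of resolution.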
Several operational reports in this Newsletter include verification results. They confirm that the Hirlam model performance is better for the near-surface parameters than for the sea-level pressure. See, for example, the Danish report (page 54), which states that the Hirlam 10 m wind forecasts are better than available alternatives, particularly in operationally important cases of extreme winds, and that this is especially true for the finer-mesh configurations of the model. This is reassuring, as it is well known that raw rms scores do not consistently reflect the benefits associated with enhanced model resolution.
* * * * *