Automatic benchmarking of series homogenization packages

(Last update: 16/04/2015)

Many homogenization packages are issuing new versions, but their relative performance cannot be tested with a new edition of COST Action ES0601 because of the large amount of work involved. The alternative is to try them in automatic benchmarking experiments, although only packages that can be run completely unattended can be compared in this way.

Three master synthetic networks were generated first, each consisting of 100 series of 60 years of simulated monthly averages of daily maximum temperature, with characteristics taken from a central area of the Duero basin (Spain). The first series was filled with random values with monthly means ranging from 7.9°C (January) to 29.9°C (July) and standard deviations of 1.5°C. Each of the remaining series was built by adding white noise to the data of its nearest neighbor. Three different coefficients applied to the perturbing noise yielded the three master networks TA20, TA40 and TA80, with decreasing degrees of inter-station correlation.
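The construction described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the scripts used for the experiments: the function name make_master_network, the cosine shape of the annual cycle, the random station coordinates, the rule of picking the nearest already generated station, and the noise_coef values are all hypothetical. Only the January and July means and the 1.5°C standard deviation come from the text.

```python
import numpy as np

def make_master_network(n_series=100, n_years=60, noise_coef=0.4, seed=0):
    """Return an (n_months, n_series) array of synthetic monthly temperatures."""
    rng = np.random.default_rng(seed)
    n_months = 12 * n_years

    # Assumed annual cycle: a cosine running from 7.9 (January) to 29.9 (July).
    months = np.arange(12)
    monthly_means = 18.9 - 11.0 * np.cos(2 * np.pi * months / 12)

    # Assumed station layout: random points in a unit square.
    xy = rng.random((n_series, 2))

    data = np.empty((n_months, n_series))
    # First series: monthly means plus random departures with SD 1.5.
    data[:, 0] = np.tile(monthly_means, n_years) + rng.normal(0.0, 1.5, n_months)

    for j in range(1, n_series):
        # Nearest neighbor among the series generated so far (assumption).
        dist = np.linalg.norm(xy[:j] - xy[j], axis=1)
        k = int(np.argmin(dist))
        # White noise scaled by the perturbation coefficient.
        data[:, j] = data[:, k] + noise_coef * rng.normal(0.0, 1.0, n_months)
    return data
```

Calling the function with increasing values of noise_coef gives networks with decreasing inter-station correlation, which is the role played by TA20, TA40 and TA80 in the text.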

Then, 100 homogenization exercises were run on each of the three networks, by randomly choosing 40 of the 100 series and applying inhomogeneities either (a) to the first 5 series or (b) to all of them. In all cases the first 10 series were kept complete, while between 50% and 70% of the data of the remaining 30 series were removed to simulate the presence of short-lived series in the network. Four types of inhomogeneity problems were tested:

  1. Two shifts of 2°C at fixed locations were applied to the first three series, and a single 2°C shift at a random location to the fourth and fifth series. The last 10 years of the series kept their original values. Therefore, series 6 to 10 could act as complete homogeneous references, while series 11 to 40 had more than 50% missing data.
  2. As in the previous setting, but with a strong seasonal variation of the shifts.
  3. A random number of shifts was applied to all series, with random locations and shift magnitudes. Only the last 5 years were guaranteed to keep their original values.
  4. As in the previous setting, but with a random seasonal variation of the shifts.
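As an illustration of settings 3 and 4 and of the removal of data from the short series, the following Python sketch may help. It rests on several assumptions that are not in the text: the names add_random_shifts and shorten_series, the maximum number of shifts, the normal distribution of shift sizes, the cosine form of the seasonal variation, and the choice of keeping one contiguous block of data in each shortened series. Each shift is applied to the data before its break, so that the final years keep their original values.

```python
import numpy as np

def add_random_shifts(series, rng, max_shifts=3, shift_sd=2.0,
                      protected_years=5, seasonal_amp=0.0):
    """Return an inhomogeneous copy of a monthly series and its break positions."""
    n = series.size
    last_free = n - 12 * protected_years      # no breaks at or after this index
    n_shifts = int(rng.integers(1, max_shifts + 1))
    breaks = np.sort(rng.choice(np.arange(12, last_free),
                                size=n_shifts, replace=False))

    # Seasonal modulation of the shift (1.0 everywhere when seasonal_amp == 0).
    month = np.arange(n) % 12
    seasonal = 1.0 + seasonal_amp * np.cos(2 * np.pi * (month - 6) / 12)

    out = series.copy()
    for b in breaks:
        size = rng.normal(0.0, shift_sd)      # assumed shift distribution
        out[:b] += size * seasonal[:b]        # alter only the data before the break
    return out, breaks

def shorten_series(series, rng, min_frac=0.5, max_frac=0.7):
    """Blank between min_frac and max_frac of the data, keeping one block."""
    n = series.size
    n_keep = n - int(round(n * rng.uniform(min_frac, max_frac)))
    start = int(rng.integers(0, n - n_keep + 1))
    out = np.full(n, np.nan)
    out[start:start + n_keep] = series[start:start + n_keep]
    return out
```

Setting seasonal_amp to a nonzero value gives shifts whose size varies through the year, as in settings 2 and 4; the fixed 2°C shifts of setting 1 would need fixed break positions instead of random ones.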

Several homogenization packages were used to homogenize these problem networks, and the resulting corrected series were compared with the original series by computing the root mean squared error (RMSE) and the trend differences. Packages that can handle series with a high proportion of missing data were tried both with and without those series, to evaluate the effect of their use. The tested packages were:

Results show that most methods performed reasonably well in this exercise. Absolute homogenization is not advisable, as shown by the first two RHTestV4 results. Also noticeable are the poor results of this package when quantile adjustments cannot be applied properly for lack of a suitable homogeneous reference.

On the other hand, the benefit of using short series ('cl2', 'cl4', 'US2', 'US4') is only noticeable with low correlations. (However, the reconstruction of short series can be very useful for mapping, climate monitoring, etc.)

Other details on the relative performance of the tested methods can be derived from the following figure grid, where RMSE and trends can be compared with those of the inhomogeneous series 'Inh'. (Some outliers lie outside the plotting limits, which were held constant for easier visual comparison):

[Figure grid, arranged by network correlation and type of inhomogeneities: root mean square errors and trend errors.]
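The two measures shown in the figures, RMSE and trend error, can be computed along the following lines. This is a minimal Python sketch under assumptions: the function names, the use of an ordinary least squares slope as the trend, its expression in °C per century, and the skipping of missing values are illustrative choices rather than the documented procedure.

```python
import numpy as np

def rmse(corrected, original):
    """Root mean squared error over the months present in both series."""
    ok = ~np.isnan(corrected) & ~np.isnan(original)
    return float(np.sqrt(np.mean((corrected[ok] - original[ok]) ** 2)))

def trend_per_century(series):
    """Least squares slope of a monthly series, in units per 100 years."""
    t = np.arange(series.size) / 12.0         # time in years
    ok = ~np.isnan(series)
    slope = np.polyfit(t[ok], series[ok], 1)[0]
    return 100.0 * slope

def trend_error(corrected, original):
    """Difference between the trends of the corrected and original series."""
    return trend_per_century(corrected) - trend_per_century(original)
```

A perfect homogenization would give zero for both measures, so the values obtained for the uncorrected series ('Inh') serve as the baseline that each package should improve on.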

For full details on the methodology (although with fewer methods tested than on this web page), see:

Guijarro JA (2011): Influence of network density on homogenization performance. Seventh Seminar for Homogenization and Quality Control in Climatological Databases jointly organized with the Meeting of COST ES0601 (HOME) Action MC Meeting, Budapest, 24-27/October, WCDMP-No. 78, pp. 11-18.

Anyone wishing to check the methodology used in this comparison can contact jguijarrop AT aemet.es to get the scripts and data needed to reproduce these results or to extend them to other problem settings.