MULTITEST project

Web site of the MULTITEST project

Project funded by the Spanish Ministry of Economy and Competitiveness (CGL2014-52901-P), developed between 2015 and 2017.

Principal investigator: Manola Brunet¹
Team participants: José A. Guijarro², José A. López², Enric Aguilar¹ and Javier Sigró¹.
External participants: Peter Domonkos and Victor Venema³.

1: Universitat Rovira i Virgili, Tarragona, Spain.
2: State Meteorological Agency (AEMET), Spain.
3: Meteorological Institute, University of Bonn, Germany.

(Page under construction: Some sections may lack their intended contents.)

(Last update: 02/01/2020)

Presentation

Climatological series are affected by unwanted perturbations due to station relocations and to changes in instrumentation, observation practices or the environment. These perturbations must be removed (homogenization) before analyzing the series, to avoid misleading conclusions about climate variability.

Many homogenization methods have been developed so far. The most relevant were compared in the successful COST Action ES0601 (HOME: 2006-2011). However, the software packages that implement these methods have kept evolving since then, so new intercomparisons are needed to assess their performance, and organizing a new edition of that Action would be too costly. The only feasible alternative is therefore to perform automatic comparisons, which restricts the tests to methods that can be run in that mode.

The MULTITEST project was an initiative of Peter Domonkos when he was working at the Centre for Climate Change of the University of Tarragona (Spain). The aim of the project was to update and improve the results of a preliminary comparison exercise (Guijarro, 2011) by using better synthetic datasets of monthly values of temperature and precipitation, and testing more homogenization software packages over a variety of inhomogeneity problems.

Guijarro, JA (2011): Influence of network density on homogenization performance. Seventh Seminar for Homogenization and Quality Control in Climatological Databases jointly organized with the Meeting of COST ES0601 (HOME) Action MC Meeting, Budapest, 24-27/October, WMO WCDMP-No. 78, pp. 11-18.

Methodology

Summary

Synthetic homogeneous databases were generated, each composed of 100 series with (mostly) 60 years of monthly values. Then, for every method, database and experimental setting, runs of 100 tests were made by:

Randomly sampling a subset of the series (true solution).
Applying inhomogeneities to them (problem series).
Homogenizing them by the tested method (results, with backward adjustment).
Comparing the results with the true solutions, computing RMSE and errors in trends, means and standard deviations.
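
The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual scripts: the function names, the subset size, the number of breaks and the size of the shifts are all hypothetical, and the homogenizer is a placeholder that returns the problem series unchanged. Shifts are applied to the data before each break so that the most recent segment stays untouched, mirroring the backward-adjustment convention.

```python
import numpy as np

def apply_breaks(series, rng, n_breaks=2, shift_sd=1.0):
    """Return a copy of `series` with step shifts (hypothetical inhomogeneity model)."""
    out = series.copy()
    # Break positions strictly inside the series.
    positions = rng.choice(np.arange(1, len(series)), size=n_breaks, replace=False)
    for pos in np.sort(positions):
        # Shift everything before the break; the last segment is never modified.
        out[:pos] += rng.normal(0.0, shift_sd)
    return out

def identity_homogenizer(problem):
    """Placeholder for a real package: returns the problem series unchanged."""
    return problem.copy()

def run_tests(master, homogenize, n_tests=100, subset_size=10, seed=0):
    """Repeat sample -> perturb -> homogenize -> compare, returning one RMSE per test."""
    rng = np.random.default_rng(seed)
    rmse_per_test = []
    for _ in range(n_tests):
        # 1. Randomly sample a subset of the master series (true solution).
        idx = rng.choice(master.shape[0], size=subset_size, replace=False)
        truth = master[idx]
        # 2. Apply inhomogeneities (problem series).
        problem = np.array([apply_breaks(s, rng) for s in truth])
        # 3. Homogenize with the tested method.
        solution = homogenize(problem)
        # 4. Compare with the true solution (only RMSE is shown here).
        rmse_per_test.append(np.sqrt(np.mean((solution - truth) ** 2)))
    return np.array(rmse_per_test)
```

With the placeholder homogenizer, the returned RMSE values simply measure the size of the inserted inhomogeneities, which gives a baseline against which a real package can be compared.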

Note that, because these methods are applied automatically, they are run with their default settings, and their results may not be as good as when they are properly tuned to each problem network.

Synthetic master networks

Three master networks of mean monthly temperature, named Tm1, Tm2 and Tm3, were generated following these steps:

100 station locations were distributed randomly over a 4° x 3° lon-lat area.
Mean monthly homogenized temperatures from Valladolid (Duero basin, Spain) were assigned to the first point, located near the center of the area.
The location closest to this first point was assigned the same series plus white noise drawn from C*N(0, 1.5). The same procedure was then repeated, each time assigning data to the unassigned point closest to any already assigned point, until the whole network was filled with data.
Three different coefficients C (0.18, 0.30 and 0.65) were used to obtain the three master networks, with decreasing cross-correlation between the simulated stations.
Finally, series were biased to account for simulated elevation, a trend of 2°C/100yr was added, and their yearly cycle amplitude was varied +/-20%.
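
A minimal Python sketch of the propagation procedure described above is given here for illustration. It rests on several assumptions that the text does not confirm: the 1.5 in N(0, 1.5) is treated as a standard deviation, each new station inherits the series of its nearest already assigned station, the first point is taken to be the station nearest the center, and a synthetic seasonal cycle stands in for the Valladolid series. The final step (elevation bias, trend and amplitude changes) is omitted, and the function name is hypothetical.

```python
import numpy as np

def build_master_network(base, c, n_stations=100, noise_sd=1.5, seed=0):
    """Propagate `base` through randomly placed stations, adding c*N(0, noise_sd) noise."""
    rng = np.random.default_rng(seed)
    # Random locations in a 4 x 3 degree lon-lat box.
    xy = rng.uniform([0.0, 0.0], [4.0, 3.0], size=(n_stations, 2))
    # Assumption: the first point is the station nearest the center of the area.
    first = int(np.argmin(np.linalg.norm(xy - np.array([2.0, 1.5]), axis=1)))
    series = np.empty((n_stations, base.size))
    series[first] = base
    assigned = [first]
    pending = set(range(n_stations)) - {first}
    while pending:
        rest = np.array(sorted(pending))
        # Distances from every pending station to every assigned station.
        d = np.linalg.norm(xy[rest][:, None, :] - xy[assigned][None, :, :], axis=2)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        new, parent = int(rest[i]), assigned[j]
        # Assumption: the new station copies its nearest assigned neighbor plus noise.
        series[new] = series[parent] + c * rng.normal(0.0, noise_sd, base.size)
        assigned.append(new)
        pending.remove(new)
    return xy, series

# Stand-in base series: 60 years of a sinusoidal annual cycle (not real data).
months = np.arange(60 * 12)
base = 12.0 + 8.0 * np.sin(2.0 * np.pi * (months - 3) / 12.0)
xy, tm2_like = build_master_network(base, c=0.30)
```

Because noise accumulates along each chain of neighbors, a larger C yields lower cross-correlations between stations, which is the property the three coefficients are meant to control.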

Three other master networks of monthly precipitation were built, simulating three different climates: Atlantic temperate (PEir), Mediterranean (PMca) and Monsoonal (PInd). Real series from Ireland and Mallorca, and gridded series from SW India, were respectively used to derive variograms, gamma coefficients and frequencies of zeroes, which were then used to compute their synthetic series by means of the R package gstat, preserving the spatial correlation structure.

Tested homogenization packages

The homogenization packages to be tested were chosen among those that participated in the COST Action ES0601, with the requirement that they should be able to run in a completely automatic mode, implemented in bash scripts on a Linux PC:

Climatol 3.0 (Guijarro, 2016), with constant and variable corrections.
ACMANT 4.3 (Domonkos, 2015, 2017, 2019), versions for temperature (sinusoidal and irregular seasonalities) and precipitation.
MASH 3.03 (Szentimrey), without seasonal adjustment.
RHtestsV4 (Wang and Feng, 2013), absolute and relative, with/without quantile adjustment. (Average series were given as reference!)
USHCN_v52d (Menne and Williams, 2012).
HOMER 2.6 (Mestre et al., 2012), with two iteration strategies.

A few implementation details:

Climatol ran natively.
ACMANT ran smoothly under Wine (a compatibility layer that allows Linux to run programs compiled for Windows).
MASH *.bat scripts could also be run with Wine, but SAMPAR.BAT and SAMEND.BAT gave invalid file name errors when trying to run commands such as "copy m1. *.tr", and had to be rewritten as a Linux bash script.
RHtestsV4 ran natively, but it is not designed to be applied automatically to a network, because the user must provide the reference series for a relative homogenization. In most scripts the mean series was provided as the reference.
USHCN_v52d must be compiled from the Fortran sources, which is not trivial, since specific versions of the compiler and accompanying libraries are needed. It was compiled several years ago on a PC that is no longer available. A newer version was eventually compiled, but could not be tested because of run-time errors.
HOMER is designed to be run interactively. Automation was first attempted by redirecting the input from a list of appropriate answers, but it only worked when these answers were supplied by means of the utility expect.
As RHtestsV4 returns an error condition when no inhomogeneity is found in the tested series, the problem series is used as the returned solution in that case. This strategy was generalized to all the methods. Even so, HOMER sometimes stopped with an unrecoverable error, and therefore its results may have been computed from an incomplete set of solutions.
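
The fallback strategy described in the last item can be sketched as a small Python wrapper. This is an illustration only, since the project ran its tests from bash scripts: the function name, the `read_solution` callback and the timeout value are hypothetical, and a trivial Python child process stands in for a real homogenization package.

```python
import subprocess
import sys

def homogenize_or_fallback(command, problem, read_solution, timeout=600):
    """Run an external homogenization command; on any failure return the problem series."""
    try:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True, timeout=timeout)
        # read_solution is a caller-supplied function that loads the package output.
        return read_solution()
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        # Error condition, hang, or missing output file: keep the unmodified problem series.
        return problem

# Stand-in commands: child processes that exit with status 1 and 0, respectively.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
passing = [sys.executable, "-c", "pass"]
kept = homogenize_or_fallback(failing, [1.0, 2.0], lambda: [9.0, 9.0])
replaced = homogenize_or_fallback(passing, [1.0, 2.0], lambda: [9.0, 9.0])
```

Returning the problem series on failure keeps every test in the comparison, at the cost of counting a crash as "no correction applied".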

(Other packages can be added if developers or advanced users provide scripts that read input problem series and return their homogenized solutions.)

Domonkos P (2015): Homogenization of precipitation time series with ACMANT. Theor. Appl. Climatol., 122:303-314.
Guijarro, JA (2016). https://cran.r-project.org/web/packages/climatol/index.html
Mestre O, Domonkos P, Guijarro J, Aguilar E (2012). No longer available at http://www.homogenisation.org/Documents/HOMER.R
Szentimrey T. https://www.met.hu/en/omsz/rendezvenyek/homogenization_and_interpolation/software (Accessed 29/9/2011)
Wang XL, Feng Y (2013). https://etccdi.pacificclimate.org/software.shtml

Homogenization results

The results of the tests are evaluated through the comparison of the solutions provided by the tested packages with the true original homogeneous series. Four metrics have been computed from these comparisons:

Root Mean Squared Error (RMSE) measures how far the returned homogenized series are from the original.
Errors in the trends of the returned solutions, relative to the original series, are another important metric for checking the impact of homogenization on climate change detection assessments.
Errors in the means of the returned series can be important when producing climatic maps from them.
Errors in the standard deviations would show the impact on studies of variability and extreme values, but they would be more relevant for daily series than for the monthly series simulated here.
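
As an illustration, the four metrics could be computed for one series as in the Python sketch below. The details are assumptions, since the text does not specify them: trends are taken as ordinary least-squares slopes per time step, errors are signed differences (solution minus truth), and the sample standard deviation is used. The function name is hypothetical.

```python
import numpy as np

def evaluation_metrics(solution, truth):
    """Compare a homogenized series with its true homogeneous counterpart."""
    solution = np.asarray(solution, dtype=float)
    truth = np.asarray(truth, dtype=float)
    t = np.arange(truth.size)

    def slope(y):
        # Least-squares linear trend, in data units per time step.
        return np.polyfit(t, y, 1)[0]

    return {
        "rmse": float(np.sqrt(np.mean((solution - truth) ** 2))),
        "trend_error": float(slope(solution) - slope(truth)),
        "mean_error": float(solution.mean() - truth.mean()),
        "sd_error": float(solution.std(ddof=1) - truth.std(ddof=1)),
    }

# Example: a solution that is offset from the truth by a constant 0.5.
truth = 0.01 * np.arange(120)
metrics = evaluation_metrics(truth + 0.5, truth)
```

In this example the constant offset shows up in the RMSE and in the error of the mean, while the trend and standard deviation errors are zero up to rounding, which is why the four metrics are reported separately.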

(Results for metrics 3 and 4 will be added soon.)

It is important to note that all these metrics have been computed only on series whose problem version contained inhomogeneities. New calculations involving all series will be done in the future, to account for cases in which the packages may have corrected false inhomogeneities.

Results are presented in separate web pages, grouped by types of experiments, by means of box-and-whisker plots showing the spread of the solutions provided by every package:

Precipitation

Temperature: | First five experiments | Several experiments with Tm2

Summary

(Summary tables of the results will be presented here in the near future.)

Scripts

(Master networks and scripts needed to run these tests will be available here in the near future for the sake of transparency and reproducibility. These materials still need to be tidied up and organized to avoid confusing potential users.)