Web site of the MULTITEST project
Project funded by the Spanish Ministry of Economy and
Competitiveness (CGL2014-52901-P), developed between 2015 and
2017.
Principal investigator: Manola Brunet (1)
Team participants: José A. Guijarro (2), José A. López (2), Enric Aguilar (1) and Javier Sigró (1).
External participants: Peter Domonkos and Victor Venema (3).
(1) Universitat Rovira i Virgili, Tarragona, Spain.
(2) State Meteorological Agency (AEMET), Spain.
(3) Meteorological Institute, University of Bonn, Germany.
(Page under construction: Some sections may lack their intended contents.)
(Last update: 02/01/2020)
Climatological series are affected by unwanted perturbations due to station
relocations and to changes in instrumentation, observation practices or the
environment. These perturbations must be removed (homogenization) before
analyzing the series to avoid misleading conclusions about climate
variability.
Many homogenization methods have been developed so far. The most relevant
were compared in the successful COST Action ES0601 (HOME: 2006-2011). However,
as homogenization methods, implemented in the form of software packages,
have been evolving since then, new intercomparisons are needed to assess their
performance, but implementing a new edition of that Action would be too
costly. Therefore, the only feasible alternative is to perform automatic comparisons, although only methods that can be run in this mode can be tested.
The MULTITEST project was an initiative of Peter Domonkos when he was working
at the Centre for Climate Change of the University of Tarragona (Spain). The
aim of the project was to update and improve the results of a preliminary
comparison exercise (Guijarro, 2011) by using better synthetic datasets of
monthly values of temperature and precipitation, and testing more
homogenization software packages over a variety of inhomogeneity
problems.
Guijarro, JA (2011): Influence of network density on homogenization
performance. Seventh Seminar for Homogenization and Quality Control in
Climatological Databases jointly organized with the Meeting of COST ES0601
(HOME) Action MC Meeting, Budapest, 24-27/October, WMO WCDMP-No. 78, pp.
11-18.
Summary
Synthetic homogeneous databases were generated, each composed of 100
series with (mostly) 60 years of monthly values. Then, for every method,
database and experimental setting, runs of 100 tests were made by:
- Randomly sampling a subset of the series (true solution).
- Applying inhomogeneities to them (problem series).
- Homogenizing them by the tested method (results, with backward adjustment).
- Comparing the results with the true solutions, computing RMSE and
errors in trends, means and standard deviations.
Note that, because these methods are applied automatically, they are run
with default settings, and their results may be worse than those obtained
when the settings are properly tuned to each problem network.
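The test procedure listed above can be sketched as follows. This is only an illustrative outline, not the project's actual scripts: the function names, the single break per series, the Gaussian shift size and the choice of shifting the earlier part of the series are all assumptions made for the example.

```python
import math
import random

def rmse(a, b):
    """Root mean squared difference between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def apply_break(series, position, shift):
    """Return a copy of `series` with `shift` added to all values before `position`.

    Assumption: the most recent segment is left untouched, so that a
    backward-adjusted solution is directly comparable with the true series.
    """
    return [v + shift if i < position else v for i, v in enumerate(series)]

def run_test(master, n_series, homogenize, rng):
    """One test: sample, perturb, homogenize, and return one RMSE per series."""
    truth = rng.sample(master, n_series)            # true solution
    problem = []
    for s in truth:
        pos = rng.randrange(1, len(s))              # hypothetical: one break per series
        problem.append(apply_break(s, pos, rng.gauss(0.0, 1.0)))
    result = homogenize(problem)                    # tested method (any callable)
    return [rmse(r, t) for r, t in zip(result, truth)]
```

Here `homogenize` stands for any callable that takes the list of problem series and returns the list of homogenized series; a run would call `run_test` 100 times and collect the scores.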
Synthetic master networks
Three master networks of mean monthly temperature, named Tm1, Tm2 and Tm3,
were generated following these steps:
- 100 station locations were distributed randomly over a 4° x 3° lon-lat area.
- Mean monthly homogenized temperatures from Valladolid (Duero basin,
Spain) were assigned to the first point, located near the center of the
area.
- The location closest to this first point was assigned the same series
plus white noise drawn from C*N(0, 1.5). The same procedure was then
repeated, each time assigning data to the unassigned point closest to any
already assigned point, until the whole network was filled with data.
- Three different coefficients C (0.18, 0.30 and 0.65) were used to
obtain the three master networks, with decreasing cross-correlation between
the simulated stations.
- Finally, series were biased to account for simulated elevation, a trend of 2°C/100yr was added, and their yearly cycle amplitude was varied +/-20%.
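The nearest-neighbour filling of the temperature networks might look roughly like the sketch below. Several points are assumptions rather than facts from the project: the first point is placed exactly at the center of the area, each new point copies the series of its nearest already assigned neighbour, 1.5 is read as a standard deviation, and the base series is a hypothetical sinusoidal placeholder instead of the Valladolid data. The final step (elevation bias, trend and yearly cycle amplitude) is not included.

```python
import math
import random

def build_network(base, n_stations=100, c=0.30, seed=0):
    """Return (locations, series) for one synthetic master network."""
    rng = random.Random(seed)
    # Random locations over a 4 x 3 degree lon-lat box; the first one sits
    # at the center (assumption) and receives the base series unchanged.
    locs = [(2.0, 1.5)] + [(rng.uniform(0, 4), rng.uniform(0, 3))
                           for _ in range(n_stations - 1)]
    series = {0: list(base)}
    pending = set(range(1, n_stations))
    while pending:
        # Unassigned point j closest to any already assigned point i.
        _, j, i = min((math.dist(locs[j], locs[i]), j, i)
                      for j in pending for i in series)
        # Assumption: copy the neighbour's series and add C * N(0, 1.5),
        # with 1.5 taken as the standard deviation of the white noise.
        series[j] = [v + c * rng.gauss(0.0, 1.5) for v in series[i]]
        pending.remove(j)
    return locs, [series[k] for k in range(n_stations)]

# Hypothetical placeholder for 60 years of monthly mean temperatures.
base = [12.0 + 8.0 * math.sin(2 * math.pi * (m % 12) / 12) for m in range(720)]
```

Calling `build_network(base, c=0.18)`, `build_network(base, c=0.30)` and `build_network(base, c=0.65)` would give three networks in which the added noise grows with C, so the cross-correlation between stations decreases.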
Three further master networks of monthly precipitation were built, simulating
three different climates: Atlantic temperate (PEir), Mediterranean (PMca) and
Monsoonal (PInd). Real series from Ireland and Mallorca, and gridded series
from SW India, were respectively used to derive variograms, gamma coefficients
and frequencies of zeroes, which were used to compute their synthetic series by
means of the R package gstat, preserving the spatial correlation structure.
Tested homogenization packages
The homogenization packages to be tested were chosen among those that
participated in the COST Action ES0601, with the requirement that they
should be able to run in a completely automatic mode, implemented in bash
scripts on a Linux PC:
- Climatol 3.0 (Guijarro, 2016), with constant and variable
corrections.
- ACMANT 4.3 (Domonkos, 2015, 2017, 2019), versions for temperature (sinusoidal
and irregular seasonalities) and precipitation.
- MASH 3.03 (Szentimrey), without seasonal adjustment.
- RHtestsV4 (Wang and Feng, 2013), absolute and relative, with/without
quantile adjustment. (Average series were given as reference!)
- USHCN_v52d (Menne and Williams, 2012).
- HOMER 2.6 (Mestre et al., 2012), with two iteration
strategies.
A few implementation details:
- Climatol ran natively.
- ACMANT ran smoothly by means of Wine (a compatibility layer that allows
Linux to run programs compiled for Windows).
- MASH *.bat scripts could also be run with Wine, but SAMPAR.BAT and
SAMEND.BAT gave invalid file name errors when trying to run commands such as
"copy m1. *.tr", and had to be rewritten as a Linux bash script.
- RHtestsV4 ran natively, but it is not designed to be applied automatically
to a whole network, because the user must provide the reference series for
a relative homogenization. The mean series was provided as reference in most
scripts.
- USHCN_v52d must be compiled from the Fortran sources, which is not trivial,
since specific versions of the compiler and accompanying libraries are needed.
It was compiled several years ago on a PC that is no longer available. A
newer version was eventually compiled, but could not be tested because of
run-time errors.
- HOMER is designed to be run interactively. The automation was first tried
by redirecting the input from a list of appropriate answers, but it only worked
when these answers were supplied by means of the utility expect.
- As RHtestsV4 returns an error condition when no inhomogeneity is found in the tested series, the problem series is used as the returned solution in that case. This strategy was generalized to all the methods. Even so, HOMER sometimes stopped with an unrecoverable error, and therefore its results may have been computed from an incomplete set of solutions.
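The fallback strategy described in the last item can be illustrated with a small wrapper. This is a sketch only: `homogenize_with_fallback` and the `method` callable are hypothetical names, and the real tests were driven by bash scripts rather than Python.

```python
def homogenize_with_fallback(method, problem):
    """Return the solution given by `method`, or the problem series itself.

    `method` is any callable taking a problem series and returning its
    homogenized version. If it raises an error or returns nothing, the
    unchanged problem series is used as the solution.
    """
    try:
        solution = method(problem)
    except Exception:
        return list(problem)
    if solution is None:
        return list(problem)
    return list(solution)
```

In practice `method` would wrap the call to the external package and raise an error when the package exits with an error status.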
(Other packages can be added if developers or advanced
users provide scripts that read input problem series and return their
homogenized solutions.)
Domonkos P (2015): Homogenization of precipitation time series with ACMANT. Theor. Appl. Climatol., 122:303-314.
Guijarro, JA (2016). https://cran.r-project.org/web/packages/climatol/index.html
Mestre O, Domonkos P, Guijarro J, Aguilar E (2012). No longer available at http://www.homogenisation.org/Documents/HOMER.R
Szentimrey T. https://www.met.hu/en/omsz/rendezvenyek/homogenization_and_interpolation/software (Accessed 29/9/2011)
Wang XL, Feng Y (2013). https://etccdi.pacificclimate.org/software.shtml
The results of the tests are evaluated through the comparison of the solutions provided by the tested packages with the true original homogeneous series. Four metrics have been computed from these comparisons:
- Root Mean Squared Error (RMSE) measures how far the returned homogenized series are from the original ones.
- Errors in the trends of the returned solutions, compared to those of the
original series, are another important metric to check the impact of
homogenization on climate change detection assessments.
- Errors in the means of returned series can be important when producing
climatic maps from them.
- Errors in the standard deviations would show the impact in studies of
variability and extreme values, but they would be more relevant in the case
of daily series than for the monthly series simulated here.
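As an illustration, the four metrics could be computed for one series as in the sketch below. The function names are hypothetical, and fitting the trend by ordinary least squares directly on the series values is an assumption; the project may have computed trends differently (for example, on annual means).

```python
import math
from statistics import mean, stdev

def ols_slope(y):
    """Least-squares slope of y against its index (units of y per time step)."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = mean(y)
    num = sum((i - xbar) * (v - ybar) for i, v in enumerate(y))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

def evaluate(solution, truth):
    """Compare a homogenized solution with the true homogeneous series."""
    return {
        "rmse": math.sqrt(mean((s - t) ** 2 for s, t in zip(solution, truth))),
        "trend_error": ols_slope(solution) - ols_slope(truth),
        "mean_error": mean(solution) - mean(truth),
        "sd_error": stdev(solution) - stdev(truth),
    }
```

For example, a solution that differs from the truth only by a constant offset has an RMSE and a mean error equal to that offset, while its trend and standard deviation errors are zero.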
(Results for metrics 3 and 4 will be added soon.)
It is important to note that all these metrics have been computed only on
series whose problem version was inhomogeneous. New calculations will be done in
the future involving all series, to account for cases in which the packages may
have corrected false inhomogeneities.
Results are presented in separate web pages, grouped by types of
experiments, by means of box-and-whisker plots showing the spread of the
solutions provided by every package:
- Precipitation
- Temperature: First five experiments | Several experiments with Tm2
(Summary tables of the results will be presented here in the near future.)
(Master networks and scripts needed to run these tests will be made
available here in the near future for the sake of transparency and
reproducibility. All this material first needs to be tidied up and organized
to avoid confusing potential users.)