Kevin Cowtan and Robert Way
Additionally, the Hadley dataset is the only version of the temperature record which has not already been subject to smoothing or interpolation. As a result it is the only dataset from which we could obtain realistic estimates of the spatial variability in the temperature data.
Check it for yourself
Run the makebias.csh and makehad4.csh scripts. Now, starting from the hybrid reconstruction (had4_1.0.dat), use masklat.py to remove the top and bottom 5 degrees of data. Use temp.py to calculate a temperature series. Finally use trend.py to determine the trend on the period 1997-2012. How does the trend compare to the full hybrid reconstruction?
Use the program nullfill.py to fill the polar holes with the mean value from the rest of the map, which does not affect the mean. Are the resulting maps realistic?
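The operations in this exercise can be sketched in a few lines of Python. This is a minimal illustration on a synthetic 5-degree grid, not the code in masklat.py, temp.py, trend.py or nullfill.py: the function names, the grid layout and the random test data are all assumptions made for the example. It shows an area-weighted mean, masking of the rows nearest the poles, and why filling missing cells with the mean of the observed cells leaves the mean unchanged.

```python
import numpy as np

def global_mean(grid, lats):
    """Area-weighted mean of a (nlat, nlon) map, skipping NaN cells."""
    # Cells on an equal-angle grid shrink towards the poles as cos(latitude).
    w = np.broadcast_to(np.cos(np.radians(lats))[:, None], grid.shape)
    ok = ~np.isnan(grid)
    return np.sum(grid[ok] * w[ok]) / np.sum(w[ok])

def mask_polar(grid, lats, width=5.0):
    """Return a copy with cells within `width` degrees of either pole set to NaN."""
    out = grid.copy()
    out[np.abs(lats) > 90.0 - width, :] = np.nan
    return out

def fill_with_mean(grid, lats):
    """Return a copy with NaN cells replaced by the area-weighted mean of the rest."""
    out = grid.copy()
    out[np.isnan(out)] = global_mean(grid, lats)
    return out

def linear_trend(years, series):
    """Ordinary least-squares slope, in series units per year."""
    return np.polyfit(years, series, 1)[0]

# Synthetic demonstration (assumed layout: 36 x 72 cells, row centres at
# -87.5 ... 87.5 degrees; the real .dat file format is not assumed here).
lats = np.arange(-87.5, 90.0, 5.0)
rng = np.random.default_rng(0)
grid = rng.normal(size=(lats.size, 72))

masked = mask_polar(grid, lats)        # top and bottom 5 degrees removed
filled = fill_with_mean(masked, lats)  # polar holes filled with the mean

# The two means agree: mean-filling adds no information to the global average.
print(global_mean(masked, lats), global_mean(filled, lats))
```

The filled map has the same global mean as the masked one, but its poles are set to a single value that ignores any regional signal, which is the point of the question about whether such maps are realistic.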
While RSS omit the Antarctic in their satellite reconstructions, both UAH and NOAA include it. While there are clearly difficulties with the data, our results support this decision.
Check it for yourself
Run the makebias.csh, makehad4.csh and makecross1.csh scripts. Examine the files rmsd_1_krig.dat and rmsd_1_1.0.dat. The Antarctic stations are in the bottom few rows. Which method has more skill in reconstructing the omitted stations: Kriging or hybrid (s=1.0)?
Repeat this test for the land-only reconstruction in the had4-l directory. After reading our 30/10/2013 update, can you deduce what factors affect whether the satellite data helps or not?
The problem with the Arctic ice pack is a lack of permanent weather stations against which to validate the results. The only source of in-situ observations is the IABP data to 1998 (Rigor et al, 2000). Our reconstruction reproduces the IABP data very well apart from two excursions around 1997/1998 and 1991/2. The month-on-month variation in our data shows good agreement with three weather models, which tend to support our reconstruction over the problem periods.
The 16-year trends are more difficult to assess, because the IABP data do not cover this period. The weather models show somewhat different trends, suggesting that there are problems in modeling the central Arctic. Our reconstruction is more conservative than any of the three models over this period, and is closest to the ERA-Interim model (Dee et al, 2011; see also papers by Screen et al).
The Arctic is sufficiently small that there is no significant difference between the kriging and hybrid reconstructions over this region.
Note that the in-situ temperature record aims to measure surface air temperature (and hence model-data comparisons are against T2m). Sea surface temperatures are used because they provide a better estimate of air temperature over the oceans than marine air temperatures, which have serious homogenization problems. However, in the case of sea ice, the sea surface temperature is no longer a good measure of air temperature, because the atmosphere is insulated from the ocean by a layer of snow and ice. This is confirmed by the IABP data and weather models.
The largest missing regions are either land or sea ice, and the nearest observations are from land stations. As a result reconstructing from the blended data provides a good approximation to reconstructing the domains separately. However reconstruction by domain has a number of benefits and we intend to adopt this approach in the long run.
Check it for yourself
Choose a cell in the sea ice in either the IABP or NCEP/NCAR data. Examine the variance of the cell over a fairly stable period, e.g. 1981-90. Now repeat the test for a land cell, either in the same data or in the CRUTEM4 data (had4-l directory). If possible examine both snow-covered and snow-free cells. Finally examine an ocean cell in the HadSST3 data (had4-o directory). What do the variances tell you?
In the context of the paper, we use atmospheric reanalysis products (NCEP and ECMWF) to help assess the degree of bias and latitudinal patterns in coverage bias within the HadCRUT4 dataset. Atmospheric reanalysis products are also used for comparison with our final results at both the global and regional scales including during comparisons with the International Arctic Buoy Program (IABP) dataset discussed by Rigor et al (2000).
In our results we show that some atmospheric reanalysis products perform reasonably well in determining surface air temperature (SAT) in the Arctic and Antarctic, similar to results shown by Screen and Simmonds (2011), Screen et al (2012) and Screen and Simmonds (2012). In particular, the ERA-Interim atmospheric reanalysis dataset (Dee et al, 2011) provides the most realistic results in both polar regions and matches the regional patterns found in our study well. At mid-latitudes researchers should be cautious when using reanalysis products for assessing climate trends in air temperature.
There are two kinds of uncertainties to consider: uncertainty in the temperature estimates for individual cells, and uncertainty in the global mean surface temperature. The latter is estimated by taking the reanalysis data, reducing coverage to match HadCRUT4, reconstructing a global map from the reduced map, and comparing the mean temperature to the original values. This does not require that the reanalysis data is correct, only that it is physically plausible.
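The coverage test described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the paper: the function names are hypothetical, and `zonal_fill` is a deliberately simple stand-in for the reconstruction step (it is not kriging). Any function that takes a map with missing cells and returns a complete map could be passed in its place.

```python
import numpy as np

def weighted_mean(grid, lats):
    """Area-weighted mean of a (nlat, nlon) map, skipping NaN cells."""
    w = np.broadcast_to(np.cos(np.radians(lats))[:, None], grid.shape)
    ok = ~np.isnan(grid)
    return np.sum(grid[ok] * w[ok]) / np.sum(w[ok])

def zonal_fill(grid, lats):
    """Stand-in reconstruction (NOT kriging): fill each missing cell with the
    mean of the observed cells in its latitude row, or with the global
    observed mean if the whole row is missing."""
    out = grid.copy()
    fallback = weighted_mean(grid, lats)
    for i in range(grid.shape[0]):
        row = grid[i]
        missing = np.isnan(row)
        if missing.all():
            out[i, :] = fallback
        elif missing.any():
            out[i, missing] = row[~missing].mean()
    return out

def coverage_test(fields, observed, lats, reconstruct):
    """Error in the global mean after reducing coverage and reconstructing.

    fields:   (ntime, nlat, nlon) complete maps, e.g. from a reanalysis
    observed: boolean array of the same shape, True where coverage exists
    Returns one error (reconstructed mean minus true mean) per time step.
    """
    errors = []
    for full, mask in zip(fields, observed):
        reduced = np.where(mask, full, np.nan)       # reduce coverage
        rebuilt = reconstruct(reduced, lats)         # reconstruct a global map
        errors.append(weighted_mean(rebuilt, lats) - weighted_mean(full, lats))
    return np.array(errors)
```

The spread of the returned errors (for example their root-mean-square) gives an estimate of the uncertainty in the global mean that arises from incomplete coverage. Because the complete field is only used as a plausible "truth" to be hidden and recovered, it does not need to match the real climate exactly.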
Individual cell uncertainties do not figure in our work, but may be determined as an additional step in the kriging calculation. The program 'krig1v-var.py' does this calculation. Note that this code is only minimally tested and the resulting uncertainties may be inflated by a factor of √2 due to a missing factor of 2 in the (semi-)variogram calculation.
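The origin of the possible factor of √2 can be seen from the standard definitions. The following assumes ordinary kriging with a semivariogram that depends only on separation; which kriging variant krig1v-var.py uses is not stated here, so treat this as an illustration of the scaling argument rather than a description of the code.

```latex
\begin{align}
\gamma(h) &= \tfrac{1}{2}\,\mathrm{E}\!\left[\bigl(Z(x) - Z(x+h)\bigr)^2\right] \\
\sum_j \lambda_j\,\gamma(x_i - x_j) + \mu &= \gamma(x_i - x_0),
  \qquad \sum_j \lambda_j = 1 \\
\sigma_K^2 &= \sum_i \lambda_i\,\gamma(x_i - x_0) + \mu
\end{align}
```

If the factor of 1/2 in the first line is omitted, every value of γ is doubled. The weights λ that solve the second line are unchanged, while the Lagrange multiplier μ doubles, so the kriging variance in the third line doubles and the corresponding standard deviation is inflated by √2. The reconstructed temperatures themselves depend only on the weights and are unaffected.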
Our work primarily addresses the impact of bias on recent temperature trends. Our results highlight the dangers of drawing conclusions from short term trends. This type of argument has dominated the public discourse, but is in our view a misleading approach to evaluating climate science.
The question of whether climate models predict the observations is a separate issue. This is a genuine scientific question which is not addressed by our work.
In the Cowtan and Way (2013) reconstruction the years 1998 and 2010 show the most change relative to the original HadCRUT4 dataset. The issues associated with reconstructing temperature patterns for these two years provide a great opportunity to show the impacts of coverage bias. Although 1998 and 2010 were both characterized by warm air temperatures, partly due to El Niño events, they differed significantly in their synoptic patterns. In 1998 the very warm temperatures were at least partly caused by one of the most powerful El Niño events in recorded history (McPhaden, 1999). By contrast, 2010 was characterized by a relatively moderate El Niño but also by the most negative Arctic Oscillation values in over 50 years (Cohen et al, 2010). The pattern of temperature change associated with El Niño shows that the response to ENSO events is strongest in the tropics and mid-to-low latitudes, where observational data is the most complete (Figure 1). By contrast, the pattern of temperature change associated with the Arctic Oscillation is for the most part greatest in the high-latitude regions of the Northern Hemisphere, where observational coverage is fairly poor (Figure 2).
Considering the distinct synoptic patterns evident for 1998 and 2010, we can see that temperatures in 1998 were warmer than in 2010 for regions with the best observational coverage, but significantly lower than in 2010 for regions with poor observational coverage such as the eastern Canadian Arctic (Figure 3). As a result of the coverage bias in the original HadCRUT4, the year 1998 appears warmer than it actually was and the year 2010 appears cooler than it was in reality. This example shows clearly why improved observational coverage is important for understanding recent and past climate.
Check it for yourself
Open the following .KMZ file in Google Earth showing the difference in temperature between 2010 and 1998 using ERA-Interim data. Consider the spatial distribution of these differences relative to this .KMZ file showing the typical extent of HadCRUT4 coverage over the period 1979-2012.
The surface data comes from thousands of simple and standardised thermometers, and so has good temporal stability but uneven spatial coverage. The satellite data comes from a succession of single complex instruments, so while it has good geographical coverage, maintaining temporal stability is very challenging and different groups get different answers.
We use the spatial information from the satellite data to address the spatial incompleteness of the surface data. This could increase or decrease the trend in the surface data. The trend in the satellite data plays no part: this was a design decision of the method, made on the basis of the temporal stability issues. Adding an arbitrary time-varying signal to the satellite data would not affect our results.
The difference between our data and HadCRUT4 provides an estimate of coverage bias. Changes in coverage bias are controlled by two factors: Changes in the temperature contrast between the observed and unobserved regions, and changes in coverage. The two combine in a way analogous to the product rule in calculus.
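The product-rule analogy can be made explicit with a simple two-region idealisation. The symbols below are introduced only for this illustration and do not appear in the paper: u is the fraction of the Earth's area that is unobserved, and C is the contrast between the mean temperature anomalies of the observed and unobserved regions. The sign convention (observed-only mean minus global mean) is also an assumption.

```latex
\begin{align}
\bar{T}_{\mathrm{global}} &= (1-u)\,\bar{T}_{\mathrm{obs}} + u\,\bar{T}_{\mathrm{unobs}} \\
B &= \bar{T}_{\mathrm{obs}} - \bar{T}_{\mathrm{global}} = u\,C,
  \qquad C = \bar{T}_{\mathrm{obs}} - \bar{T}_{\mathrm{unobs}} \\
\frac{dB}{dt} &= u\,\frac{dC}{dt} + C\,\frac{du}{dt}
\end{align}
```

The first term on the right of the last line is the change in bias due to a changing temperature contrast at fixed coverage; the second is the change due to changing coverage at fixed contrast.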
There is a possible coverage issue: the most northerly station in the CRU data for the eastern hemisphere is POLARGMO. Data from this station are missing from Jan 2001 to Oct 2004 (SANNIKOVA STRAIT also has a gap in 2004). Without this station the central Arctic has rather less complete coverage, and as a result the reconstruction there reverts naturally towards the global mean. Could this explain the jump? In this case no: the effect of removing this station is only about 0.01°C.
The other alternative is temperature contrast. The MERRA weather model reanalysis does not assimilate weather station temperatures and so can be used as an independent test. The MERRA annual temperatures for our central Arctic test region are shown in the following figure, with 2004 highlighted in blue.
The MERRA temperature estimates for the central Arctic suggest that 2004 is over a degree cooler than 2005, and the jump from December 2004 to January 2005 is over 3°C. As a result the impact of leaving the Arctic out of the temperature data changes dramatically between the two months, creating a noticeable change in the HadCRUT4 bias. The step difference between our reconstruction and HadCRUT4 is an expected result of omitting the Arctic under these circumstances, and arises from coverage bias in the HadCRUT4 reconstruction.
The unobserved regions in the Antarctic are much larger and so also play a part even though the changes in the Antarctic are less extreme.
This work was inspired back in 2011 by a desire to understand the divergence between the 3 main versions of the in situ temperature record. On inspecting the data it rapidly became obvious that the principal differences were due to coverage. Investigating the literature revealed that the issue was already well known.
The fact that trends starting around 1998 are unusually biased was unexpected. The discovery that the sea surface temperature bias was most significant over the same period was even more surprising. In retrospect, however, the connection is clear: the existence of these two biases, both affecting trends starting around 1998, has contributed to the identification of this period as a 'pause' in global warming.
The problem is that evaluating the claims of a paper requires a similar level of expertise to writing it in the first place, so we often need to use other means to evaluate a paper. Whether it has passed peer-review is a useful one since, when it works, it means that other experts consider the work to be competently conducted. However experts are fallible, and a piece of research can be both competent and wrong for unexpected reasons.
The best test of the value of a piece of scientific work is to evaluate the response to the work over years or decades, with citations being a crude metric. For a more immediate evaluation, probably the best that can be achieved is to consult a range of experts with a good publication record in the subject of the work.