- Research article
- Open Access

# Characterising prediction error as a function of scale in spatial surfaces of tree productivity

*New Zealand Journal of Forestry Science*
**volume 47**, Article number: 19 (2017)

## Abstract

### Background

Two indices, the 300 Index and Site Index, are commonly used to quantify productivity of *Pinus radiata* D.Don within New Zealand. Although maps of these indices exist, availability of new data and modifications to underlying models make a refit of these prediction surfaces desirable. Prediction errors of such surfaces have only been reported at a plot-level scale, but their application is invariably at a larger scale where prediction accuracy should be better. The objectives of this study were to: (i) develop updated predictive surfaces for the 300 Index and Site Index; and (ii) characterise the relationship between prediction error and spatial scale for both surfaces.

### Methods

Models were developed using a dataset of 4108 permanent sample plots from throughout New Zealand. Productivity indices were estimated from plot measurements and environmental variables extracted for each plot. Data were randomly split into fitting and validation datasets and surfaces developed from the fitting dataset for the 300 Index and Site Index using partial least squares regression, ordinary kriging and regression kriging. Prediction accuracy across a range of scales from 0.2 to 200 km was evaluated using the validation dataset.

### Results

Regression kriging was found to be the most accurate method for describing spatial variation in the 300 Index and Site Index across New Zealand. Examination of changes in prediction error with spatial scale demonstrated a gradual decline in error from the plot level with increasing scale.

### Conclusions

This study provides accurate maps of both the 300 Index and Site Index across New Zealand. Analysis of the effects of scale on prediction accuracy indicates that 95% confidence intervals of predictions for the 300 Index based on these maps averaged over an area of about 700 ha are half those of plot-level predictions and halve again at a scale of about 20,000 ha. For the Site Index, the improvement in precision with increasing scale is more gradual with 95% confidence intervals halving at a scale of about 20,000 ha and halving again at a scale of about 250,000 ha.

## Background

The influence of climatic variables on tree growth and development has been well documented throughout both the growth modelling and physiological literature. It is well recognised that factors such as air temperature, water balance and soil nutrition regulate growth and account for variation in productivity at a given age between sites (Sampson et al. 2006; Bollmann et al. 1986; Jones et al. 1991; Battaglia et al. 1996; Duchesne and Houle 2011; Kirschbaum 1999; Battaglia et al. 2004).

There are various ways in which environmental factors can be integrated into forest growth models. Many process-based models have been developed, ranging in complexity from the simple light use efficiency approach (Monteith and Moss 1977) through to more sophisticated models such as 3-PG (Landsberg and Waring 1997) and models that link carbon, water and nitrogen flows in the trees and soil (Kirschbaum 1999; Battaglia et al. 2004). However, with a few notable exceptions (Landsberg and Waring 1997), these models are seldom used as practical tools in forest management as they include too many uncertainties and require values for numerous parameters that can be difficult to obtain (Makela et al. 2000).

Empirical models of tree growth are far more widely used within the forest industry. In these models, tree productivity is determined from stand age using non-linear functional forms. Variation in productivity between stands is typically accounted for by standardised measurements of productivity at a given age (e.g. Site Index) that can be used to adjust the trajectory of predictions through time and the asymptote.

Standardised measurements of productivity such as the Site Index have been shown to be strongly related to environmental factors. The recent proliferation of environmental surfaces has seen the development of national-level spatial surfaces describing Site Index and volume productivity indices for a range of plantation species (Watt et al. 2009; Palmer et al. 2009b; Palmer et al. 2012). These surfaces are of considerable use to the forest industry as they can be used to parameterise empirical growth models and provide insight into how productivity may vary at a relatively fine scale.

In New Zealand, *Pinus radiata* D.Don is the most widely planted commercial crop covering an estimated 1.6 million hectares and comprising 91% of the entire national plantation estate (New Zealand Forest Owners Association 2010). There are two indices commonly used to quantify productivity of *P. radiata* within New Zealand. These are the 300 Index, defined as the stem volume mean annual increment (MAI) at age 30 years with a reference regime of 300 stems ha^{−1} (Kimberley et al. 2005), and Site Index, defined as the mean top height^{1} at age 20 years (Goulding 2005). Although surfaces have been developed for these two important indices of *P. radiata* (Palmer et al. 2009b), more data have since become available justifying a refit of these models. Furthermore, there have been modifications to the procedures used to estimate the productivity indices. In particular, the 300 Index growth model (which is used to derive the 300 Index from plot measurements) has been updated, with the latest version predicting greater volume growth beyond age 25 years than earlier versions. This means that estimates of the 300 Index produced from early- or mid-rotation measurements have increased uniformly by about 1 m^{3} ha^{−1} year^{−1}. This change necessitates a refit of the prediction surface as it is important that the surface and the models used to apply its predictions remain compatible.

Although productivity surfaces are often used across a range of scales in forest applications, little research has been conducted into how error associated with these surfaces varies with scale. Errors are most often reported at the plot level, which are typically only a fraction of a hectare in size. However, the scale of application is more often the stand or forest level. Stands are typically in the tens of hectares, while forests are in the hundreds or thousands of hectares. It would be expected that at these greater scales, prediction errors should be substantially smaller than at the plot scale. Quantifying the relationship between error and prediction scale for productivity surfaces would represent a considerable advance as this would allow prediction error to be more accurately defined.

Similar issues arise in forest inventory where there is often a requirement to obtain predictions for subdomains or areas of interest (AOIs) within the area covered by an inventory. Modern forest inventories often use model-based approaches to predict forest parameters such as stem volume, basal area or stand density, using a combination of ground-based plot measurements and spatially intense remotely sensed ancillary data. Methods of analysis used in such inventories can include regression (Dungan 1998; McRoberts 2006) and the *k*-nearest neighbour (*k*-nn) method (Franco-Lopez et al. 2001; Tomppo and Halme 2004). Quantifying the uncertainty of predictions obtained using these methods for AOIs is often complex. Variances of estimates for individual pixels are not suited to estimates at a larger scale (Kim and Tomppo 2006). In principle, an estimate of the variance of the mean of a response variable across multiple pixels can be achieved by fitting an empirical model to the spatial variogram and incorporating its parameter estimates into the model-based variance. This approach was used, for example, by McRoberts et al. (2007) for estimating the variance of predictions for AOIs using the *k*-nn method.

In the current study, productivity maps of the Site Index and the 300 Index for *P. radiata* in New Zealand were developed using various analysis methods including regression, ordinary kriging and regression kriging. The surfaces were developed from productivity indices obtained from an extensive dataset of permanent sample plot measurements along with ancillary environmental data from spatial surfaces. To establish variances of estimates at varying spatial scales, a method using simple empirical estimates of the prediction error variance was applied using a validation dataset set aside to test and validate the analysis methods. The objectives of this study were to: (i) develop updated predictive surfaces for the Site Index and the 300 Index; and (ii) characterise the relationship between error and spatial scale for both of these surfaces.

## Methods

### Permanent sampling plot data and preliminary screening

Stand-level data were extracted from the New Zealand Forest Research Institute Ltd. Permanent Sample Plot (PSP) system (Pillar and Dunlop 1990). These data were examined for sites that could adversely influence the integrity of the dataset. Exclusions included Nelder spacing trials, oversowing, disturbance (forest floor removal) and fertiliser (phosphorus (P), nitrogen (N) and potassium (K)) trials. For these exclusions, trial control plots were identified and retained. Other exclusions included data from stands planted prior to 1975 and stands less than 7 years in age. These latter two groups were excluded as data from young trees are inherently unreliable and a preliminary screening of the PSP data found that stands established post 1975 have 300 Index values ~ 25% higher than stands established during the 1930s. This result was also noted by Kimberley et al. (2005) who attributed the increase in the 300 Index in more recently planted stands to improvements in genetics and management. This screening process left 4108 plots available for analysis.

Following Kimberley et al. (2005), 300 Index and Site Index values were calculated from these data. To calculate the Site Index, a national height/age model (an equation for predicting height for any age and Site Index) was used. This model uses the Chapman-Richards equation (Richards 1959), with the Site Index as a local parameter and both the slope and shape parameters expressed as functions of this local parameter. An early version of the height/age model described by van der Colff and Kimberley (2013) was used, which has the same model form but slightly different parameter values. By inverting the equation, it is possible to obtain the Site Index as a function of age and mean top height. In our study, the mean top height measurement closest to age 20 years was used for each permanent sample plot. This means that for plots with measurements made at precisely 20 years of age, the Site Index is simply the measured mean top height, while for plots not measured at precisely 20 years of age, the height/age model extrapolates backwards or forwards in time to 20 years from the nearest height measurement.
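The inversion step can be illustrated with a deliberately simplified sketch. It assumes a Chapman-Richards curve scaled to pass through the Site Index at age 20, with fixed slope `b` and shape `c`. Both values are hypothetical placeholders; in the national model these parameters are functions of the Site Index and their values are not given here.

```python
import math

REF_AGE = 20.0  # reference age (years) at which height equals the Site Index


def height_at_age(site_index, age, b=0.05, c=1.3):
    """Mean top height (m) at `age` for a given Site Index.

    Simplified Chapman-Richards curve; b (slope) and c (shape) are
    hypothetical constants, not the parameters of the national model.
    """
    ratio = (1.0 - math.exp(-b * age)) / (1.0 - math.exp(-b * REF_AGE))
    return site_index * ratio ** c


def site_index_from_height(height, age, b=0.05, c=1.3):
    """Invert the curve: Site Index implied by a height measured at `age`."""
    ratio = (1.0 - math.exp(-b * REF_AGE)) / (1.0 - math.exp(-b * age))
    return height * ratio ** c
```

For a plot measured at exactly 20 years the function returns the measured height unchanged; for a plot measured at, say, 17 years it projects the height forward to age 20.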

Estimation of the 300 Index, which is a measure of stem volume productivity, is more complex because, unlike height, stem volume is strongly influenced by stocking and, to a lesser extent, thinning and pruning history. To calculate the 300 Index, plot measurements consisting of basal area, mean top height and stocking at a known age, along with stand history information (initial stocking, timing and extent of thinnings, and timing and height of prunings) are required. The 300 Index estimation procedure utilises the 300 Index model, an empirical stand-level basal area growth model that expresses basal area as a function of age, stocking, the Site Index and the 300 Index, effectively a local site productivity parameter (Kimberley et al. 2005). The model accounts for the effects of pruning and thinning using age-shift adjustments. The model is structured so that for stands using the standard ‘300 Index’ regime (pruned to 6 m height and thinned at time of final pruning so that stand density at age 30 years is 300 stems ha^{−1}), the stem volume MAI equals the 300 Index parameter. Therefore, the 300 Index is an index of stem volume productivity, defined as the volume MAI at age 30 years for this standard regime. Because the model is sensitive to departures from this standard regime (e.g. different stand densities and different intensities and timing of thinning and pruning regimes), and can also adjust for the stand age, it can be used to predict the index for any plot measurement. To do this, an iterative procedure is used to determine the 300 Index parameter value compatible with the plot measurement and management history associated with the plot (Kimberley et al. 2005).

### Data extraction and pre-processing

Where more than one plot estimate of the Site Index and 300 Index had identical nominal location (easting and northing), which sometimes occurred for field trial data, the estimates were averaged so they aligned with the independent variables used in the modelling of forest productivity. Following exclusions, and this averaging, there were 3413 independent measurements of the Site Index and 300 Index available for modelling distributed across New Zealand (Fig. 1).

From the co-ordinates of each of these measurements, data were extracted from biophysical GIS surfaces that included monthly and annual climate data (Mitchell 1991; Leathwick et al. 2002), fundamental soil layers and land resource information (Newsome et al. 2000), vegetative cover (Newsome 1987), biophysical surfaces (Leathwick et al. 2003), N and P foliar nutrition (Hunter et al. 1991), and primary and secondary terrain attributes (Palmer et al. 2009a). A spatial soil water balance model (Palmer et al. 2009c) was used to determine mean annual and seasonal root-zone water storage for all PSP locations. The fractional available root-zone water storage and the maximum available root-zone water storage were then determined from these data.

The dataset was randomly split into fitting (*n* = 2713) and validation (*n* = 700) datasets. The validation dataset was used to validate the models of the 300 Index and Site Index developed using the fitting dataset.
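A random split of this kind can be sketched as follows. The function name and the fixed seed are illustrative assumptions; the study itself performed the split in SAS.

```python
import random


def split_fit_validation(records, n_validation, seed=0):
    """Randomly partition records into (fitting, validation) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(records)))
    rng.shuffle(indices)
    validation_idx = set(indices[:n_validation])
    fitting = [r for i, r in enumerate(records) if i not in validation_idx]
    validation = [r for i, r in enumerate(records) if i in validation_idx]
    return fitting, validation


# 3413 location-level measurements, 700 of them held out for validation
fit, val = split_fit_validation(list(range(3413)), 700)
```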

### Data analysis

All analyses were undertaken using SAS version 9.3 (SAS Institute Inc. 2008). Three different methods were used to develop spatial surfaces of the 300 Index and Site Index.

#### Ordinary kriging

Ordinary kriging (OK) (Cressie 1990) surfaces were fitted for the 300 Index and Site Index. Using the fitting dataset, experimental semivariograms were created for the 300 Index and Site Index using the SAS procedure VARIOGRAM and exponential semivariogram models fitted using the SAS procedure NLIN. These semivariogram models were then used to fit kriged surfaces for the 300 Index and Site Index using the SAS procedure KRIG2D.
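As a rough illustration of what ordinary kriging computes, the sketch below predicts a value at one target location from a handful of points using an exponential semivariogram. It is a minimal pure-Python stand-in, not the SAS workflow used in the study. The parameterisation of the exponential model (`nugget`, `sill`, `rng`) and all function names are assumptions made for this example.

```python
import math


def exp_semivariogram(h, nugget, sill, rng):
    """Exponential model: nugget + (sill - nugget) * (1 - exp(-h / rng))."""
    if h == 0.0:
        return 0.0  # semivariance is zero at zero separation
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / rng))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(points, values, target, nugget, sill, rng):
    """Ordinary kriging prediction at `target` from (x, y) points and values."""
    n = len(points)

    def gamma(p, q):
        return exp_semivariogram(math.dist(p, q), nugget, sill, rng)

    # Kriging system: semivariances between data points, plus a Lagrange
    # row and column that force the weights to sum to one.
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * v for w, v in zip(weights, values))
```

Because the weights sum to one, a constant field is reproduced exactly, and predicting at a data location returns the observed value there.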

#### Partial least squares regression

The partial least squares (PLS) regression approach used by Palmer et al. (2009b) was used in the current study to predict the 300 Index and Site Index from the environmental variables. The PLS procedure originally developed by Wold (1966) provides more stable predictions than ordinary least squares regression when the predictors are highly correlated. It extracts successive linear combinations of the predictors, called PLS factors, which explain both response and predictor variation. The same environmental variables selected by Palmer et al. (2009b) were used in the current study. The 300 Index model used 15 continuous variables and 12 categorical variables while the Site Index model used 13 continuous variables and 6 categorical variables. Both models used four PLS factors. The SAS PLS procedure was used to fit these models.

#### Regression kriging

Regression kriging (RK) (Odeh et al. 1995) was performed by applying ordinary kriging to the residuals of the PLS regression models. Regression kriging predictions consist of the resultant kriged surface added to the PLS regression predictions. Regression kriging was also tested using simpler multiple linear regression models with limited numbers of environmental variables. However, because these models performed less well than the PLS regression kriging models when tested against the validation dataset, this approach was not taken any further in this study.
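In symbols, the regression kriging prediction at a location \( s_0 \) can be written as below. The notation is introduced here for illustration only: \( \widehat{m}(s) \) denotes the PLS regression prediction, \( r(s_i) \) the regression residual at the *i*th fitting location, and \( \lambda_i \) the ordinary kriging weights obtained from the residual semivariogram.

\[ {\widehat{y}}_{\mathrm{RK}}\left({s}_0\right)=\widehat{m}\left({s}_0\right)+\sum_{i=1}^n{\lambda}_i\,r\left({s}_i\right),\quad r\left({s}_i\right)=y\left({s}_i\right)-\widehat{m}\left({s}_i\right),\quad \sum_{i=1}^n{\lambda}_i=1 \]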

#### Effects of spatial scale on model accuracy

The root-mean-square error (RMSE) calculated using the validation dataset provides a measure of the accuracy of predictions of the Site Index or 300 Index for an individual measurement plot, which is typically only a fraction of a hectare in size. However, the scale of application will invariably be much larger than that of a plot, with prediction typically being required for a forest stand in the tens of hectares or for a forest in the hundreds or thousands of hectares. At these larger scales, the RMSE is likely to provide an overly conservative measure of model accuracy, as prediction accuracy would be expected to improve with increasing scale.

To investigate prediction accuracy across a range of scales using the *N* = 700 observations in the validation dataset, the following procedure was used. Square grids of increasing scale between 0.2 and 200 km were overlaid onto the dataset, and for each grid, a one-way random-effects analysis of variance was performed, providing estimates of the variance components for between grid-square \( {\widehat{\sigma}}_b^2 \) and within grid-square \( {\widehat{\sigma}}_w^2 \) variation. The calculation of these variances followed standard procedures for an unbalanced one-way random-effects model (e.g. Searle 1971). Suppose a grid contains *k* squares, with the *i*th square containing \( {n}_i \) observations with \( {\sum}_{i=1}^k{n}_i=N \). For the *j*th observation in the *i*th square, there is an observed value \( {y}_{ij} \) and a predicted value \( {\widehat{y}}_{ij} \) of the productivity index (300 Index or Site Index) with the error of prediction being \( {e}_{ij}={y}_{ij}-{\widehat{y}}_{ij} \). The within-square variance was calculated using

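\[ {\widehat{\sigma}}_w^2=\frac{\sum_{i=1}^k\sum_{j=1}^{n_i}{\left({e}_{ij}-{\overline{e}}_{i\cdot}\right)}^2}{N-k} \]

while the between-square variance was calculated using

\[ {\widehat{\sigma}}_b^2=\frac{1}{n_0}\left[\frac{\sum_{i=1}^k{n}_i{\left({\overline{e}}_{i\cdot }-{\overline{e}}_{\cdot \cdot}\right)}^2}{k-1}-{\widehat{\sigma}}_w^2\right],\quad {n}_0=\frac{N-{\sum}_{i=1}^k{n}_i^2/N}{k-1} \]

where \( {\overline{e}}_{i\cdot } \) is the mean prediction error in the *i*th square and \( {\overline{e}}_{\cdot \cdot } \) is the mean prediction error over all *N* observations.

These calculations were carried out using the SAS NESTED procedure. To obtain stable estimates, this procedure was repeated 1000 times for each scale using random grid starting locations and the variance components were averaged.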
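The standard analysis-of-variance estimators for an unbalanced one-way random-effects model can be sketched in a few lines. This is an illustrative stand-in for the SAS NESTED calculation, not the authors' code. The helper `grid_square` and its origin arguments `x0`, `y0` (which would be drawn at random for each of the repeated grids) are assumptions made for the example.

```python
import math
from collections import defaultdict


def grid_square(x_km, y_km, scale_km, x0=0.0, y0=0.0):
    """Label of the grid square containing a point, for a grid with origin (x0, y0)."""
    return (math.floor((x_km - x0) / scale_km), math.floor((y_km - y0) / scale_km))


def variance_components(errors, squares):
    """Return (within, between) variance estimates for prediction errors.

    errors  : prediction errors e_ij (observed minus predicted)
    squares : grid-square label of each error
    """
    groups = defaultdict(list)
    for e, g in zip(errors, squares):
        groups[g].append(e)
    n_total, k = len(errors), len(groups)
    grand_mean = sum(errors) / n_total
    ss_within = ss_between = 0.0
    sum_n_sq = 0
    for vals in groups.values():
        n_i = len(vals)
        mean_i = sum(vals) / n_i
        ss_within += sum((v - mean_i) ** 2 for v in vals)
        ss_between += n_i * (mean_i - grand_mean) ** 2
        sum_n_sq += n_i ** 2
    var_w = ss_within / (n_total - k)
    n0 = (n_total - sum_n_sq / n_total) / (k - 1)  # effective group size
    var_b = (ss_between / (k - 1) - var_w) / n0
    return var_w, var_b
```

Note that the between-square estimate can come out negative for a single grid when squares differ little; averaging over many randomly shifted grids, as described above, stabilises it.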

The variance component estimates can be used to approximate the error in predictions from the surface averaged over a grid square containing *n* plot-sized cells. If it is assumed that the plots in the validation data are random samples of the plot-sized cells within each square, the estimated variance of the mean error of any one square is \( {\widehat{\sigma}}_b^2+{\widehat{\sigma}}_w^2/n \). Because *n* is large for all scales of interest, the variance of the mean error of a square can be approximated by \( {\widehat{\sigma}}_b^2 \). For example, for a 10-ha stand, *n* is 250 assuming a plot area of 0.04 ha, and for larger scales, *n* is even larger. Therefore, the quantity \( \sqrt{{\mathrm{Bias}}^2+{\widehat{\sigma}}_b^2} \), where Bias is the average prediction bias for the validation dataset, can be interpreted as a measure of prediction accuracy at the relevant scale equivalent to the RMSE used for plot-level predictions. This measure of prediction accuracy was calculated at each scale for all three prediction methods for both the 300 Index and Site Index.
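The arithmetic in this paragraph can be checked with a short sketch. The function names and the numerical values passed in the usage note are illustrative assumptions, not results from the study.

```python
import math


def variance_of_mean_error(var_b, var_w, n):
    """Variance of the mean prediction error of a square holding n plot-sized cells."""
    return var_b + var_w / n


def accuracy_at_scale(bias, var_b):
    """RMSE-equivalent accuracy at a given scale: sqrt(bias^2 + between-square variance)."""
    return math.sqrt(bias ** 2 + var_b)


# Number of 0.04-ha plot-sized cells in a 10-ha stand
n_cells = round(10 / 0.04)
```

With `n_cells` equal to 250, a hypothetical within-square variance of 10 contributes only 0.04 to the variance of the mean error, which is why the between-square component dominates at all scales of interest.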

## Results

### Model comparison

Using the validation dataset, the environmental variables used in the PLS regression accounted for 56 and 63% of the respective variance in the 300 Index and Site Index (Table 1). However, OK provided a considerable improvement in model fit over PLS, accounting for 69 and 78% of the variance in the 300 Index and Site Index, respectively. Finally, the precision of RK exceeded that of both PLS and OK, accounting for 71 and 82% of the variance in the 300 Index and Site Index, respectively. Maps produced using these RK models can therefore be considered to provide the most reliable predictions of *P. radiata* productivity in New Zealand yet produced. These maps are shown in Fig. 2.

Like Palmer et al. (2009b), exponential semivariogram models were used in the OK and RK analyses. The coefficients of the OK models show that there was strong spatial autocorrelation for both the 300 Index and Site Index extending over tens of kilometres with relatively small nuggets and long ranges (Table 2). This explains why OK works well for both variables. The coefficients of the RK models indicate that while the regression model removes much of the large-scale variation, there is significant spatial autocorrelation remaining in the residuals extending for several kilometres (Table 2). By accounting for this, RK provided the best overall performance.

The environmental variables utilised in the PLS and RK models demonstrate the importance of air temperature and annual water deficit as key determinants of both productivity measures. Both the 300 Index and Site Index decreased with increasing number of frost days during summer. There was also a positive relationship between the Site Index and maximum temperature during summer. Categorical variables included in the models are likely to represent surrogates for site fertility. Vegetation cover in 1987 was also an important determinant of productivity for the 300 Index providing a surrogate for previous land use and the cumulative effects of fertilising. Sites that were in pasture in 1987 were more productive than sites already established in plantation forest.

### Effect of scale on prediction error

Actual values of the 300 Index plotted against their RK predictions for the validation dataset showed a spread of points around a linear relationship (Fig. 3a). This relationship explained 71% of the variation in the 300 Index, while the prediction accuracy of each observation (RMSE, the average deviation about the 1:1 line) was 3.29 m^{3} ha^{−1} year^{−1} (Table 1). However, when the validation data were averaged by New Zealand local authority region, the relationship between actual and predicted values tightened considerably (Fig. 3b). For these regional means, 99% of variance was explained by the relationship and the RMSE reduced to only 0.53 m^{3} ha^{−1} year^{−1}. Similar results held for the Site Index where the plot-level RMSE of 2.02 m reduced to 0.39 m for regional means. This demonstrates that prediction accuracy is greatly dependent on the scale of prediction, with accuracy of an individual plot being far poorer than accuracy at a regional scale. In fact, the true accuracy of regional-scale predictions would be even better than these results suggest as they are based on the limited sample sizes available in the validation dataset, with four regions being represented by fewer than 20 observations.
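The contrast between plot-level and regional accuracy can be illustrated with a small sketch. The data in the usage note are made up, and the use of unweighted regional means is an assumption; the source does not state how the regional RMSE was weighted.

```python
import math
from collections import defaultdict


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rmse_of_group_means(observed, predicted, groups):
    """RMSE computed after averaging observed and predicted values within each group."""
    obs_by, pred_by = defaultdict(list), defaultdict(list)
    for o, p, g in zip(observed, predicted, groups):
        obs_by[g].append(o)
        pred_by[g].append(p)
    obs_means = [sum(obs_by[g]) / len(obs_by[g]) for g in obs_by]
    pred_means = [sum(pred_by[g]) / len(pred_by[g]) for g in obs_by]
    return rmse(obs_means, pred_means)


# Made-up example: plot errors cancel within each region
obs = [28.0, 32.0, 24.0, 26.0]
pred = [30.0, 30.0, 25.0, 25.0]
region = ["A", "A", "B", "B"]
plot_rmse = rmse(obs, pred)
regional_rmse = rmse_of_group_means(obs, pred, region)
```

In this toy case the plot-level errors are non-zero but average out within each region, so the regional RMSE is far smaller than the plot-level value.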

A detailed analysis of prediction accuracy against scale was provided by applying square grids at a range of scales to the validation dataset. This analysis showed a substantial improvement in accuracy with increasing scale. At any given scale, the best accuracy was achieved by RK followed by OK and then PLS for both the 300 Index and Site Index (Fig. 4). Prediction accuracy of the 300 Index using RK was significantly improved at a scale of 1 km over plot-scale accuracy and continued to improve substantially at increasing scales beyond this level. For the Site Index, increasing the scale initially gave a smaller improvement in accuracy than for the 300 Index, but accuracy improved considerably at scales greater than 5 km.

Results for RK across a range of relevant scales are summarised in Table 3. In this table, scale is defined as the length of the side of each square used in the analysis. The area represented by a given scale could be as great as the square of this length although, in practice, areas of interest are likely to be more irregularly shaped. Therefore, the area corresponding to each scale has been defined in Table 3 to range from that of a rectangle of aspect ratio 2:1 with the longest side equal to the scale, through to that of a square. The analysis shows that for a forest stand of 25–50 ha, accuracy (equivalent to the RMSE) compared to that of a plot (0.04 ha) improves from 3.3 to 2.0 m^{3} ha^{−1} year^{−1} for the 300 Index and from 2.0 to 1.6 m for the Site Index. For a medium-sized forest of 2500–5000 ha, prediction accuracy improves further to 1.2 m^{3} ha^{−1} year^{−1} for the 300 Index and 1.3 m for the Site Index.
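The conversion from scale to area range described here amounts to the following, where the function name is chosen for illustration:

```python
def area_range_ha(scale_km):
    """(min, max) area in hectares for a scale given as a side length in km.

    The minimum is a 2:1 rectangle whose longest side equals the scale;
    the maximum is a square with that side length.
    """
    square_ha = scale_km ** 2 * 100.0  # 1 km^2 = 100 ha
    return square_ha / 2.0, square_ha
```

For example, a scale of 1 km corresponds to 50–100 ha, and a scale of 50 km to 125,000–250,000 ha.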

## Discussion

This paper demonstrates a strong link between prediction error and spatial scale in maps of *P. radiata* productivity produced for New Zealand using an extensive dataset of plot measurements. Prediction errors of such spatial models are most often reported at the scale of the observations used in their development. However, the analysis shows that plot-level errors for these surfaces are conservative as there was a marked decline in error for both indices from the plot level to predictions made by averaging across larger scales. Observations used to develop these maps consisted of productivity indices derived from a measurement of a permanent sample plot, with plots mostly of 0.04 ha in area but with some observations consisting of measurements averaged over an experimental trial of several plots perhaps covering a hectare or so. At this scale, prediction accuracy defined in terms of RMSE is about 3.3 m^{3} ha^{−1} year^{−1} for the 300 Index and 2.0 m for the Site Index (Table 1). These values should be doubled to provide approximate 95% confidence intervals (95% CI). Therefore, at the plot scale, the maps predict the 300 Index with a 95% CI of ± 6.6 m^{3} ha^{−1} year^{−1} while the Site Index is predicted with a 95% CI of ± 4 m (Table 3).

However, in practice, the productivity maps are likely to be applied at scales much coarser than those of an individual plot. For example, to assist with planning and forest management, managers may wish to characterise the productivity of a stand or compartment within a forest, perhaps with an area in the tens of hectares. Another use of productivity maps is to estimate the productivity of small forests or woodlots that lack measurement plots, with such stands likely to be in the tens or, at most, hundreds of hectares. In both cases, the 95% CI of predicted productivity is about ± 4.0 m^{3} ha^{−1} year^{−1} and ± 3.2 m for the 300 Index and Site Index, respectively (Table 3). Another application is to estimate productivity of a proposed new forest planting. In this case, areas could range from the tens to thousands of hectares. Accuracy of the predicted 300 Index in terms of a 95% CI will therefore range from ± 4.0 m^{3} ha^{−1} year^{−1} for an area in the tens of hectares to ± 2.4 m^{3} ha^{−1} year^{−1} for an area in the thousands of hectares (Table 3). Finally, there may be requirements by agencies or companies to provide estimates of productivity for very large areas, e.g. for estimating carbon sequestration at a regional scale. The analysis indicates that such predictions can be made with a high level of accuracy.

That prediction accuracy improves with scale is not surprising. Spatial modelling techniques such as RK and OK take spatial autocorrelation into account when obtaining predictions. If these methods succeeded in completely eliminating spatial autocorrelation, the RMSE of a prediction averaged over a given area would halve for each quadrupling in sample area. However, the actual improvement in accuracy with increasing scale is much smaller than this, presumably due to the effects of spatial autocorrelation in the prediction errors. In practice, for the 300 Index surface, the RMSE halves at a scale of 3 km corresponding to an area of about 700 ha and halves again at a scale of about 20,000 ha. For the Site Index, the improvement in precision with increasing scale is even more gradual with the RMSE halving at a scale of about 20,000 ha and halving again at a scale of about 250,000 ha. However, these improvements in accuracy with increasing scale are better than obtained for a *P. radiata* wood density map where the RMSE halved at a scale of about 50 km (i.e. 125,000–250,000 ha) and halved again at a scale of 400 km (Palmer et al. 2013).
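The halving rule for uncorrelated errors follows from the variance of a mean. As a brief sketch, let \( {\sigma}^2 \) denote a generic plot-level error variance (a symbol introduced only for this illustration) and *n* the number of plot-sized cells averaged:

\[ \mathrm{RMSE}(n)=\sqrt{\frac{\sigma^2}{n}}=\frac{\sigma }{\sqrt{n}}\quad \Rightarrow \quad \mathrm{RMSE}(4n)=\frac{\sigma }{\sqrt{4n}}=\frac{1}{2}\mathrm{RMSE}(n) \]

When errors are spatially autocorrelated, the variance of the mean error of a square is instead \( {\widehat{\sigma}}_b^2+{\widehat{\sigma}}_w^2/n \), which tends to \( {\widehat{\sigma}}_b^2 \) rather than zero as *n* grows. Error therefore declines only as fast as the between-square component shrinks with increasing scale.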

Most spatial modelling studies cite precision at the scale of the individual observations used in the modelling. The results of this study suggest that reported errors from spatial modelling studies are likely to be conservative as the scale of application at the stand or forest level is invariably greater than the scale at which the model was developed. Similar results have been reported in forest inventories used to predict forest parameters at varying spatial scales. For example, Breidenbach et al. (2010) used the *k*-nn technique to predict standing timber volume from airborne laser scanning data and ground-based survey data and found that standard errors of predictions were smaller at the stand level than the plot level and smaller for stands of 4–6 ha compared to stands of less than 2 ha.

The relative performance of the various prediction techniques employed in this study was similar to that obtained by Palmer et al. (2009b) who used nearly identical techniques applied to a smaller dataset. The dataset used in the current study consisted of the data used in the earlier study supplemented by considerable new data. Results in terms of model fitting were better than in the earlier study, presumably due to the higher spatial density of the ground measurement data and perhaps in part due to the elimination of measurement errors in a number of plots.

Regression methods such as PLS are most suited for situations with sparse observational data located at least 8 km apart (Palmer et al. 2009b). However, regression models of site productivity seldom explain much more than 50% of the variability due to the difficulty in measuring many of the environmental variables which affect tree growth, particularly variables associated with soil fertility. In the current study, PLS regression explained 56% of variation in the 300 Index and 63% of the variation in the Site Index, comparable to the levels of variation explained in previous studies using regression models such as those described by Watt et al. (2010). Where high-density datasets are available, kriging can often provide better predictions than regression models as spatial dependence between points is greater when data points are located closely together. As the dataset used here was relatively dense, ordinary kriging gave more precise predictions than PLS regression. Finally, RK gave a modest but worthwhile improvement in performance over OK in this study, especially at scales of most interest (Fig. 4). This method combines the advantages of PLS regression at predicting between sparsely located points with those of kriging in accounting for short-range spatial dependence in PLS error.

Using PLS and RK, the key climatic variables identified as important determinants of the 300 Index and Site Index largely agree with past studies. Temperature-related variables have generally been found to have the greatest influence on *P. radiata* productivity within New Zealand (Watt et al. 2005; Watt et al. 2008; Jackson and Gifford 1974; Hunter and Gibson 1984; Watt et al. 2010) and were found here to be important determinants of both the 300 Index (summer degree frost days) and Site Index (max temperature in summer). The positive relationship often found between air temperature and tree growth is thought to be principally driven by the lengthening of the growing season (Lieth 1973; Kerkhoff et al. 2005). This theory is consistent with selection of the number of frost days for the 300 Index as this variable controls the duration over which trees can grow during the year.

## Conclusions

This paper describes accurate models that quantify spatial variation in the 300 Index and Site Index using the largest and most complete dataset of its kind assembled for New Zealand plantation-grown *P. radiata*. Regression kriging was found to be the most accurate method for describing this spatial variation for both productivity indices, and reported accuracies were relatively high. Examination of changes in error with increases in spatial scale demonstrated a gradual decline in error from the plot level with increasing scale. For the 300 Index, the RMSE and 95% CI halve when predictions are averaged over an area of about 700 ha and halve again at a scale of about 20,000 ha. For the Site Index, the improvement in precision with increasing scale is more gradual, with the 95% CI halving at a scale of about 20,000 ha and halving again at a scale of about 250,000 ha. This decline in error with increasing scale suggests that reported errors from spatial modelling studies are likely to exceed those that occur at the scale of operational application.
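As a worked illustration of these halving scales, the sketch below tabulates the expected 95% CI at each reported area for a given plot-level value. The plot-level CI passed in is a hypothetical input, not a figure from this study, and the conversion from area to a linear scale assumes square areas, which is an assumption made here for illustration only.

```python
import math

# Areas (ha) at which the 95% CI is reported to halve, then halve again.
HALVING_AREAS_HA = {
    "300 Index": (700, 20_000),
    "Site Index": (20_000, 250_000),
}

def square_side_km(area_ha):
    """Side length (km) of a square of the given area (100 ha = 1 km^2)."""
    return math.sqrt(area_ha / 100.0)

def ci_at_reported_scales(index_name, plot_ci):
    """Return (area_ha, side_km, ci) tuples at the two reported halving scales.

    plot_ci is a hypothetical plot-level 95% CI half-width supplied by the user.
    """
    rows = []
    for k, area_ha in enumerate(HALVING_AREAS_HA[index_name], start=1):
        rows.append((area_ha, square_side_km(area_ha), plot_ci / 2 ** k))
    return rows
```

For example, `ci_at_reported_scales("Site Index", 4.0)` returns a CI of 2.0 at 20,000 ha and 1.0 at 250,000 ha, the latter corresponding to a square of side 50 km.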

## Notes

Average height of the 100 largest diameter stems within a hectare

## References

Battaglia, M., Beadle, C., & Loughhead, S. (1996). Photosynthetic temperature responses of *Eucalyptus globulus* and *Eucalyptus nitens*. *Tree Physiology, 16*(1–2), 81–89.

Battaglia, M., Sands, P., White, D., & Mummery, D. (2004). CABALA: a linked carbon, water and nitrogen model of forest growth for silvicultural decision support. *Forest Ecology and Management, 193*(1/2), 251–282.

Bollmann, M. P., Sweet, G. B., Rook, D. A., & Halligan, E. A. (1986). The influence of temperature, nutrient status, and short drought on seasonal initiation of primordia and shoot elongation in *Pinus radiata*. *Canadian Journal of Forest Research-Revue Canadienne De Recherche Forestiere, 16*(5), 1019–1029.

Breidenbach, J., Nothdurft, A., & Kändler, G. (2010). Comparison of nearest neighbour approaches for small area estimation of tree species-specific forest inventory attributes in central Europe using airborne laser scanner data. *European Journal of Forest Research, 129*, 833–846.

Cressie, N. A. C. (1990). The origins of kriging. *Mathematical Geology, 22*, 239–252.

Duchesne, L., & Houle, D. (2011). Modelling day-to-day stem diameter variation and annual growth of balsam fir (*Abies balsamea* (L.) Mill.) from daily climate. *Forest Ecology and Management, 262*(5), 863–872. https://doi.org/10.1016/j.foreco.2011.05.027.

Dungan, J. L. (1998). Spatial prediction of vegetation quantities using ground and image data. *International Journal of Remote Sensing, 19*, 267–285.

Franco-Lopez, H., Ek, A. R., & Bauer, M. E. (2001). Estimation and mapping of forest stand density, volume, and cover type using the *k*-nearest neighbors method. *Remote Sensing of Environment, 77*, 251–274. https://doi.org/10.1016/S0034-4257(01)00209-7.

Goulding, C. J. (2005). In M. Colley (Ed.), *Measurement of trees. Section 6.5 of the NZIF forestry handbook* (4th ed.). Christchurch: New Zealand Institute of Forestry.

Hunter, I. R., & Gibson, A. R. (1984). Predicting *Pinus radiata* site index from environmental variables. *New Zealand Journal of Forestry Science, 14*(1), 53–64.

Hunter, I. R., Rodgers, B. E., Dunningham, A., Prince, J. M., & Thorne, A. J. (1991). An atlas of radiata pine nutrition in New Zealand. In Forest Research Bulletin no. 165 (p. 24). Rotorua: Forest Research Institute.

Jackson, D. S., & Gifford, H. H. (1974). Environmental variables influencing the increment of radiata pine. (1) Periodic volume increment. *New Zealand Journal of Forestry Science, 4*(1), 3–26.

Jones, E. A., Reed, D. D., Cattelino, P. J., & Mroz, G. D. (1991). Seasonal shoot growth of planted red pine predicted from air-temperature degree days and soil-water potential. *Forest Ecology and Management, 46*(3–4), 201–214.

Kerkhoff, A. J., Enquist, B. J., Elser, J. J., & Fagan, W. F. (2005). Plant allometry, stoichiometry and the temperature-dependence of primary productivity. *Global Ecology and Biogeography, 14*(6), 585–598. https://doi.org/10.1111/j.1466-822x.2005.00187.x.

Kim, H., & Tomppo, E. (2006). Model-based prediction error uncertainty estimates for k-nn method. *Remote Sensing of Environment, 104*, 257–263.

Kimberley, M. O., West, G., Dean, M., & Knowles, L. (2005). Site productivity: the 300 Index—a volume productivity index for radiata pine. *New Zealand Journal of Forestry, 50*, 13–18.

Kirschbaum, M. U. F. (1999). CenW, a forest growth model with linked carbon, energy, nutrient and water cycles. *Ecological Modelling, 118*(1), 17–59.

Landsberg, J. J., & Waring, R. H. (1997). A generalized model of forest productivity using simplified concepts of radiation-use efficiency, carbon balance and partitioning. *Forest Ecology and Management, 95*, 209–228.

Leathwick, J., Wilson, G., Rutledge, D., Wardle, P., Morgan, F., Johnston, K., McLeod, M., & Kirkpatrick, R. (2003). *Land Environments of New Zealand*. Wellington/Hamilton: Ministry for the Environment/Manaaki Whenua Landcare Research.

Leathwick, J. R., Wilson, G., & Stephens, R. T. T. (2002). Climate surfaces for New Zealand. In *Landcare Research Contract Report: LC9798/126* (p. 22). Hamilton, New Zealand: Landcare Research.

Lieth, H. (1973). Primary production: terrestrial ecosystems. *Human Ecology, 1*, 303–332.

Makela, A., Landsberg, J., Ek, A. R., Burk, T. E., Ter-Mikaelian, M., Agren, G. I., Oliver, C. D., & Puttonen, P. (2000). Process-based models for forest ecosystem management: current state of the art and challenges for practical implementation. *Tree Physiology, 20*(5/6), 289–298.

McRoberts, R. E. (2006). A model-based approach to estimating forest area. *Remote Sensing of Environment, 103*, 56–66.

McRoberts, R. E., Tomppo, E. O., Finley, A. O., & Heikkinen, J. (2007). Estimating aerial means and variances of forest attributes using *k*-nearest neighbors technique and satellite imagery. *Remote Sensing of Environment, 111*, 466–480. https://doi.org/10.1016/j.rse.2007.04.002.

Mitchell, N. D. (1991). The derivation of climate surfaces for New Zealand, and their application to the bioclimatic analysis of the distribution of kauri (*Agathis australis*). *Journal of the Royal Society of New Zealand, 21*, 13–24.

Monteith, J. L., & Moss, C. J. (1977). Climate and the efficiency of crop production in Britain. *Philosophical Transactions of the Royal Society of London Series B-Biological Sciences, 281*, 277–294.

New Zealand Forest Owners Association (2010). *New Zealand plantation forest industry facts and figures*. Wellington, New Zealand: New Zealand Forest Owners Association.

Newsome, P. F. J. (1987). The vegetative cover of New Zealand. In *Water and Soil Miscellaneous Publication no. 112*. Wellington, New Zealand: Soil Conservation Centre, Aokautere, Ministry of Works and Development.

Newsome, P. F. J., Wilde, R. H., & Willoughby, E. J. (2000). *Land Resource Information System Spatial Data Layers*. Palmerston North, New Zealand: Landcare Research.

Odeh, I. O. A., McBratney, A. B., & Chittleborough, D. J. (1995). Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression-kriging. *Geoderma, 67*, 215–226.

Palmer, D. J., Höck, B. K., Dunningham, A. G., Lowe, D. J., & Payn, T. W. (2009a). Developing National-Scale Terrain Attributes for New Zealand (TANZ). *Forest Research Bulletin no. 232*. Rotorua, New Zealand: Scion.

Palmer, D. J., Höck, B. K., Kimberley, M. O., Watt, M. S., Lowe, D. J., & Payn, T. W. (2009b). Comparison of spatial prediction techniques for developing *Pinus radiata* productivity surfaces across New Zealand. *Forest Ecology and Management, 258*(9), 2046–2055. https://doi.org/10.1016/j.foreco.2009.07.057.

Palmer, D. J., Watt, M. S., Höck, B. K., Lowe, D. J., & Payn, T. W. (2009c). A dynamic framework for spatial modelling *Pinus radiata* soil water balance (SWatBal) across New Zealand. *Forest Research Bulletin no. 234*. Rotorua, New Zealand: Scion.

Palmer, D. J., Watt, M. S., Kimberley, M. O., & Dungey, H. S. (2012). Predicting the spatial distribution of *Sequoia sempervirens* productivity in New Zealand. *New Zealand Journal of Forestry Science, 42*, 81–89.

Palmer, D. J., Kimberley, M. O., Cown, D. J., & McKinley, R. B. (2013). Assessing prediction accuracy in a regression kriging surface of *Pinus radiata* outerwood density across New Zealand. *Forest Ecology and Management, 308*, 9–16.

Pillar, C. H. & Dunlop, J. D. (1990). The permanent sample plot system of the New Zealand Ministry of Forestry. Forest Growth Data: Retrieval and Dissemination. Proceedings of Joint IUFRO Workshop S4.02.03 and S4.02.04, Gembloux, Belgium.

Richards, F. J. (1959). A flexible growth function for empirical use. *Journal of Experimental Botany, 10*, 290–300.

Sampson, D. A., Waring, R. H., Maier, C. A., Gough, C. M., Ducey, M. J., & Johnsen, K. H. (2006). Fertilization effects on forest carbon storage and exchange, and net primary production: a new hybrid process model for stand management. *Forest Ecology and Management, 221*(1–3), 91–109. https://doi.org/10.1016/j.foreco.2005.09.010.

SAS Institute Inc. (2008). *SAS/STAT 9.2 User’s guide*. Cary, NC, USA: SAS Institute Inc.

Searle, S. R. (1971). *Linear models* (p. 532). New York: Wiley.

Tomppo, E., & Halme, M. (2004). Using coarse scale forest variables as ancillary information and weighting of variables in *k*-NN estimation: a genetic algorithm approach. *Remote Sensing of Environment, 92*, 1–20.

van der Colff, M., & Kimberley, M. O. (2013). A national height-age model for *Pinus radiata* in New Zealand. *New Zealand Journal of Forestry Science, 43*:4.

Watt, M. S., Coker, G., Clinton, P. W., Davis, M. R., Parfitt, R., Simcock, R., Garret, L., Payn, T. W., Richardson, B., & Dunningham, A. (2005). Defining sustainability of plantation forests through identification of site quality indicators influencing productivity—a national view for New Zealand. *Forest Ecology and Management, 216*, 51–63.

Watt, M. S., Davis, M. R., Clinton, P. W., Coker, G., Ross, C., Dando, J., Parfitt, R. L., & Simcock, R. (2008). Identification of key soil indicators influencing plantation productivity and sustainability across a national trial series in New Zealand. *Forest Ecology and Management, 256*(1–2), 180–190.

Watt, M. S., Palmer, D. J., Dungey, H., & Kimberley, M. O. (2009). Predicting the spatial distribution of *Cupressus lusitanica* productivity in New Zealand. *Forest Ecology and Management, 258*(3), 217–223. https://doi.org/10.1016/j.foreco.2009.04.003.

Watt, M. S., Palmer, D. J., Kimberley, M. O., Höck, B. K., Payn, T. W., & Lowe, D. J. (2010). Development of models to predict *Pinus radiata* productivity throughout New Zealand. *Canadian Journal of Forest Research-Revue Canadienne De Recherche Forestiere, 40*(3), 488–499. https://doi.org/10.1139/X09-207.

Wold, S. (1966). Estimation of principal components and related models by iterative least squares. In P. R. Krishnaiah (Ed.), *Multivariate analysis* (pp. 391–420). New York: Academic Press.

## Acknowledgements

We are indebted to numerous forest companies and private owners for allowing us to access their data. We also thank Carolyn Andersen for her assistance in obtaining permission to use and in extracting PSP data for this study. Mina van der Colff assisted in processing the PSP data, Dave Palmer provided background assistance, and the paper was greatly improved by suggestions from an anonymous reviewer.

### Funding

Study design, data collection and analysis were funded by Future Forests Research Ltd. Scion funded the writing of the manuscript.

## Author information

### Contributions

MK contributed to the assembly of data, undertook the statistical analyses, and contributed to the writing of the manuscript. MW contributed to the writing of the manuscript and to the interpretation of the results. DH assisted with the assembly of data and produced the maps displaying the models. All authors read and approved the final manuscript.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Cite this article

Kimberley, M.O., Watt, M.S. & Harrison, D. Characterising prediction error as a function of scale in spatial surfaces of tree productivity. *N.Z. j. of For. Sci.* **47**, 19 (2017). https://doi.org/10.1186/s40490-017-0100-8

DOI: https://doi.org/10.1186/s40490-017-0100-8

### Keywords

- 300 Index
- Modelling
- Site Index
- Plantation forestry
- Radiata pine
- Spatial modelling