Open Access

Application of LiDAR data to maximise the efficiency of inventory plots in softwood plantations

New Zealand Journal of Forestry Science 2015, 45:9

DOI: 10.1186/s40490-015-0038-7

Received: 23 August 2014

Accepted: 15 April 2015

Published: 6 June 2015

Abstract

Background

Precision in describing plantation attributes is a key requirement for forestry managers and inventory surveys aim to extract the most precise information possible using the smallest number of plots. This paper quantifies the potential efficiencies to be gained by using Light Detection and Ranging (LiDAR) data as an aid to estimation of standing timber volume in softwood plantations. A range of inventory design and estimation methods were investigated in terms of their overall predictive efficiency.

Methods

Field measurements representing four different populations from two Pinus radiata D. Don plantations in New South Wales, Australia, were used to inform statistical models which were then employed to simulate populations of inventory plots. These plots were then “surveyed” using a variety of simulated sampling strategies to quantify the benefits from using LiDAR data as auxiliary information. Model-based and design-based methods were both investigated. Survey design options included stratification and plot selection strategies; estimation options included ratio estimation and regression modelling. Results were compared in terms of the relative bias and root mean squared error of the estimates.

Results

The study suggests that relative efficiencies of two-fold or better are possible with either model-based or model-assisted estimators compared to traditional inventory surveys, which use grid samples and simple design-based estimators. This would enable a halving of the required sample size for the same precision for field inventories in these plantations.

Conclusion

The use of LiDAR data as an aid to survey design produces marked efficiency gains compared to traditional inventory methods.

Keywords

Sampling efficiency; LiDAR; Inventory surveys; Model-based sampling; Design-based sampling

Background

Plantation managers are concerned with the establishment, health, inventory and growth trajectory of commercial forests. Greater precision in describing plantations should lead to more accurate forecasts and therefore better commercial decisions. As the gathering of information requires expenditure, these benefits must be balanced with the costs in order to determine the optimal amount of information that should be collected for making a decision (Gilabert and McDill 2010). Therefore, there is a strong motivation to improve sampling design efficiencies in field inventory, with the aim of minimising the number of labour intensively measured plots while maximising predictive performance and minimising bias (Junttila et al. 2008).

High-resolution remotely sensed information, including data from Light Detection and Ranging (LiDAR) sensors (also referred to as airborne laser scanning (ALS)), is increasingly available and is now routinely coupled with geo-referenced inventory plot measurements to improve large- and small-scale estimation of forestry parameters (e.g. White et al. 2013). Several studies have now demonstrated the utility of LiDAR data in the survey design phase for optimal sample selection (e.g. Gobakken et al. 2013; Maltamo et al. 2009a) as well as in the estimation phase (e.g. McRoberts et al. 2012) to improve prediction accuracies and/or cost efficiencies. The benefits can arise from efficiencies in the sampling design, the estimation method, or both. In both cases, efficiencies are gained when there is a strong relationship between the forest variable of interest and the derived LiDAR metrics (Maltamo et al. 2011). If this relationship is strong and the postulated model is able to capitalise on it, then substantial gains in efficiency can be realised (Junttila et al. 2013). Junttila et al. (2013) also claimed that the most important factor in improving sampling design efficiency was that the selected plots were well spread in feature space. This is supported by Grafström and Ringvall (2013), who reported, when examining several sampling designs, that well-spread samples improved not only model fitting but also design-based estimation.

Sampling strategies have been investigated across numerous forest-related activities through a range of modelling approaches within both design-based (probability) and model-based inference frameworks (e.g. Hansen et al. 1983; McRoberts 2010; Wulder et al. 2012 and references therein). Traditionally, forest inventories calculate estimates of growing stock volume using design-based simple random sampling, stratified or model-assisted estimators (McRoberts 2006). More recently, several design-based and model-based variance estimators have been investigated using data from LiDAR-assisted forest surveys (e.g. Gregoire et al. 2011; McRoberts et al. 2013; Ståhl et al. 2011). Efficient sampling strategies are attained by applying an optimal sampling design in addition to identifying the best variance estimators for that particular design as well as for the specified model.

In the design-based framework, the population is regarded as fixed whereas the sample is regarded as a realisation of a stochastic process (Gregoire 1998). Model-assisted estimators are a subclass of design-based estimators in which a model is used to describe the population (e.g. Gregoire et al. 2011; McRoberts 2010). Relative to stratified sampling using the same auxiliary data, model-assisted estimators can potentially make better use of each variable by using the complete range of values rather than dividing the variable into classes. More recently, model-based inference in LiDAR survey sampling has gained popularity (e.g. Ståhl et al. 2011). In model-based inference, the sampling method does not necessarily have to be random; that is, the sample plots can be treated as fixed, the population variables treated as random, and variation (and therefore inference) is derived from the model (McRoberts 2010). Both model-assisted and model-based approaches rely more heavily on models and auxiliary variables to produce estimates than traditional design-based approaches and therefore may produce more precise estimates for small areas (McRoberts et al. 2013). Model-based procedures also allow prediction and error estimation at the pixel level, and hence can result in attribute mapping (McRoberts et al. 2013). However, their main disadvantages are the potential for substantial bias if the model is poorly specified and the fact that such models are often computationally intensive (McRoberts et al. 2013).

Within these statistical frameworks, the performance of various sampling designs incorporating LiDAR variables has been examined through a range of modelling techniques including k-Most Similar Neighbours (MSN) (e.g. Maltamo et al. 2009b), Bayesian regression (Junttila et al. 2008), nonlinear logistic regression (McRoberts et al. 2013), linear regression techniques (Dalponte et al. 2011; Ene et al. 2013) and nonlinear regression modelling (Næsset et al. 2011). A common approach is to compare the prediction accuracies of the estimation techniques under investigation after reducing the number of sample plots (Dalponte et al. 2011; Junttila et al. 2013). McRoberts et al. (2012), on the other hand, assessed the utility of LiDAR data as the basis for post-stratification when applied to three techniques for estimating mean growing stock volume: a multiple linear regression model, a nonlinear logistic regression model, and the k-Nearest Neighbour technique.

If appropriate LiDAR data are available prior to sampling, then they can be used directly to stratify the field survey (Gobakken et al. 2013; Hawbaker et al. 2009; Maltamo et al. 2011; van Aardt et al. 2006). Hawbaker et al. (2009), for example, improved sampling efficiency and prediction accuracies of mean diameter, basal area, stand height and biomass using gridded LiDAR data as a priori information to define additional strata within a mixed species coniferous forest in northeastern Wisconsin, USA. For each forest type, potential sampling locations were stratified into 10 x 3 strata according to mean LiDAR height (10 levels) and standard deviation (3 levels) of mean LiDAR height. This ensured that the entire data range of the predictor variables was sampled on the ground and resulted in better predictions by the LiDAR-derived regression models. If, however, the LiDAR data are acquired after the fieldwork, an increase in precision may still be achieved through incorporation of LiDAR-derived post-stratification within the estimation process (Dalponte et al. 2011; Ene et al. 2013; McRoberts et al. 2012). Note that stratification for the estimation of a single population attribute is generally advantageous but the situation is less clear for a multipurpose inventory with many attributes of interest (Köhl et al. 2006; Næsset et al. 2013). After stratification, there exist several strategies for plot allocation that provide the optimal sample size for each stratum. For example, Neyman allocation is a method used to allocate samples based on the stratum variances and population sizes, assuming similar sampling costs in the strata, and given a fixed total sample size (Cochran 1977).

Recently, several studies have investigated a number of theoretical alternatives to stratification for LiDAR-assisted sample plot selection strategies (e.g. Grafström and Ringvall 2013; Grafström et al. 2014). Grafström et al. (2014), for example, introduced the local pivotal method to select well-spread probability samples in multidimensional spaces from larger populations.

The scheduling of LiDAR campaigns is defined by the characteristics of the plantation and planning objectives, and schedules need to be adjusted annually. Efficiency gains from using LiDAR are dependent on strong correlations between the LiDAR variables and the variables to be estimated, and hence they benefit from minimum time delay between the acquisition of LiDAR data and the collection of field data. A number of researchers have investigated the effects of integrating historical and recently collected data. Rombouts et al. (2010) examined data from several campaigns separated by up to seven years and found that campaign was statistically significant in models relating stand volume to LiDAR data. However, they also concluded that the integration of prior data with current data produces gains in efficiency especially when the number of recent plots is limited. Junttila et al. (2010) employed earlier data to improve estimates using Bayesian regression. They also provided a method for correcting for time differences using linear correction mapping and stated that small changes in canopy height and forest structure and small changes in scanning conditions can be accommodated by their procedure.

There are two generic approaches to attribute estimation using LiDAR data. The first comprises “area-based” methods, in which the prediction of forest variables is based on the statistical dependency between variables measured in field plots and the predictor features derived from LiDAR data (e.g. Corona and Fattorini 2008; White et al. 2013). Currently, the area-based approach is operationally applied in several countries when carrying out standwise forest inventories (e.g. Hyyppä et al. 2012; White et al. 2013). The second comprises “tree-based” methods, in which individual trees are detected and tree-level variables are measured or predicted from the LiDAR data (e.g. Hyyppä et al. 2012). In this study, area-based methods were used: LiDAR data were summarised into virtual plots, which were co-located with inventory plots measured on the ground. It is generally acknowledged that if individual tree crowns can be recognised accurately in the LiDAR data, then the tree-based approach tends to outperform the area-based methods (Yu et al. 2010). However, the area-based approach enables one to retrieve canopy height information by means of relatively coarse pulse density LiDAR data (~2 points m−2) and hence at lower acquisition cost (Jakubowski et al. 2013).

In the area-based approach, statistical metrics and other nonphysical distribution-related features of LiDAR height measurements can be extracted either directly from normalised LiDAR point clouds or from a normalised rasterised representation of laser hits (e.g. the normalised Digital Surface Model or Canopy Height Model (CHM)) (Hyyppä et al. 2008; White et al. 2013). By using the CHM to derive metrics, the authors acknowledge the under-utilisation of the information content of the LiDAR point cloud. However, this approach has proven to be operationally robust and requires less computational input (Hyyppä et al. 2008). Both area-based distributional metrics and individual tree level metrics have been extracted from CHMs in an operational workflow designed for automatic inventory estimates of Pinus radiata D. Don from LiDAR data, and this operationally orientated approach (Chen and Zhu 2013) was followed here.

In this study several survey design components were varied and compared using auxiliary variables derived from LiDAR data to improve efficiencies in the inventory of Pinus radiata plantations. Design-based and model-based methods were both investigated. Survey design options included stratification and plot selection strategies, and estimation options included ratio estimation and regression modelling. The methodology involved using simulations based on linear mixed-effects modelling of four plot based surveys. The variable of interest in the simulations was total stand volume (TSV) per plot. Specifically, the objective of this study was to determine whether it is possible to guide the choice of ground truth samples, using a priori knowledge, so as to reduce the number of field plots for P. radiata inventories, without compromising prediction accuracies.

Methods

Study site

Two sites were selected for this study. The first of these was the Nundle State Forest P. radiata plantation, approximately 7000 ha in size and located in the Northern Tablelands of New South Wales (NSW), Australia. The second site was the Green Hills State Forest, which comprises 5000 ha of P. radiata, and is located in the Southern Tablelands of NSW. Both plantations are managed by the Forestry Corporation of New South Wales (FCNSW). In the Nundle study area, two unthinned compartments were selected representing young and old age classes respectively. The young compartment was planted in 2002 (2002 age class (AC)) at a stand density of 1000 trees ha−1, while the older compartment was planted in 1977 (1977 AC) at a stand density of 1200 trees ha−1 (Table 1). The Green Hills study site contained a full representation of age classes (from just planted to > 35 years old) and a range of associated silvicultural thinning treatments (described in Stone et al., 2011).

Table 1

Comparison of stand attributes

| Survey | Stand density (stems ha−1) | Stand volume (m3 ha−1) | DBH (cm) | Basal area (m2 ha−1) | Mean dominant height (m) | Mean height (m) |
| --- | --- | --- | --- | --- | --- | --- |
| Nundle 1977 | 279 ± 123 (48–676) | 500 ± 151 (205–855) | 45.6 ± 7.5 (34.7–74.1) | 45.2 ± 13.7 (18.6–78.0) | 35.2 ± 2.5 (27.8–39.4) | 33.3 ± 1.0 (31.1–35.9) |
| Nundle 2002 | 593 ± 289 (100–1250) | 57 ± 25 (11–135) | 16.8 ± 3.0 (11.6–24.5) | 13.5 ± 5.8 (2.7–31.7) | 11.0 ± 1.0 (8.8–12.9) | 9.8 ± 0.5 (8.9–10.9) |
| Green Hills Inventory (1979) | 197 ± 34 (140–260) | 249 ± 44 (161–342) | 40.0 ± 2.1 (35.3–43.6) | 24.6 ± 3.7 (17.6–32.8) | 30.3 ± 2.0 (25.6–34.1) | 28.0 ± 2.2 (23.5–31.4) |
| Green Hills Research (30+ yrs) | 274 ± 40 (187–305) | 413 ± 46 (322–463) | 42.1 ± 1.3 (40.3–43.9) | 38.8 ± 4.7 (29.5–43.6) | 33.4 ± 1.2 (31.2–35.0) | 31.3 ± 1.1 (30.0–33.6) |

Values are mean ± sd (min–max).

Data collection

Airborne scanning LiDAR data

Discrete return airborne LiDAR data were acquired over the Nundle State Forest P. radiata plantation (Nundle) in June 2011 using a Trimble Harrier 68i system mounted on a Cessna U206G airplane. The technical specifications of the LiDAR data acquisition are provided in Table 2. The LiDAR points were processed, geo-referenced and classified by the service provider into ground and non-ground categories using their proprietary method and MARS software (Merrick & Company, Colorado, USA).

Table 2

LiDAR acquisition specifications

| LiDAR acquisition specification | Nundle S.F. study area | Green Hills S.F. study area |
| --- | --- | --- |
| Swath width (m) | 925 | 500 |
| Scan overlap (%) | 40 | |
| Sensor system | Trimble Harrier 68i-LMS-Q5600 (Riegl) | Lite Mapper LMS-Q5600 (Riegl) |
| Measurement rate (kHz) | 81 | 88 |
| Field of view (sum of the two angles off nadir, degrees) | 30 | 30 |
| Flying height (metres above ground level) | 1155 | 1100 |
| Point density (points m−2) | 3.6 | 2.0 |
| Vertical/horizontal accuracy (σ in metres) | 0.25 / 0.5 | 0.25 / 0.5 |
| Average footprint size (m) | 0.5 | 0.6 |
| Beam divergence (milliradians) | 0.5 | 0.5 |

σ = standard deviation.

Discrete return LiDAR data were acquired over the Green Hills State Forest (SF) P. radiata plantation (Green Hills) in July 2008 using a Lite Mapper LMS-Q5600 ALS system (Riegl, Austria) with a mean swath width of 500 m (other technical specifications are listed in Table 2). The points in this dataset were also classified by the provider into ground and non-ground categories using their proprietary method and TerraScan software (TerraSolid Ltd., Helsinki, Finland).

For both the Nundle and Green Hills study areas, a 1 m spatial resolution Digital Elevation Model (DEM) was constructed from the ground point data using a standard linear triangulation surface modelling technique in Environment for Visualizing Images (ENVI 4.7) software (Research Systems Inc., USA). A similar triangulation approach was used to generate a Digital Surface Model (DSM) from all first returns (selecting the highest point in each grid cell). Then a 1 m pixel resolution CHM surface was constructed by subtracting the DEM from the DSM which represents the height of the canopy above ground level (ENVI 4.7).
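The CHM construction described above amounts to a per-pixel subtraction on a common grid. The following is a minimal sketch of that step only; the function name, the toy elevations and the clipping of negative heights to zero are assumptions made for illustration and are not taken from the ENVI workflow used in the study.

```python
import numpy as np

def canopy_height_model(dsm, dem):
    """Return CHM = DSM - DEM for two rasters on the same 1 m grid."""
    dsm = np.asarray(dsm, dtype=float)
    dem = np.asarray(dem, dtype=float)
    if dsm.shape != dem.shape:
        raise ValueError("DSM and DEM must share the same grid")
    # Assumption of this sketch: small negative differences are treated
    # as interpolation noise and clipped to zero.
    return np.clip(dsm - dem, 0.0, None)

# Toy 2 x 2 grid of 1 m pixels (elevations in metres, illustrative only).
dem = np.array([[100.0, 101.0], [102.0, 103.0]])
dsm = np.array([[130.0, 101.0], [125.5, 102.8]])
chm = canopy_height_model(dsm, dem)
```

In this toy grid the lower-right pixel has a surface elevation slightly below the ground model, so its canopy height is set to zero rather than to a negative value.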

For the Nundle stands, the CHM raster was initially masked to minimise potential edge effects by applying an internal 18 m buffer to the compartment boundary layer. In the older stand (1977 AC), circular 0.1 ha plots (17.84 m radius) were located at the intersection points of a 35 m grid to generate 1496 contiguous, virtual plots. The grid pattern ensured that the circular plots did not overlap and the plot dimensions were chosen so that each virtual plot would contain 19 trees on average, based on the LiDAR estimated stand density. In the younger stand (2002 AC), circular 0.04 ha plots (11.28 m radius) were located at the intersection points of a 23 m grid to generate 572 plots and each of these virtual plots was expected to contain 19 trees on average. When the stand densities were later measured on the ground they were somewhat higher than the LiDAR estimates so that the number of trees per virtual plot was actually 28 and 24 respectively for the 1977 and 2002 AC stands. LiDAR metrics were extracted for each virtual plot based on the 1 m CHM pixels contained within the plot (pixels on the circumference were included if they mostly lay inside the plot). The LiDAR metrics extracted are described in Table 3 and included mean height, mean above mean height, mean dominant height, estimated stand density per hectare, percentage canopy cover (CC), occupied volume (OV), height variance and height skewness (Turner et al. 2011).

Table 3

Description of LiDAR metrics

| LiDAR metric | Definition |
| --- | --- |
| Mean height | Mean height of pixels in virtual plot |
| Mean above mean height | Mean height of pixels which are above the mean height |
| Mean dominant height | Mean height of pixels identified as maxima using a 5 × 5 m window |
| Stocking density | Estimated stems per ha based on 3 × 3 m local maxima (the window size most closely related to stand density) |
| Canopy cover | Percentage of pixels above 3 m in height |
| Occupied volume | Sum of all pixel heights in virtual plot |
| Variance | Variance of pixels in virtual plot |
| Skewness | Skewness of pixels in virtual plot |
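
Several of the Table 3 metrics are simple summaries of the CHM pixels inside a circular plot, and the plot radii quoted above follow from the plot areas. The sketch below illustrates both under stated assumptions: the function names are hypothetical, a pixel is counted as inside the plot when its centre lies within the radius (an approximation of the "mostly inside" rule), and the window-based metrics (mean dominant height and stocking density) are omitted.

```python
import numpy as np

def plot_radius(area_ha):
    """Radius (m) of a circular plot with the given area in hectares."""
    return np.sqrt(area_ha * 10_000.0 / np.pi)

def plot_metrics(chm, centre_rc, radius_m, cover_threshold=3.0):
    """Summarise the 1 m CHM pixels whose centres fall inside a circular plot.

    chm        : 2-D array of canopy heights (m), one value per 1 m pixel
    centre_rc  : (row, col) of the plot centre in pixel units
    radius_m   : plot radius in metres (equal to pixels at 1 m resolution)
    """
    rows, cols = np.indices(chm.shape)
    dist = np.hypot(rows - centre_rc[0], cols - centre_rc[1])
    h = chm[dist <= radius_m]
    mean_h = h.mean()
    dev = h - mean_h
    return {
        "mean_height": mean_h,
        "mean_above_mean": h[h > mean_h].mean(),
        "canopy_cover_pct": 100.0 * np.mean(h > cover_threshold),
        "occupied_volume": h.sum(),
        "variance": h.var(ddof=1),
        "skewness": np.mean(dev**3) / h.std() ** 3,
        "n_pixels": h.size,
    }

# Synthetic canopy heights stand in for a real CHM (illustrative only).
rng = np.random.default_rng(0)
chm = rng.uniform(0.0, 35.0, size=(41, 41))
radius = plot_radius(0.1)  # 0.1 ha plot, about 17.84 m
metrics = plot_metrics(chm, centre_rc=(20, 20), radius_m=radius)
```

With 1 m pixels, occupied volume is the sum of pixel heights, so it equals the mean height multiplied by the number of pixels in the plot.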

At the second study site, also using the 1 m CHM, the Green Hills plot centre coordinates were imported into ENVI 4.2 software and buffered to a distance of 35 m. These vectors were then used as sampling templates to extract the same suite of LiDAR metrics as for the Nundle study.

Ground survey design

At the time of measurement, both of the Nundle stands were relatively heterogeneous in terms of stand density and tree spacing due to the absence of any pre-harvest thinning, as well as historic exposure to browsing and drought which affected tree establishment and survival. This heterogeneity was expected to make accurate estimates difficult to achieve using ground-based assessment based on a traditional systematic survey design. Because of this, stratification of the two Nundle stands was achieved using the LiDAR data as a priori information (e.g. Hawbaker et al. 2009; Maltamo et al. 2011), specifically mean height per virtual plot and estimated stand density. Both variables were divided into three ranges using the Dalenius-Hodges approach (Cochran 1977). This method resulted in nine non-contiguous strata, and each virtual plot was assigned to one of these strata. A random sample of 90 plots was then selected with an equal number of plots per stratum (10) using simple random sampling without replacement in each stratum. Subsequently, a subset of six plots per stratum was randomly selected for ground-based measurement (Table 4). The sample size was chosen so that the relative standard error of stand volume was 5% or better, using the design-based expansion estimator [3] which is given below.

Table 4

Comparison of survey data

| Survey | Stratification | Number of strata | Number of plots | Sample size | $R^2$ values for model [1] | CV of timber volume (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Nundle 1977 | LiDAR | 9 | 1496 | 54 | 0.69 | 31 |
| Nundle 2002 | LiDAR | 9 | 572 | 52 | 0.57 | 44 |
| Green Hills inventory | Management | 4 | - | 100 | 0.88 | 61 |
| Green Hills research | Management | 16 | - | 63 | 0.93 | 71 |

CV = coefficient of variation.

At the Green Hills site, plot-based tree data from two surveys undertaken at approximately the same time were used for the analysis. One of the surveys comprised 100 conventional inventory plots with four strata (operational Planning Units), defined according to age class and thinning status (specifically the four strata were 1998 AC and unthinned; 1979 AC and thinned twice; 1983 AC and thinned twice; and 1983 AC and unthinned) (Table 4). These plots were established in accordance with standard FCNSW inventory procedures; they are based on a sampling intensity of one plot per four hectares, employ a systematic grid design with a random starting point and are designed to achieve a probable limit of error (PLE, as defined below) of 10% in terms of predicted stand volume, using the design-based expansion estimator [3] which is given below. Within the same study area, a further 63 plots were measured as part of a research study (Stone et al. 2011) with a sampling design comprising 16 strata defined according to age class, thinning history and ground slope categories. Four circular plots were randomly assigned to each stratum (one stratum had only three plots), bringing the total to 63 research plots. This sample size enabled models of tree height vs LiDAR height to be estimated with a relative root mean squared error (RMSE) of 5%.

Field data

In the Nundle 1977 AC stand, all of the 54 plots were used (ranging from 12.62 m to 19.95 m in radius), providing a cluster of selected trees containing approximately 22 trees per plot on average. In the Nundle 2002 AC stand, 52 of the plots were used (ranging from 7.98 m to 17.84 m in radius), providing a cluster of selected trees containing approximately 23 trees per plot on average (two plots were inaccessible). “Flexible radius” plots, in which the plot radius is varied according to the estimated stocking density, were employed to maintain an approximately constant workload throughout the survey (Melville et al. 2015). However, the plots themselves were chosen with equal probability within strata. In the field, plot centres were located using two GPS units (Garmin Ltd.), and then validated through crown interpretation of open spaces between trees using printed copies of the LiDAR imagery. Every tree in each selected cluster was labelled and diameter at breast height over bark (DBH at 1.3 m) measured. At least five trees per selected cluster, representing the tallest trees, were selected and their heights measured using a hypsometer (Vertex III, Haglof, Sweden). These trees were used to predict the heights of the non-measured trees using the nonlinear dominant height vs DBH relationships which are used by FCNSW for this species. Plot standing volume was then estimated using the tapering equations employed by FCNSW for these plantations.

In the Green Hills inventory survey, plot measurements and tree assessments were undertaken by an independent local inventory crew using ATLAS Cruiser methodology (Atlas Technology, Rotorua, NZ) which provided a cluster of selected trees containing approximately 18 trees per plot on average. The inventory plot centres were located using a Scout Pak GPS (Juniper Systems, Logan, Utah, USA) with post-processing differential correction. In the Green Hills research survey the 63 plots were assessed using flexible radius plots, as described above, with radii ranging from 7.98 m to 17.84 m. This provided a cluster of selected trees containing approximately 15 trees per plot on average. The research plot centres were located using differential Global Positioning System (Trimble Navigation Ltd., Sunnyvale, California, USA) and a precision survey (with a laser theodolite). These plots were then measured using the same methodology as the Nundle plots.

Simulation approach

Overview

The approach used in the study was to construct simulation models using actual data, which comprised the data acquired using LiDAR together with the data collected from the ground-based surveys. The simulation models were then used to generate artificial plantations which were sampled using a range of different strategies. These strategies were then compared in terms of the precision of the resulting estimates. A flowchart illustrating the approach is provided in Figure 1. Using the Nundle 1977 AC as an example, the procedural steps were as follows:
Figure 1

Flowchart illustrating simulation approach for the Nundle 1977 AC stand.

  1. Sample 54 plots from a compartment containing 1496 plots. This sample was stratified using LiDAR height and stocking density (nine strata).
  2. Conduct ground measurements of the sampled plots.
  3. Estimate the parameters in simulation models (1) and (2).
  4. Use models (1) and (2) to simulate data for 2700 plots in nine strata (300 plots per stratum; same stratification as step 1), including stand volume, OV and CC.
  5. Since the strata used to simulate the plot population are deemed to be “unknown” for the simulated sampling, re-stratify the simulated population using the auxiliary variables OV and CC.
  6. Select 24 plots using one of the sampling strategies under consideration.
  7. Estimate the total volume of the simulated population and compare the estimate to the simulated “actual” value.
  8. Repeat steps 4 to 7 a total of 10,000 times.
  9. Calculate the relative bias and RMSE using the estimates and actual values.
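
Steps 4 to 9 above form a Monte Carlo loop, which can be sketched as follows. This is an illustration under loud assumptions, not the authors' code: the population generator is a toy stand-in for models (1) and (2) with made-up stratum means and standard deviations, the sampling strategy is plain simple random sampling with an expansion estimator, the re-stratification of step 5 is skipped, and relative bias and RMSE are expressed as percentages of the simulated true total.

```python
import numpy as np

def simulate_population(rng, n_strata=9, plots_per_stratum=300):
    """Toy stand-in for models (1) and (2); means and SDs are invented."""
    means = np.linspace(200.0, 800.0, n_strata)
    sds = np.linspace(40.0, 120.0, n_strata)
    return np.concatenate(
        [rng.normal(m, s, plots_per_stratum) for m, s in zip(means, sds)]
    )

def expansion_estimate(volume, sample_idx):
    """Expansion estimator of the population total under SRS: N * sample mean."""
    return volume.size * volume[sample_idx].mean()

def evaluate(n_sample=24, n_reps=10_000, seed=1):
    """Return (relative bias %, relative RMSE %) over repeated simulations."""
    rng = np.random.default_rng(seed)
    rel_err = np.empty(n_reps)
    for r in range(n_reps):
        volume = simulate_population(rng)                        # step 4
        idx = rng.choice(volume.size, n_sample, replace=False)   # step 6
        estimate = expansion_estimate(volume, idx)               # step 7
        rel_err[r] = (estimate - volume.sum()) / volume.sum()
    rel_bias = 100.0 * rel_err.mean()                            # step 9
    rel_rmse = 100.0 * np.sqrt(np.mean(rel_err**2))
    return rel_bias, rel_rmse

# Fewer repetitions than the 10,000 used in the study, to keep the demo quick.
rel_bias, rel_rmse = evaluate(n_reps=2_000)
```

Swapping `rng.choice` and `expansion_estimate` for another selection rule and estimator is how alternative strategies would be compared on the same footing.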

     

Statistical modelling and plot simulation

The four datasets were examined separately by fitting linear mixed models to the variables of interest, namely timber volume and the LiDAR variables OV and CC (Turner et al. 2011). Although several LiDAR variables were available, the variables OV and CC were chosen because they were the best predictors of stand volume. It was also considered important to have separate variance terms for the strata which were defined in the original surveys; this is because each of the key variables changes markedly from one stratum to another, both in terms of the mean value and the variance. The use of mixed models allows for the estimation of separate stratum variances thus providing a more complete description of the data (Gilmour et al. 1995). The parameter estimates from the mixed models can then be used to generate plot data with characteristics very similar to the original data they were based on.

For each of the four plantation surveys, the following model was used for timber volume at the plot level
$$ \boldsymbol{Y}={\boldsymbol{Z}}^T\boldsymbol{\beta} +\boldsymbol{\varepsilon} $$
(1)

where $\boldsymbol{Y}$ represents total volume, $\boldsymbol{Z}$ is a design matrix consisting of stratum and OV, and $\boldsymbol{\varepsilon} \sim N(\boldsymbol{0}, \boldsymbol{R})$. The term $\boldsymbol{R} = \mathrm{diag}(V_1, \ldots, V_H)$ represents the error structure, which is assumed to be a diagonal matrix with separate stratum variances, represented by $V_h$, where $h$ indexes strata and $H$ is the total number of strata. In this model, $V_h$ may also be written as $\sigma_h^2 I_{m_h}$, where $\sigma_h^2$ and $m_h$ represent the plot variance and number of plots respectively for stratum $h$, and $I_{m_h}$ is the identity matrix of order $m_h$. A preliminary model which included off-diagonal correlation terms was subsequently replaced by the diagonal variance model because the correlation terms were very small.
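
To make the error structure of model (1) concrete, the sketch below draws plot volumes with a separate residual variance for each stratum. It assumes stratum-specific intercepts and a single common slope on OV, which is one reading of the design matrix and is not confirmed by the source; all parameter values and the function name are hypothetical.

```python
import numpy as np

def simulate_volume(stratum, ov, intercepts, slope, sigma, rng):
    """Draw Y = intercept[h] + slope * OV + eps, with eps ~ N(0, sigma[h]^2).

    stratum    : integer stratum index h for each plot
    ov         : occupied volume for each plot
    intercepts : one intercept per stratum (hypothetical values)
    slope      : common OV coefficient (assumption of this sketch)
    sigma      : one residual standard deviation per stratum
    """
    stratum = np.asarray(stratum)
    intercepts = np.asarray(intercepts, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mean = intercepts[stratum] + slope * np.asarray(ov, dtype=float)
    # Independent errors with stratum-specific spread: the diagonal R matrix.
    return mean + rng.normal(0.0, sigma[stratum])

rng = np.random.default_rng(7)
intercepts = np.array([50.0, 120.0, 200.0])  # hypothetical stratum effects
slope = 0.02                                 # hypothetical volume per unit OV
sigma = np.array([20.0, 45.0, 80.0])         # hypothetical stratum SDs
stratum = np.repeat(np.arange(3), 300)
ov = rng.uniform(5_000.0, 30_000.0, stratum.size)
volume = simulate_volume(stratum, ov, intercepts, slope, sigma, rng)
```

Because the errors are independent across plots, only the diagonal of $\boldsymbol{R}$ is needed, which matches the diagonal variance model adopted in the text.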

A secondary linear mixed model was used to simultaneously model the LiDAR variables OV and CC at the plot level. This model has the same form as equation [1], where $\boldsymbol{Y}$ is a multivariate matrix with column vectors for OV and CC, $\boldsymbol{Z}$ is a design matrix consisting of stratum and the multivariate mean vector (with intercept terms for OV and CC) and $\boldsymbol{R} = \mathrm{diag}(V_1, \ldots, V_H)$ represents the error structure, which is assumed to be a block diagonal matrix with variance matrices, $V_h$, given by
$$ {V}_h=\begin{pmatrix} {\sigma}_{h1}^2 & {\sigma}_{h12} \\ {\sigma}_{h12} & {\sigma}_{h2}^2 \end{pmatrix} $$
(2)

In this model, $\sigma_{h1}^2$ and $\sigma_{h2}^2$ are the within-stratum variances of OV and CC respectively and $\sigma_{h12}$ is the within-stratum covariance term. As with the model for timber volume, preliminary analysis revealed that the off-diagonal correlation terms are sufficiently small to ignore.
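
As an illustration of equation (2), OV and CC values for one stratum can be drawn jointly from a bivariate normal distribution with the 2 × 2 matrix $V_h$. The sketch below assumes normality and uses invented means, variances and covariance; the function name is hypothetical. Setting the covariance to zero gives the simplified case in which the off-diagonal term is ignored.

```python
import numpy as np

def simulate_ov_cc(mean_h, v_h, n_plots, rng):
    """Draw n_plots (OV, CC) pairs for one stratum from N(mean_h, V_h)."""
    v_h = np.asarray(v_h, dtype=float)
    if v_h.shape != (2, 2) or not np.allclose(v_h, v_h.T):
        raise ValueError("V_h must be a symmetric 2 x 2 matrix")
    return rng.multivariate_normal(mean_h, v_h, size=n_plots)

rng = np.random.default_rng(11)
mean_h = [20_000.0, 80.0]          # hypothetical stratum means of OV and CC
sd_ov, sd_cc, corr = 3_000.0, 4.0, 0.4
v_h = [
    [sd_ov**2, corr * sd_ov * sd_cc],  # sigma_h1^2, sigma_h12
    [corr * sd_ov * sd_cc, sd_cc**2],  # sigma_h12,  sigma_h2^2
]
sample = simulate_ov_cc(mean_h, v_h, n_plots=300, rng=rng)
ov_sim, cc_sim = sample[:, 0], sample[:, 1]
```

A normal draw does not constrain CC to the 0 to 100% range, so a practical simulation might need to truncate or transform the values; that step is left out here.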

For each of the four forestry surveys, a “population” of 2700 plots (each plot equal to 0.05 ha) was simulated using the parameter estimates from model [1]. The actual model parameters and stratum variances were estimated separately for each survey. Values for OV and CC were also simulated using the multivariate model for use as stratification variables. The $R^2$ values associated with model [1] are shown in Table 4 as well as the coefficients of variation (CV) for plot volume.

In the Nundle surveys, a measure of OV based on a fixed circular plot was used and timber volume was expressed in terms of cubic metres per hectare. In the two Green Hills surveys, a measure of OV related to the actual size of the measured plots was used and timber volume was expressed in terms of volume (m3) per plot. In both cases, the 1 m pixel resolution enables calculation of the LiDAR variables for an approximately circular plot and both methods produce satisfactory correlations between OV and timber volume. Hence the decision to use either metrics calculated according to a fixed radius, or metrics calculated according to the actual measured radius was based on the available data. The LiDAR variables used for stratification in the Nundle surveys were mean canopy height and stand density. The management variables used for stratification in the Green Hills surveys were age class and thinning (operational inventory survey) and age class, thinning and slope (research survey). With the Green Hills stands, after simulating data for all strata, the oldest of these were used in comparing the various sampling strategies. In the Green Hills inventory plots these were the 1979 AC stands while in the Green Hills research plots they were the 30+ year AC compartments. The oldest plots were deemed to be most relevant to this study as they were the plots which were closest to harvest.

Simulated survey methods

Inferential framework

After simulating the population of plots, both design-based and model-based strategies were investigated for survey design and estimation. Since stratification and Neyman allocation are typically used with design-based estimators, including model-assisted estimators (Foreman 1991), a number of different stratification options and two different types of estimation were explored. Stratification options included the use of different numbers of strata (two, three, four and six) and the use of one or two stratification variables (OV and OV + CC). Types of estimation included expansion and regression estimators, as described below. For the model-based strategies, balanced sampling was used to obtain a representative sample of plots. Balanced sampling ensures that the Horvitz-Thompson estimators of the population totals of the design variables (OV and CC) equal the known totals of these variables, and the method may be used without stratification. Since balanced samples are not strictly random, they were not employed with the design-based estimators. They were used with the model-based methods because they were expected to be more efficient than a random sample. In these simulations the R function “samplecube” (from the “sampling” package) was used, which employs the cube algorithm developed by Deville and Tillé (2004) for sample balance. Note that the selection probabilities themselves are not used in the model-based estimate even though the balanced sample is defined with respect to the Horvitz-Thompson estimator.
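The balancing condition described above can be illustrated with a small Python sketch. This is not the cube method: it swaps in a much simpler rejective procedure that redraws simple random samples until the Horvitz-Thompson totals of the auxiliary variables fall within a tolerance of the known totals. The function names, the tolerance and the synthetic OV and CC values are all assumptions made for illustration.

```python
import numpy as np

def ht_total(x_sample, n_population):
    """Horvitz-Thompson total under equal inclusion probabilities n / N."""
    x_sample = np.asarray(x_sample, dtype=float)
    return n_population / x_sample.shape[0] * x_sample.sum(axis=0)

def rejective_balanced_sample(aux, n, tol=0.01, max_draws=100_000, seed=0):
    """Redraw simple random samples until every auxiliary HT total is within
    a relative tolerance `tol` of its known population total.
    A stand-in for the cube method, not an implementation of it."""
    rng = np.random.default_rng(seed)
    aux = np.asarray(aux, dtype=float)
    if aux.ndim == 1:
        aux = aux[:, None]
    totals = aux.sum(axis=0)
    for _ in range(max_draws):
        idx = rng.choice(aux.shape[0], size=n, replace=False)
        rel_err = np.abs(ht_total(aux[idx], aux.shape[0]) - totals) / np.abs(totals)
        if np.all(rel_err <= tol):
            return np.sort(idx)
    raise RuntimeError("no sample met the balance tolerance")

# Synthetic auxiliary data for 2700 plots (hypothetical values only).
rng = np.random.default_rng(1)
ov = rng.gamma(shape=25.0, scale=800.0, size=2700)   # stand-in for OV
cc = rng.uniform(60.0, 100.0, size=2700)             # stand-in for CC (%)
aux = np.column_stack([ov, cc])
sample = rejective_balanced_sample(aux, n=24, tol=0.01)
```

The returned indices identify 24 plots whose expanded OV and CC totals approximately match the population totals. The cube method achieves the same goal while respecting prescribed inclusion probabilities, which this sketch does not attempt.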

Stratification and plot selection

A simple random sample of plots was chosen from the population of available plots to simulate sampling without stratification. To simulate stratified sampling, the primary variable OV was chosen as it is best correlated with stand volume (Turner et al. 2011). Where stratification was based on two variables, the secondary variable CC was chosen as it is also closely related to stand volume and the two variables together form the best prediction pair for stand volume. With a single stratification variable, the number of strata employed was two, three, four or six. With two stratification variables, the number of strata employed was either four (2 × 2) or six (3 × 2). Stratum boundaries were calculated using the Dalenius-Hodges method (Cochran 1977). Neyman allocation was used to allocate plots to strata. Under this scheme plots are allocated according to \( n_h \propto N_h S_h \), where h indexes strata, \( n_h \) represents stratum sample size and \( N_h \) and \( S_h \) represent stratum population size and standard deviation of timber volume respectively (Cochran 1977). We assumed that the standard deviation of timber volume, \( S_h \), would be unknown for the purposes of survey design and used the standard deviation of the primary stratification variable instead. A minimum sample size constraint was also specified (two plots per stratum).
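The two design steps in this paragraph can be sketched in Python as follows. This is a minimal illustration, assuming the cumulative square-root-of-frequency form of the Dalenius-Hodges rule and a simple round-then-enforce-minimum version of Neyman allocation. The function names, bin count and synthetic OV values are hypothetical.

```python
import numpy as np

def dalenius_hodges_boundaries(x, n_strata, n_bins=100):
    """Cumulative sqrt(frequency) rule: cut where the running sum of
    sqrt(bin counts) passes equal fractions of its total."""
    freq, edges = np.histogram(x, bins=n_bins)
    cum_sqrt_f = np.cumsum(np.sqrt(freq))
    targets = cum_sqrt_f[-1] * np.arange(1, n_strata) / n_strata
    # Index of the first bin whose cumulative sqrt(f) reaches each target.
    bin_idx = np.searchsorted(cum_sqrt_f, targets)
    return edges[bin_idx + 1]          # upper edge of that bin

def neyman_allocation(stratum_sizes, stratum_sds, n_total, n_min=2):
    """Allocate n_h proportional to N_h * S_h, with at least n_min per stratum."""
    weights = np.asarray(stratum_sizes, float) * np.asarray(stratum_sds, float)
    raw = n_total * weights / weights.sum()
    return np.maximum(np.rint(raw).astype(int), n_min)

# Hypothetical OV values for a population of 2700 plots.
rng = np.random.default_rng(0)
ov = rng.gamma(shape=9.0, scale=2000.0, size=2700)
bounds = dalenius_hodges_boundaries(ov, n_strata=3)
stratum = np.digitize(ov, bounds)                      # labels 0, 1, 2
sizes = np.bincount(stratum, minlength=3)
# S_h is taken from the stratification variable itself, as in the text.
sds = np.array([ov[stratum == h].std(ddof=1) for h in range(3)])
alloc = neyman_allocation(sizes, sds, n_total=24)
```

Because the allocation is rounded and then raised to the minimum where needed, the realised total can differ slightly from the target of 24 plots.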

For the model-based methods, the sampling was based on either one or two balancing variables and stratification was not employed as a sampling tool. The design variable(s) used for plot selection was either OV (single balancing variable) or both OV and CC (two balancing variables). The target sample size was 24 plots throughout the study, which is similar to the number of plots used by FCNSW for a population of this size. Minor variation sometimes occurred due to minimum sample size constraints and the actual sample size was occasionally 23 or 25. RMSEs were adjusted where necessary to reflect the comparison size of 24 plots.

Estimation methods

Design-based estimates

If Y represents the total volume of timber in the stand of interest, the basic design-based expansion estimator is given by

$$ \widehat{Y}=\frac{M}{m}{\displaystyle \sum_{i\in q}{Y}_i}, $$
(3)

where \( Y_i \) is the volume of timber in plot i, q represents the set of sampled plots, and M and m represent the total number of plots and the number of sampled plots respectively (Cochran 1977).

The stratified sampling method employed a stratified design-based expansion estimator which is given by (Cochran 1977)
$$ \widehat{Y}={\displaystyle \sum_{h=1}^H\frac{M_h}{m_h}}{\displaystyle \sum_{i\in {q}_h}{Y}_i}, $$
(4)

where H represents the total number of strata, and \( M_h \) and \( m_h \) represent the total number of plots and the number of sampled plots in stratum h respectively.
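As a minimal sketch, estimators (3) and (4) could be computed in Python as shown below. The function names and the dictionary of stratum sizes are illustrative assumptions, and the toy numbers are not taken from the study.

```python
import numpy as np

def expansion_estimate(y_sample, n_population):
    """Estimator (3): scale the sample total by M / m."""
    y_sample = np.asarray(y_sample, dtype=float)
    return n_population / y_sample.size * y_sample.sum()

def stratified_expansion_estimate(y_sample, stratum_sample, stratum_sizes):
    """Estimator (4): apply (3) within each stratum and add the results.
    stratum_sizes maps a stratum label to M_h, the number of plots in it."""
    y_sample = np.asarray(y_sample, dtype=float)
    stratum_sample = np.asarray(stratum_sample)
    total = 0.0
    for h, m_pop in stratum_sizes.items():
        y_h = y_sample[stratum_sample == h]
        total += expansion_estimate(y_h, m_pop)
    return total

# Toy example: four sampled plots, two per stratum.
unstratified = expansion_estimate([10.0, 20.0, 30.0], n_population=30)
stratified = stratified_expansion_estimate(
    [10.0, 20.0, 30.0, 40.0], [0, 0, 1, 1], {0: 10, 1: 20}
)
```

In the toy example the unstratified estimate is 30/3 × 60 = 600 and the stratified estimate is 10/2 × 30 + 20/2 × 70 = 850.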

In addition to the usual design-based estimator, a regression estimator (model-assisted) was also employed using strata and OV as the auxiliary variables. This estimator may be written as (Särndal et al. 1992)
$$ \widehat{Y}={\displaystyle \sum_{h=1}^H{\displaystyle \sum_{i\in {q}_h}}\frac{M_h}{m_h}}{Y}_i-{\widehat{\boldsymbol{\beta}}}^T\left(\widehat{\boldsymbol{Z}}-\boldsymbol{Z}\right), $$
(5)

where Z represents the design matrix for the model covariates, \( \widehat{\boldsymbol{Z}} \) is a design-based expansion estimator for Z and β is a vector of model parameters whose design-based least squares estimate is given by \( \widehat{\boldsymbol{\beta}} \).
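A hedged Python sketch of estimator (5) is given below. It assumes that the correction term uses the known population totals of the covariates, that the design weights are \( M_h/m_h \), and that \( \widehat{\boldsymbol{\beta}} \) is obtained by design-weighted least squares. The function and argument names are hypothetical.

```python
import numpy as np

def regression_estimate(y, z, weights, z_pop_totals):
    """Estimator (5): expansion estimate minus beta^T (Z_hat - Z).

    y            : sampled plot volumes, shape (m,)
    z            : covariate rows for the sampled plots, shape (m, p),
                   e.g. stratum indicators plus OV
    weights      : design weights M_h / m_h for each sampled plot, shape (m,)
    z_pop_totals : known population totals of the covariates, shape (p,)
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    w = np.asarray(weights, dtype=float)
    zw = z * w[:, None]
    # Design-weighted least squares: solve (sum w z z^T) beta = sum w z y.
    beta = np.linalg.solve(zw.T @ z, zw.T @ y)
    y_hat_expansion = w @ y            # stratified expansion estimate of Y
    z_hat = w @ z                      # expansion estimate of covariate totals
    return y_hat_expansion - beta @ (z_hat - np.asarray(z_pop_totals, dtype=float))
```

When the covariates include the stratum indicators, the expanded and known totals of those indicators coincide, so the adjustment acts only through OV. If plot volume were an exact linear function of the covariates, the estimate would equal the true total for any sample.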

Model-based estimates

The model-based predictions used either OV (single balancing variable) or OV and CC (two balancing variables) as auxiliary data. The model-based estimator for timber volume may be written as (Valliant et al. 2000)

$$ \widehat{Y}={\displaystyle \sum_{i\in q}{Y}_i}+{\displaystyle \sum_{j\notin q}{\widehat{\boldsymbol{\beta}}}^T{\boldsymbol{z}}_j}, $$
(6)

where \( \boldsymbol{z}_j \) represents the covariate vector for plot j and \( \widehat{\boldsymbol{\beta}} \) is an appropriate model-based estimate for β.
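Predictor (6) can be sketched in Python as follows. The sketch assumes an ordinary least squares fit with an intercept as the "appropriate" estimate of β, which is an illustrative choice rather than the fitting method confirmed by the text. The function name is hypothetical.

```python
import numpy as np

def model_based_total(y_sample, z_sample, z_nonsample):
    """Predictor (6): observed volume on the sampled plots plus predicted
    volume on every non-sampled plot.

    z_sample and z_nonsample hold the auxiliary data (OV alone, or OV and CC)
    for the sampled and non-sampled plots; an intercept column is added here.
    """
    y_sample = np.asarray(y_sample, dtype=float)
    x_s = np.column_stack([np.ones(len(y_sample)), z_sample])
    x_r = np.column_stack([np.ones(len(z_nonsample)), z_nonsample])
    # Ordinary least squares fit on the sampled plots (an assumption).
    beta, *_ = np.linalg.lstsq(x_s, y_sample, rcond=None)
    return y_sample.sum() + (x_r @ beta).sum()
```

Note that no selection probabilities appear anywhere in the calculation, which matches the earlier remark that they are not used in the model-based estimate.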

Comparisons between methods

The various sampling methods were compared in terms of the relative bias, and the relative root mean squared error (RMSE%). The relative bias is defined as

$$ RB=\frac{{\displaystyle \sum_k\left({\widehat{Y}}_k-{Y}_k\right)}}{{\displaystyle \sum_k{Y}_k}}, $$
(7)

where k indexes the set of realisations of the simulated plantations, \( {\widehat{Y}}_k \) is the estimate of total timber volume for the k-th realisation and \( Y_k \) is the actual total timber volume.

With K denoting the number of realisations, the relative root mean squared error is defined as
$$ RMSE=\frac{\sqrt{\frac{1}{K}{\displaystyle \sum_k{\left({\widehat{Y}}_k-{Y}_k\right)}^2}}}{\frac{1}{K}{\displaystyle \sum_k{Y}_k}}. $$
(8)
The sampling strategies were also compared in terms of their relative efficiency with respect to the simple random sample. The relative efficiency for any method is calculated by squaring the ratio obtained by dividing the RMSE value for the design-based unstratified estimate (3) by the RMSE value for that method, so that values greater than one indicate an improvement over simple random sampling. The PLE (probable limits of error; Goulding and Lawrence 1992) was also calculated, where the PLE is half of the 95% confidence interval expressed as a percentage of the mean and was calculated in this study as twice the RMSE
$$ \mathrm{PLE} = 2 \times \mathrm{RMSE}. $$
(9)
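The comparison measures in equations (7) to (9), together with the relative efficiency, can be sketched in Python as below. The function names are illustrative. The relative efficiency is written with the simple random sample RMSE in the numerator, which is the orientation that reproduces the tabulated values (for example, RMSE values of 6.5 and 5.3 give an efficiency of about 1.5).

```python
import numpy as np

def relative_bias(estimates, truths):
    """Equation (7): summed error divided by summed true totals."""
    estimates = np.asarray(estimates, dtype=float)
    truths = np.asarray(truths, dtype=float)
    return (estimates - truths).sum() / truths.sum()

def relative_rmse(estimates, truths):
    """Equation (8): root mean squared error divided by the mean true total."""
    estimates = np.asarray(estimates, dtype=float)
    truths = np.asarray(truths, dtype=float)
    return np.sqrt(np.mean((estimates - truths) ** 2)) / truths.mean()

def relative_efficiency(rmse_method, rmse_srs):
    """Squared ratio of the simple random sample RMSE to the method RMSE."""
    return (rmse_srs / rmse_method) ** 2

def ple(rmse_percent):
    """Equation (9): probable limits of error as twice the RMSE."""
    return 2.0 * rmse_percent

# Toy check with two realisations whose errors cancel.
rb = relative_bias([110.0, 90.0], [100.0, 100.0])
rr = relative_rmse([110.0, 90.0], [100.0, 100.0])
re = relative_efficiency(rmse_method=5.3, rmse_srs=6.5)
```

In the toy check the relative bias is zero while the relative RMSE is 0.1, which shows why both measures are reported.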

Results

The parameter estimates for model [1] are given in Table 5. The comparisons of survey methods for each of the four surveys are presented in Tables 6, 7, 8 and 9. The sampling strategies were compared in terms of RMSE and bias, and the relative bias was found to be very small for all methods. In terms of RMSE, all strategies outperformed the simple random sample. In the simulations based on these plantation stands, the performance of simple random sampling is similar to that of a grid sample, which is commonly used in inventory sampling. It could be argued that a grid sample should be more efficient than a simple random sample because it reduces the effect of the spatial variability in the plots. However, when the Nundle data were pre-stratified according to the LiDAR-derived variables related to plot mean height and stand density, it was noticeable that the plots within the same stratum were not spatially contiguous, which suggests that grid sampling and simple random sampling are likely to perform similarly in these plantations.
Table 5

Parameter estimates for model [1]

| Parameter | Nundle 1977 | Nundle 2002 | Green Hills (30+ yrs) | Green Hills (1979) |
|---|---|---|---|---|
| Mean | 54.7236 | −0.4823 | 1.1713** | 7.8131** |
| OV | 0.0229*** | 0.0170*** | 0.0014*** | 0.0014*** |
| Stratum 2 | −12.8711 | −4.1180 | 2.5097*** | −2.5224 |
| Stratum 3 | 47.3760 | −6.3374 | 2.0857 | −7.3314 |
| Stratum 4 | −56.9808 | −1.7643 | 7.2840*** | −2.7101 |
| Stratum 5 | 43.0417 | 2.2229 | −1.1264 | |
| Stratum 6 | 10.3448 | −24.734 | 2.1840** | |
| Stratum 7 | −31.9621 | 6.2664 | | |
| Stratum 8 | 93.5574 | −10.8385 | | |
| Stratum 9 | 113.1250** | −9.4114 | | |

**p < 0.01 ***p < 0.001.

Table 6

Nundle 1977 AC plots: comparison of methods

| Inferential framework | Number of strata | Design variable | Estimator | Regression model variables | Relative bias | RMSE (%) | PLE (%) | Relative efficiency |
|---|---|---|---|---|---|---|---|---|
| Design-based | - | - | Expansion | - | 0.02 | 6.5 | 13.0 | 1.0 |
| | 2 | OV | Expansion | - | −0.02 | 5.3 | 10.6 | 1.5 |
| | 3 | OV | Expansion | - | 0.04 | 4.0 | 8.0 | 2.0 |
| | 4 | OV | Expansion | - | 0.01 | 4.4 | 8.8 | 2.2 |
| | 6 | OV | Expansion | - | 0.00 | 3.8 | 7.6 | 2.4 |
| | 2 | OV | Regression | Strata + OV | 0.06 | 4.4 | 8.8 | 2.2 |
| | 3 | OV | Regression | Strata + OV | 0.11 | 4.0 | 8.0 | 2.6 |
| | 4 | OV | Regression | Strata + OV | 0.04 | 4.0 | 8.0 | 2.6 |
| | 6 | OV | Regression | Strata + OV | 0.04 | 4.1 | 8.2 | 2.5 |
| | 4 | OV + CC | Expansion | - | 0.02 | 4.9 | 9.8 | 1.8 |
| | 6 | OV + CC | Expansion | - | 0.05 | 4.6 | 9.2 | 2.0 |
| | 4 | OV + CC | Regression | Strata + OV | −0.14 | 4.4 | 8.8 | 2.2 |
| | 6 | OV + CC | Regression | Strata + OV | −0.05 | 5.0 | 10.0 | 1.7 |
| Model-based | - | OV | Model | OV | −0.08 | 4.1 | 8.2 | 2.5 |
| | - | OV + CC | Model | OV + CC | −0.12 | 4.2 | 8.4 | 2.4 |

OV = the LiDAR metric 'occupied volume'; i.e. the sum of all pixel heights per plot.

CC = the LiDAR metric 'canopy cover'; i.e. the % of pixels above 3 m in height.

Expansion = estimator [4]; regression = estimator [5]; model = predictor [6].

Table 7

Nundle 2002 AC plots: comparison of methods

| Inferential framework | Number of strata | Design variable | Estimator | Regression model variables | Relative bias | RMSE (%) | PLE (%) | Relative efficiency |
|---|---|---|---|---|---|---|---|---|
| Design-based | - | - | Expansion | - | 0.04 | 9.1 | 18.2 | 1.0 |
| | 2 | OV | Expansion | - | −0.05 | 7.8 | 15.6 | 1.4 |
| | 3 | OV | Expansion | - | −0.10 | 8.0 | 16.0 | 1.7 |
| | 4 | OV | Expansion | - | 0.05 | 6.6 | 13.2 | 1.9 |
| | 6 | OV | Expansion | - | 0.01 | 6.5 | 13.0 | 2.0 |
| | 2 | OV | Regression | Strata + OV | −0.04 | 6.4 | 12.8 | 2.0 |
| | 3 | OV | Regression | Strata + OV | 0.08 | 6.3 | 12.6 | 2.1 |
| | 4 | OV | Regression | Strata + OV | 0.02 | 6.3 | 12.6 | 2.1 |
| | 6 | OV | Regression | Strata + OV | −0.07 | 6.4 | 12.8 | 2.0 |
| | 4 | OV + CC | Expansion | - | −0.02 | 7.5 | 15.0 | 1.5 |
| | 6 | OV + CC | Expansion | - | −0.04 | 7.4 | 14.8 | 1.5 |
| | 4 | OV + CC | Regression | Strata + OV | −0.08 | 6.3 | 12.6 | 2.1 |
| | 6 | OV + CC | Regression | Strata + OV | −0.10 | 6.6 | 13.2 | 1.9 |
| Model-based | - | OV | Model | OV | −0.12 | 6.3 | 12.6 | 2.1 |
| | - | OV + CC | Model | OV + CC | −0.14 | 6.5 | 13.0 | 2.0 |

OV = the LiDAR metric 'occupied volume'; i.e. the sum of all pixel heights per plot.

CC = the LiDAR metric 'canopy cover'; i.e. the % of pixels above 3 m in height.

Expansion = estimator [4]; regression = estimator [5]; model = predictor [6].

Table 8

Green Hills inventory plots (1979 AC): comparison of methods

| Inferential framework | Number of strata | Design variable | Estimator | Regression model variables | Relative bias | RMSE (%) | PLE (%) | Relative efficiency |
|---|---|---|---|---|---|---|---|---|
| Design-based | - | - | Expansion | - | 0.04 | 4.4 | 8.8 | 1.0 |
| | 2 | OV | Expansion | - | 0.06 | 3.8 | 7.6 | 1.3 |
| | 3 | OV | Expansion | - | 0.02 | 3.9 | 7.8 | 1.3 |
| | 4 | OV | Expansion | - | 0.01 | 4.0 | 8.0 | 1.2 |
| | 6 | OV | Expansion | - | 0.10 | 4.1 | 8.2 | 1.2 |
| | 2 | OV | Regression | Strata + OV | 0.05 | 3.5 | 7.0 | 1.6 |
| | 3 | OV | Regression | Strata + OV | 0.03 | 3.8 | 7.6 | 1.3 |
| | 4 | OV | Regression | Strata + OV | −0.01 | 3.9 | 7.8 | 1.3 |
| | 6 | OV | Regression | Strata + OV | 0.10 | 4.1 | 8.2 | 1.2 |
| | 4 | OV + CC | Expansion | - | 0.02 | 3.8 | 7.6 | 1.3 |
| | 6 | OV + CC | Expansion | - | −0.01 | 4.1 | 8.2 | 1.2 |
| | 4 | OV + CC | Regression | Strata + OV | 0.03 | 3.5 | 7.0 | 1.6 |
| | 6 | OV + CC | Regression | Strata + OV | −0.01 | 4.0 | 8.0 | 1.2 |
| Model-based | - | OV | Model | OV | −0.01 | 3.5 | 7.0 | 1.6 |
| | - | OV + CC | Model | OV + CC | −0.01 | 3.5 | 7.0 | 1.6 |

OV = the LiDAR metric 'occupied volume'; i.e. the sum of all pixel heights per plot.

CC = the LiDAR metric 'canopy cover'; i.e. the % of pixels above 3 m in height.

Expansion = estimator [4]; regression = estimator [5]; model = predictor [6].

Table 9

Green Hills research plots (30+ yrs AC): comparison of methods

| Inferential framework | Number of strata | Design variable | Estimator | Regression model variables | Relative bias | RMSE (%) | PLE (%) | Relative efficiency |
|---|---|---|---|---|---|---|---|---|
| Design-based | - | - | Expansion | - | 0.01 | 6.9 | 13.8 | 1.0 |
| | 2 | OV | Expansion | - | 0.00 | 4.9 | 9.8 | 2.0 |
| | 3 | OV | Expansion | - | 0.02 | 4.4 | 8.8 | 2.4 |
| | 4 | OV | Expansion | - | −0.01 | 4.2 | 8.4 | 2.7 |
| | 6 | OV | Expansion | - | −0.05 | 4.1 | 8.2 | 3.0 |
| | 2 | OV | Regression | Strata + OV | 0.00 | 3.3 | 6.6 | 4.4 |
| | 3 | OV | Regression | Strata + OV | 0.02 | 3.5 | 7.0 | 3.9 |
| | 4 | OV | Regression | Strata + OV | 0.00 | 3.6 | 7.2 | 3.7 |
| | 6 | OV | Regression | Strata + OV | −0.07 | 3.8 | 7.6 | 3.3 |
| | 4 | OV + CC | Expansion | - | −0.07 | 4.9 | 9.8 | 2.0 |
| | 6 | OV + CC | Expansion | - | −0.04 | 4.4 | 8.8 | 2.4 |
| | 4 | OV + CC | Regression | Strata + OV | −0.02 | 3.3 | 6.6 | 4.4 |
| | 6 | OV + CC | Regression | Strata + OV | −0.02 | 3.5 | 7.0 | 3.9 |
| Model-based | - | OV | Model | OV | −0.01 | 3.6 | 7.2 | 4.4 |
| | - | OV + CC | Model | OV + CC | −0.05 | 3.3 | 6.6 | 4.4 |

OV = the LiDAR metric 'occupied volume'; i.e. the sum of all pixel heights per plot.

CC = the LiDAR metric 'canopy cover'; i.e. the % of pixels above 3 m in height.

Expansion = estimator [4]; regression = estimator [5]; model = predictor [6].

With the usual design-based expansion estimator, the number of strata is important. For example, better results were achieved using six rather than two strata in the Nundle and Green Hills research stands (Tables 6, 7 and 9). The number of stratification variables is also important, with a single LiDAR-derived variable (OV) leading to better estimates than two separate variables (OV + CC) for three of the four datasets and nearly equivalent estimates for the fourth.

The relationship between stand volume and OV for the four stands is shown in Figure 2. These relationships form the basis of the model-assisted and model-based methods. With the regression estimator (model-assisted) fewer strata led to better estimates for the Green Hills datasets (Tables 8 and 9) but had minimal effect on the RMSE values for the Nundle datasets (Tables 6 and 7). A possible reason is that the regression models were more effective in the Green Hills stands, thus requiring fewer strata to obtain the same level of efficiency. The preferred option with the Nundle stands was for a single stratification variable rather than two variables but this was not the case for the Green Hills stands.
Figure 2

Mean stand volume vs occupied volume (OV) for the four stands used in the study.

With the model-based approach, the number of auxiliary variables (either one or two) had a minor impact on the prediction errors. Care is needed in interpreting the results for the Nundle stands because there was evidence of multi-collinearity in the auxiliary variables and the addition of a second variable to the model may lead to only minor improvement if the two variables are correlated. In the model-based approach the auxiliary variables (OV and/or CC) were used both in the design phase, via balanced sampling, and in the regression model. Overall, there is not a substantial difference between the model-based and model-assisted methods, with the simulations slightly favouring the model-based approach (Tables 6, 7, 8 and 9). The main difference between these two strategies is the use of stratification in the design-based approach (both for sample selection and as a regression variable) compared to the use of balanced sampling in lieu of stratification with the model-based approach.

The efficiencies gained through using a regression model, either in a design-based or model-based framework, are pronounced in most of the stands which were studied. The best relative efficiency, of 4.4, was achieved in the Green Hills research stands (Table 9). A relative efficiency of 4.4 is equivalent to a 4.4-fold reduction in sample size for the same precision. With the Nundle stands, the plots on which the simulations were based already utilised LiDAR data through a pre-stratification process (i.e. the initial Dalenius-Hodges stratification approach). With the Green Hills stands the initial stratification was based on management-level rather than LiDAR data and in both stands only one of these strata (the oldest) was selected for simulation. Therefore, the simulated plots display a broader range of auxiliary data in the Nundle stands.

Only in the Green Hills inventory plots is it possible to directly explore total recoverable volume (TRV), which is defined as tree stems assessed for product yield, and which was calculated by the inventory crew. However, based on other (unpublished) work over several FCNSW regions, the correlation between TSV and TRV is very high. Therefore, none of the above results were expected to change substantially if TRV were considered instead of TSV. Looking at the sampling schemes and sample sizes currently being used, it is clear that the target of 10% PLE is not being met with most of these data (Tables 6, 7 and 9). The exception is the Green Hills inventory plots (Table 8) where, based on these simulations, a sample of 24 plots would result in a PLE of 8.8%.

Discussion

Several studies in the northern hemisphere have demonstrated the benefits of LiDAR data as a priori auxiliary information for improving the efficiency of field plot designs (e.g. Dalponte et al. 2011; Gobakken et al. 2013; Grafström and Ringvall 2013; Grafström et al. 2014; Hawbaker et al. 2009; Junttila et al. 2013; Maltamo et al. 2011). There is now a consensus that LiDAR-assisted plot selection should lead to cost savings in operational inventories. More specifically, several recent studies have also investigated the utility of LiDAR data to predict stem volume in stands of planted P. radiata (Chen and Zhu 2013; González-Ferreiro et al. 2012; Stone et al. 2011; Watt and Watt 2013; Watt et al. 2013). However, this study is the first to systematically compare the potential efficiencies gained through a series of LiDAR-assisted sampling designs for this important plantation species. Ultimately, the total number of plots will depend on the variation within the area of interest and how comprehensively the sample describes this variation (Junttila et al. 2008).

In this study, the relative efficiency achieved by the model-based sampling strategy ranged from 1.6 in the Green Hills inventory study area to 4.4 in the Green Hills research study area. It is possible to calculate that between 10 plots (Green Hills research study) and 38 plots (Nundle 2002 AC study) would have been required to achieve the 10% PLE target which is used by FCNSW. This compares with the 46 and 79 plots, respectively that would have been required to meet the same target using a grid-based sampling procedure. Even in the Green Hills inventory study, where the relative efficiency gains were least, the sample could have been reduced from 19 plots to 12 plots while still meeting the 10% PLE target.
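One calculation that reproduces these plot counts is sketched below in Python. It assumes that the PLE scales with the inverse square root of the sample size, starts from the PLE values reported for 24 plots in Tables 7, 8 and 9 (taking the smaller of the two model-based values), and rounds to the nearest whole plot. The function name is hypothetical.

```python
def plots_for_target_ple(ple_at_n0, n0=24, target_ple=10.0):
    """Sample size giving the target PLE, assuming PLE is proportional
    to 1 / sqrt(n) and that `ple_at_n0` was observed with n0 plots."""
    return n0 * (ple_at_n0 / target_ple) ** 2

# PLE (%) at 24 plots: (simple random sample, model-based).
reported_ple = {
    "Green Hills research": (13.8, 6.6),
    "Nundle 2002 AC": (18.2, 12.6),
    "Green Hills inventory": (8.8, 7.0),
}
required = {
    site: (round(plots_for_target_ple(srs)), round(plots_for_target_ple(model)))
    for site, (srs, model) in reported_ple.items()
}
```

Under these assumptions `required` holds (46, 10), (79, 38) and (19, 12) plots for the three sites, in line with the figures quoted above.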

For the purpose of this study, a distinction was drawn between the variables used to generate the simulated plot data and those used for the simulated survey designs and estimation. The simulation parameters were estimated from the observed data and the original stratification was based on the LiDAR variables mean height and estimated stocking. This original stratification was included in the models used to estimate the simulation parameters, with the covariance parameters estimated separately within each stratum. The strata that were used to generate the plot data were excluded from the simulated survey design and estimation because the underlying mechanisms which give rise to the plot data are usually unknown to the observer. The primary aim was to determine what efficiencies could be gained from using LiDAR variables even when these are different to those which were used to generate the data.

The plots in the simulated plantation stands were selected in accordance with the sampling method under consideration. The assumptions underlying the design-based approaches were observed in the simulated samples; specifically the plots were selected independently using known selection probabilities within strata. In the model-assisted methods the plots were also randomly selected and the known selection probabilities were employed in the estimates. As for the model-based approach, the models used for prediction were not the same as those used for simulating the plots. This is a common situation in model-based sampling since the underlying population model is usually unknown. Some of the model assumptions, such as constant variance, are not actually correct owing to the way that the data were simulated. Again this is a very common situation in actual surveys since the underlying variance structure is usually unknown. Model deficiencies can sometimes be revealed via a detailed residuals analysis but this is not always conclusive. What is important in these results is that the model-based predictions are still equal to or better than the design-based estimates in terms of the RMSE despite the known model deficiencies (and the associated bias).

Both the auxiliary variables used in this study, OV and CC, are easily derived from LiDAR data (Turner et al. 2011). The LiDAR variables chosen must capture the variability of the stand parameter being surveyed, which in turn is influenced by local topography, soils and silvicultural history. These variables were chosen because they were the best predictors of stand volume in the initial regression modelling of the data. Therefore they were used as auxiliary data, either for stratification and/or within the models, using the model-assisted or model-based methods. Maltamo et al. (2011) used the LiDAR metrics VEG (the proportion of ground echoes vs canopy echoes using a threshold value of 2 m) and H90 (plot-level 90th percentile for height based on first returns) as a basis for plot selection. Their VEG metric is a measure of canopy penetration which, although different to OV, is somewhat similar to the CC used in this study. Similarly, Gobakken et al. (2013) used the metrics h70f (related to canopy height) and d0f (related to canopy density) to define their strata.

The simulations tend towards favouring the model-based approach and there are other advantages to using this approach such as the increased flexibility associated with model-based methods. For example, spatial prediction, small domain (small area) estimation and variance modelling are all model-based techniques which have no obvious design-based counterparts. Nevertheless, to limit bias it is essential that the model is well specified (e.g. Hansen et al. 1983) and captures the spatial variation presented by the population of trees in a Planning Unit. A common source of variation in P. radiata is stand height and disparity in growth is related to local topography (Saremi et al. 2014), especially in regions that may experience periods of drought (e.g. Álvarez et al. 2013). The determination of attributes associated with planted forests within these regions may benefit more from model-based (or model-assisted) sampling design strategies than plantations with relatively homogeneous stands.

A further advantage of model-based surveys is that they can be used with non-random strategies including balanced sampling. This permits greater flexibility in the choice of ground plots, with greater emphasis on achieving a good range of auxiliary variables as opposed to that which results from a purely random selection. It is a key requirement, however, that the models (which are estimated from the selected plots) are representative of the true models for the study area (Ståhl et al. 2011; Wulder et al. 2012). Therefore, plots which are relatively more accessible to forestry personnel could be used as substitutes for those which are inaccessible, provided they have similar LiDAR metrics. However, this plot selection strategy would need to be implemented carefully, since the integrity of the model needs to be maintained. Specifically the reason for the accessibility needs to be unrelated to the key variable of interest, such as timber volume.

Although LiDAR constitutes an additional cost, the results of this study have demonstrated that plot sampling intensity can be reduced yet still achieve the targeted level of precision for stand volume estimates. Typical softwood plantation inventory costs in Australia average around AU$100 per management plot. This equates to approximately AU$25 per hectare, depending on topography, weed occurrence and abundance, and time of measurement (Chen and Zhu 2012; D. Watt, Planning Manager, FCNSW, pers. comm.). Also, LiDAR survey costs can vary considerably depending on the cost of aircraft mobilisation and flight distance and the specifications of the data being acquired. The net benefits, therefore, need to be confirmed through a cost efficiency analysis taking into account local requirements, conditions and circumstances.

This paper is entirely concerned with sampling at a fixed point in time. The need to measure changes over time, for example to estimate growth rates, requires a sample which is designed to be efficient for this purpose. In most population surveys changes over time are best estimated by keeping the sampling units constant or at least maximising the degree of overlap between successive surveys (see for example Cochran 1977). However, some degree of rotation is usually required to remove sampling units that are no longer representative, or for other reasons (such as harvesting). In repeated surveys it will be necessary to replace some of the initial plots when management or environmental changes have had an adverse impact on the inventory plots and/or changes occur within the plantation such as the establishment of new compartments. If the initial sample was efficient in terms of estimating the initial population, then it is likely that it will also be efficient in terms of estimating changes over time, provided plot rotation is employed when it becomes necessary. It will be necessary to revisit inventory plots periodically, both to measure allometric changes and to recalibrate the regression models. However, the use of LiDAR data for survey design should result in a reduced number of inventory plots and also inform the plot selection process when older plots need to be rotated out of the survey and replaced by newer ones. It will also be necessary to obtain updated LiDAR data as the current data become obsolete; however, the exact requirements in this respect are still to be determined (see for example Junttila et al. 2010).

The model-based approach proved to be the preferred strategy in this study. If using a model-based approach then balanced sampling with no stratification gives a precision which is comparable with, or better than, any of the design-based samples and would normally be the method of choice. If using a design-based approach then a regression estimator using a single auxiliary variable is the preferred option. In the design-based approach the stratum boundaries should be set using the Dalenius-Hodges method and the plots should be allocated to strata by Neyman allocation, after specifying a minimum number of plots per stratum (a minimum of two plots per stratum is required for variance estimation). Since the intention is to have a small overall sample size the number of strata also needs to be small, at most two or three. The number of plots required to achieve a 10% PLE will depend on the variation in timber volume. However, the high correlation between timber volume and OV means that variation in OV (together with historical or pilot data) could be used to determine the number of plots required.

The model-based strategy uses a widely available routine for balanced sampling (“samplecube”; Deville and Tillé 2004) together with simple linear models to compute the predicted values. The variance estimates, which are not considered in this paper, utilise matrix computations which are available in any mathematical or statistical package, such as R or Matlab. The next phase of this study will be to examine other modelling procedures, in particular the non-parametric techniques of most similar neighbour (MSN) and Random Forests, and to determine the optimal design strategy for product volume estimates.

Conclusion

This paper compared the efficiency gains which are achievable using LiDAR data as a priori auxiliary information in P. radiata inventory samples. The efficiency gains observed in the model-based strategy, compared to simple random sampling, were equivalent to reducing the sample size from 46 to 10 plots in the Green Hills research study site and from 79 to 38 plots in the Nundle 2002 AC site. Although the simulations favoured the model-based approach, the model-assisted design-based estimators also achieved good levels of precision. Balanced sampling was used in the model-based strategies as an alternative to stratification and proved to be an effective option.

Declarations

Acknowledgements

The authors thank the following people who assisted us in collating the data for this study: Bob Cooper, Tony Brown & Duncan Watt (Forestry Corporation NSW), Matt Nagel & Amrit Kathuria (NSW Dept Primary Industries), Murray Webster (ForeSence Pty Ltd), Hanieh Saremi (University of New England) and Cate MacGregor (University of New England). The LiDAR and inventory plot data were provided by the Forestry Corporation of NSW and this study was supported with funds from Forest & Wood Products Australia. Finally, we are grateful for the comments provided by Huiquan Bi (NSW DPI) and two anonymous reviewers which considerably improved the manuscript.

Authors’ Affiliations

(1)
Trangie Agricultural Research Centre
(2)
NSW Department of Primary Industries, Forest Research
(3)
Remote Census Pty Ltd

References

  1. Álvarez J, Allen HL, Albaugh TJ, Stape JL, Bullock BP, & Song C. (2013). Factors influencing the growth of radiata pine plantations in Chile. Forestry, 86, 13–26. doi:10.1093/forestry/cps072.
  2. Chen Y, & Zhu X. (2012). Site quality assessment of a Pinus radiata plantation in Victoria, Australia, using LiDAR technology. Southern Forests, 74, 217–227. doi:10.2989/20702620.2012.741767.
  3. Chen Y, & Zhu X. (2013). An integrated GIS tool for automatic forest inventory estimates of Pinus radiata from LiDAR. GIScience & Remote Sensing, 50, 667–689. doi:10.1080/15481603.2013.866783.
  4. Cochran W. (1977). Sampling Techniques. New York: John Wiley and Sons.
  5. Corona P, & Fattorini L. (2008). Area-based lidar-assisted estimation of forest standing volume. Canadian Journal of Forest Research, 38, 2911–2916. doi:10.1139/X08-122.
  6. Dalponte M, Martinez C, Rodeghiero M, & Gianelle D. (2011). The role of ground reference data collection in the prediction of stem volume with LiDAR data in mountain areas. ISPRS Journal of Photogrammetry and Remote Sensing, 66, 787–797. doi:10.1016/j.isprsjprs.2011.09.003.
  7. Deville JC, & Tillé Y. (2004). Efficient balanced sampling: the cube method. Biometrika, 91, 893–912.
  8. Ene LT, Næsset E, Gobakken T, Gregoire TG, Ståhl G, & Holm S. (2013). A simulation approach for accuracy assessment of two-phase post-stratified estimation in large-area LiDAR biomass surveys. Remote Sensing of Environment, 133, 210–224. Available from http://dx.doi.org/10.1016/j.rse.2013.02.002 (Accessed 28 April 2015).
  9. Foreman EK. (1991). Survey Sampling Principles. New York: Marcel Dekker, Inc.
  10. Gilabert H, & McDill ME. (2010). Optimizing inventory and yield data collection for forest management planning. Forest Science, 56, 578–591.
  11. Gilmour AR, Thompson R, & Cullis BR. (1995). Average Information REML, an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics, 51, 1440–1450.
  12. Gobakken T, Korhonen L, & Næsset E. (2013). Laser-assisted selection of field plots for an area-based forest inventory. Silva Fennica, 47(5), article id 943. http://dx.doi.org/10.14214/sf.943 (Accessed 28 April 2015).
  13. González-Ferreiro E, Diéguez-Aranda U, & Miranda D. (2012). Estimation of stand variables in Pinus radiata D.Don plantations using different LiDAR pulse densities. Forestry, 85, 281–292. doi:10.1093/forestry/cps002.
  14. Goulding CJ, & Lawrence ME. (1992). Inventory practice for managed forests (Forest Research Institute Bulletin 171, p. 52). New Zealand: Rotorua.
  15. Grafström A, & Ringvall AH. (2013). Improving forest field inventories by using remote sensing data in novel sampling designs. Canadian Journal of Forest Research, 43, 1015–1022. dx.doi.org/10.1139/cjfr-2013-0123.
  16. Grafström A, Saarela S, & Ene LT. (2014). Efficient sampling strategies for forest inventories by spreading the sample in auxiliary space. Canadian Journal of Forest Research, 44, 1156–1164. dx.doi.org/10.1139/cjfr-2014-0202.
  17. Gregoire TG. (1998). Design-based and model-based inference in survey sampling: appreciating the difference. Canadian Journal of Forest Research, 28, 1429–1447.
  18. Gregoire TG, Ståhl G, Næsset E, Gobakken T, Nelson R, & Holm S. (2011). Model-assisted estimation of biomass in a lidar sample survey in Hedmark County, Norway. Canadian Journal of Forest Research, 41, 83–95. doi:10.1139/X10-195.
  19. Hansen MH, Madow WG, & Tepping BJ. (1983). An evaluation of model-dependent and probability-sampling inferences in sample surveys. Journal of the American Statistical Association, 78, 776–793. Available from http://www.jstor.org/stable/2288182 (Accessed 28 April 2015).
  20. Hawbaker TJ, Keuler NS, Lesak AA, Gobakken T, & Contrucci, K. (2009). Improved estimates of forest vegetation structure and biomass with a lidar-optimized sampling design. Journal of Geophysical Research, 114: G00E04. doi:10.1029/2008jG000870.
  21. Hyyppä J, Hyyppä H, Leckie D, Gougeon F, Yu X, & Maltamo M. (2008). Review of methods of small-footprint airborne laser scanning for extracting forest inventory data in boreal forest. International Journal of Remote Sensing, 29, 1339–1366. doi:10.1080/01431160701736489.View ArticleGoogle Scholar
  22. Hyyppä J, Yu X, Hyyppä H, Vastaranta M, Holopainen M, Kukko A, Kaartinen H, Jaakkola A, Vaaja M, Koskinen J, & Alho P. (2012). Advances in Forest InventoryUsing Airborne Laser Scanning. Remote Sensing, 4, 1190–1207. doi:10.3390/rs4051190.View ArticleGoogle Scholar
  23. Jakubowski M.K., Guo Q. & Kelly M. (2013) Tradeoffs between lidar pulse density and forest measurement accuracy. Remote Sensing of Environment, 130, 245–253. http://dx.doi.org/10.1016/j.rse.2012.11.024 (Accessed 28 April 2015).
  24. Junttila V, Maltamo M, & Kauranne T. (2008). Sparse Bayesian estimation of forest stand characteristics from airborne laser scanning. Forest Science, 54, 543–552.Google Scholar
  25. Junttila V, Kauranne T, & Leppänen V. (2010). Estimation of forest stand parameters from airborne laser scanning using calibrated plot databases. Forest Science, 56, 257–270.Google Scholar
  26. Junttila V, Finley AO, Bradford JB, & Kauranne T. (2013). Strategies for minimizing sample size for use in airborne LiDAR-based forest inventory. Forest Ecology and Management, 292, 75–85. Available from doi:10.1016/j.foreco.2012.12.019 (Accessed 28 April 2015).
  27. Köhl M, Magnussen SS, & Marchetti M. (2006). Sampling methods, remote sensing and GIS multisource forestry inventory (p. 373). Heidelberg: Springer-Verlag.View ArticleGoogle Scholar
  28. Maltamo M, Bollandsås OM, Næsset E, Gobakken T, & Packalén P. (2009a) Different sampling strategies for field training plots in ALS inventory. In: Proceedings of the SilviLaser 2009: The 9th International Conference on Lidar Applications for Assessing Forest Ecosystems. Editors: S. Popescu and K. Zhao. Independent Publisher, USA. ISBN 9781616239978.
  29. Maltamo M, Peuhkurinen J, Malinen J, Vauhkonen J, Packalén P, & Tokola T. (2009b). Predicting tree attributes and quality characteristics. Silva Fennica, 43, 507–521. Available from http://www.metla.fi/silvafennica/full/sf43/sf433507.pdf (Accessed 28 April 2015).
  30. Maltamo M, Bollandsås OM, Næsset E, Gobakken T, & Packalén P. (2011). Different plot selection strategies for field training data in ALS-assisted forest inventory. Forestry, 84, 23–31. doi:10.1093/forestry/cpq039.View ArticleGoogle Scholar
  31. McRoberts RE. (2006). A model-based approach to estimating forest area. Remote Sensing of Environment, 103, 56–66. doi:10.1016/j.rse.2006.03.005.View ArticleGoogle Scholar
  32. McRoberts RE. (2010). Probability- and model-based approaches to inference for proportion forest using satellite imagery and ancillary data. Remote Sensing of Environment, 114, 1017–1025. doi:10.1016/j.rse.2009.12.013.View ArticleGoogle Scholar
  33. McRoberts RE, Gobakken T, & Næsset E. (2012). Post-stratification of forest area and growing stock volume using lidar-based stratifications. Remote Sensing of Environment, 125, 157–166. doi:10.1016/j.rse.2012.07.002.View ArticleGoogle Scholar
  34. McRoberts RE, Næsset E, & Gobakken T. (2013). Inference for lidar-assisted estimation of forest growing stock volume. Remote Sensing of Environment, 128, 268–275. Available from http://dx.doi.org/10.1016/j.rse.2012.10.007 (Accessed 28 April 2015).View ArticleGoogle Scholar
  35. Melville GJ, Welsh AH and Stone C. (2015) Improving the efficiency and precision of tree counts in pine plantations using airborne LiDAR data and flexible-radius plots: model-based and design-based approaches. Journal of Agricultural, Biological and Environmental Statistics 20, 29pp.
  36. Næsset E, Gobakken T, Solberg S, Gregoire TG, Nelson R, Ståhl G, et al. (2011). Model-assisted regional forest biomass estimation using LiDAR and InSAR as auxiliary data: A case study from a boreal forest area. Remote Sensing of Environment, 115, 3599–3614. doi:10.1016/j.rse.2011.08.021.View ArticleGoogle Scholar
  37. Næsset E, Bollandsås OM, Gobakken T, Gregoire TG, & Ståhl G. (2013). Model-assisted estimation of change in forest biomass over an 11 year period in a sample survey supported by airborne LiDAR: A case study with post-stratification to provide ``activity data". Remote Sensing of Environment, 128, 299–314. Available from http://dx.doi.org/10.1016/j.rse.2012.10.008 (Accessed 28 April 2015).View ArticleGoogle Scholar
  38. Rombouts J, Ferguson I, Leech J, & Culvenor D. (2010). An evaluation of the field sampling design of the first operational LiDAR based site quality survey of radiata pine plantations in South Australia. Freiburg, Germany: Proceedings of the 2011 Silvilaser conference, September 14–17, 2010.Google Scholar
  39. Saremi H, Kumar L, Turner R, & Stone C. (2014). Airborne LiDAR derived canopy height model reveals a significant difference in radiata pine (Pinus radiata D.Don) heights based on slope and aspect of sites. Trees, 28, 733–744. doi:10.1007/s00468-014-0985-2.View ArticleGoogle Scholar
  40. Särndal C, Swensson B, & Wretman J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.View ArticleGoogle Scholar
  41. Ståhl G, Holm S, Gregoire TG, Gobakken T, Næsset E, & Nelson R. (2011). Model-based inference for biomass estimation in a lidar sample survey in Hedmark County, Norway. Canadian Journal of Forest Research, 41, 96–107. doi:10.1139/X10-161.View ArticleGoogle Scholar
  42. Stone C, Penman T, & Turner R. (2011). Determining an optimal model for processing lidar data at the plot level: results for a Pinus radiata plantation in New South Wales, Australia. New Zealand Journal of Forestry Science, 41, 191–205.Google Scholar
  43. Turner R, Kathuria A, & Stone C. (2011). Building a case for lidar-derived structure stratification for Australian softwood plantations. Hobart, Tasmania, Australia: Proceedings of the SilviLaser 2011 conference, Oct. 16–19, 2011.Google Scholar
  44. Valliant R, Dorfman A, & Royall R. (2000). Finite population sampling and inference: a prediction approach. New York: John Wiley.Google Scholar
  45. van Aardt JAN, Wynne RH, & Oderwald RG. (2006). Forest volume and biomass estimation using small-footprint lidar-distributional parameters on a per-segment basis. Forest Science, 52, 636–648.Google Scholar
  46. Watt P, & Watt MS. (2013). Development of a national model of Pinus radiata stand volume from lidar metrics for New Zealand. International Journal of Remote Sensing, 34, 5892–5904. Available at http://dx.doi.org/10.1080/01431161.2013.798053 (Accessed 28 April 2015).View ArticleGoogle Scholar
  47. Watt MS, Adams T, Marshall H, Pont D, Lee J, Crawley D, & Watt P. (2013). Modelling variation in Pinus radiata stem volume and outerwood stress-velocity from LiDAR metrics. New Zealand Journal of Forestry Science, 43, 1–7. Available from http://www.nzjforestryscience.com/content/43/1/1 (Accessed 28 April 2015).View ArticleGoogle Scholar
  48. White JC, Wulder MA, Varhola A, Vastaranta M, Coops NC, Cook BD, Pitt D, & Woods M. (2013). A best practices guide for generating forest inventory attributes from airborne laser scanning data using an area-based approach. Canadian Forest Service Canadian Wood Fibre Centre Information Report FI-X-010. The Forestry Chronicle, 89(6), 722–723.
  49. Wulder MA, White JC, Nelson RF, Næsset E, Ørka HO, Coops NC, et al. (2012). Lidar sampling for large-area forest characterization: A review. Remote Sensing of Environment, 121, 196–209. doi:10.1016/j.rse.2012.02.001.View ArticleGoogle Scholar
  50. Yu, X, Hyyppä J, Holopainen M, & Vastaranta M. (2010). Comparison of area-based and individual tree-based methods for predicting plot-level forest attributes. Remote Sensing, 2, 1481–1495. doi:10.3390/rs2061481.View ArticleGoogle Scholar

Copyright

© Melville et al.; licensee Springer. 2015

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.