Integrated Crop Yield Modelling Using Remote Sensing, Agroclimatic Data and Survey Data

1. Introduction

This report presents the background, general methods and results of a project that investigated the use of remote sensing, agroclimatic and survey data to model reliable crop yield estimates. The modelled estimates serve as a preliminary estimate ahead of the November Farm Survey, one of the survey occasions of the Crop Reporting Series at Statistics Canada, and are made available before the September Farm Survey estimates are released. The work was completed by the Remote Sensing and Geospatial Analysis Section, Agriculture Division, and by the Business Survey Methods Division at Statistics Canada, in collaboration with Agriculture and Agri-Food Canada (AAFC).

2. General methodology for crop yield modelling

A methodology for modelling crop yield was developed and tested on the crops that are typically published at the provincial and national levels by the September Farm Survey, as shown in Table 1. The five provinces listed account for approximately 98% of the agricultural land in Canada, across a diverse range of climate zones and soil types. Seven of the 19 crops listed account for approximately 85% of the revenue from those 19 crops and are referred to as the seven major crops.

Table 1. Crops typically published in the results of the September Farm Survey, by province
Crop type             Quebec  Ontario  Manitoba  Saskatchewan  Alberta
7 major crops
Barley                X       X        X         X             X
Canola                X       X        X         X             X
Corn for grain        X       X        X
Durum wheat                                      X             X
Oats                  X       X        X         X             X
Soybeans              X       X        X
Spring wheat          X       X        X         X             X
12 additional crops
Canary seed                                      X
Chickpeas                                        X             X
Coloured beans                X        X
Fall rye                      X        X         X             X
Field peas                             X         X             X
Flaxseed                               X         X             X
Lentils                                          X
Mixed grains          X       X        X         X             X
Mustard seed                                     X             X
Sunflower seed                         X
White beans                   X        X
Winter wheat          X       X        X         X             X
Note: Fodder corn is typically published in the September Farm Survey. However, it was not modelled because July Farm Survey yield estimates were not available for it.

The goal of the model was to produce a preliminary estimate of the expected harvest yield of the crops in late summer using information from existing data sources.

3. Data sources used in the model

The modelling methodology used three data sources: 1) the coarse resolution satellite data used as part of Statistics Canada's Crop Condition Assessment Program; 2) Statistics Canada's Crop Reporting Series data; and 3) agroclimatic data for the agricultural regions of Canada.

3.1 Normalized Difference Vegetation Index

Since 1987, Statistics Canada has monitored crop conditions across Canada and the northern United States using the Advanced Very High Resolution Radiometer (AVHRR) sensor aboard the National Oceanic and Atmospheric Administration (NOAA) series of satellites. This series of satellites produces a daily image of the entire Earth's surface at a spatial resolution of one kilometre. A spectral vegetation index, the Normalized Difference Vegetation Index (NDVI), was used as a surrogate for photosynthetic potential. NDVI is the normalized difference between the near-infrared (NIR) and red (R) reflectances (NDVI = (ρNIR − ρR)/(ρNIR + ρR)) and varies from −1 to 1. Values close to one indicate high vegetation content, and values close to zero indicate bare ground with no vegetation. Materials such as water, which absorb more radiation in the NIR than in the visible wavelengths, have a negative NDVI.
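The NDVI formula above can be illustrated with a short Python sketch. The reflectance values are hypothetical and were chosen only to show the sign behaviour described in the text; they are not AVHRR measurements.

```python
def ndvi(nir: float, red: float) -> float:
    """Return (nir - red) / (nir + red) for two reflectance values."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR and red reflectance sum to zero; NDVI is undefined")
    return (nir - red) / total


# Hypothetical reflectances (illustration only, not sensor data).
dense_crop = ndvi(nir=0.50, red=0.08)  # strongly positive: dense vegetation
bare_soil = ndvi(nir=0.30, red=0.25)   # close to zero: little or no vegetation
water = ndvi(nir=0.02, red=0.05)       # negative: NIR absorbed more than red
```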

The NDVI data were processed on a continuous basis throughout the agricultural growing season (April to October) for the entire land mass of Canada. Statistics Canada has a time series of NDVI data from 1987 to present, which includes years of severe drought and record production. The daily NDVI images were processed into seven-day composites as described by Latifovic et al. (2005) and the methodology was further refined by Statistics Canada to minimize or eliminate NDVI errors introduced by the presence of clouds (Bédard 2010).

Cropland NDVI statistics by census agricultural region (CAR) were computed and stored in a relational database for each weekly NDVI composite. Only NDVI picture elements, or pixels, that geographically coincide with an agriculture land cover database produced by AAFC as part of an annual crop inventory were extracted to generate the mean NDVI value for cropland within each of the CARs. The agriculture land cover file and the associated metadata file produced by AAFC were accessible at www.geobase.ca/geobase/en/data/landcover/index.html.
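The masking and averaging step can be sketched as follows. This is a minimal illustration, not the production process: the pixel tuples, the CAR identifiers and the function name are hypothetical, and the real computation works on raster images and a land cover database rather than on a Python list.

```python
from collections import defaultdict


def mean_cropland_ndvi(pixels):
    """Average NDVI per CAR, using only pixels flagged as cropland.

    pixels: list of (car_id, is_cropland, ndvi) tuples (hypothetical layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for car_id, is_cropland, value in pixels:
        if is_cropland:  # pixels outside the cropland mask are ignored
            sums[car_id] += value
            counts[car_id] += 1
    return {car_id: sums[car_id] / counts[car_id] for car_id in sums}


# Hypothetical pixels for two CARs; one CAR_A pixel is not cropland.
pixels = [
    ("CAR_A", True, 0.60),
    ("CAR_A", True, 0.70),
    ("CAR_A", False, 0.10),
    ("CAR_B", True, 0.40),
]
car_means = mean_cropland_ndvi(pixels)
```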

Once computed, the mean NDVI values were converted to three-week moving averages for weeks 18 to 36 (May to August) and imported into the crop models as one of the sets of input variables.
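A three-week moving average of the weekly means can be sketched as below. The report does not say how the window is positioned, so this sketch assumes a trailing window (the current week and the two preceding weeks); the weekly values are hypothetical.

```python
def three_week_moving_average(weekly):
    """Trailing three-week mean of weekly NDVI values.

    weekly: dict mapping week number -> mean cropland NDVI.
    A week is returned only when it and the two preceding weeks are present.
    The trailing window is an assumption; the report does not specify it.
    """
    smoothed = {}
    for week in sorted(weekly):
        window = [weekly.get(week - lag) for lag in range(3)]
        if None not in window:
            smoothed[week] = sum(window) / 3
    return smoothed


# Hypothetical weekly means for one CAR.
weekly_ndvi = {16: 0.20, 17: 0.26, 18: 0.32, 19: 0.41, 20: 0.50}
smoothed = three_week_moving_average(weekly_ndvi)
```

With these inputs the first smoothed value is for week 18, because weeks 16 and 17 are needed to fill its window.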

3.2 Survey area and yield data

Statistics Canada's Field Crop Reporting Series surveys were another dataset used in the model. These surveys obtain information on grains and other field crops stored on farms (March, July, September and December Farm Surveys), seeded area (March, June, July, September and November Farm Surveys), harvested area, expected yield and production of field crops (July, September and November Farm Surveys). These data provide accurate and timely estimates of seeding intentions, seeded and harvested area, production, yield and farm stocks of the principal field crops in Canada at the provincial level.

The survey produces results only when the crop is relatively abundant. If the crop is abundant in a province, the yields are available at a lower geographic level (usually corresponding to the CARs). If there is a crop but it is not abundant, survey data are available at the province level only. Some crops are absent or largely absent in a province and do not have survey data available.

For abundant crops, CAR-level crop yield estimates from the July and November Farm Surveys from 1987 to present were used as input variables for the models, while yield estimates from the September Farm Survey and the November Farm Survey were used to verify the accuracy of the yield model results. For less abundant crops, the survey data were compiled at the provincial level.

3.3 Agroclimatic indexes

The climate data collected during the growing season were the third data source used for modelling crop yields. The station-based daily temperature and precipitation data provided by Environment Canada and other partner institutions were used to generate the climate-based predictors (Chipanshi et al. 2015).

Average values of the indexes at all stations within the cropland extent of a specific CAR were used to represent the mean agroclimate of that CAR. If a CAR lacked input climate data, stations from neighbouring CARs were used.

To form a manageable array of potential crop yield predictors, AAFC aggregated the daily agroclimatic indexes into monthly sums and means for the months of May to August. Their standard deviations (Std) over the month were also calculated and included in the modelling methodology (Newlands et al. 2014; Chipanshi et al. 2015). The Std value shows how the daily index varies over the one-month period. The larger the Std, the higher the variability of the parameter in that month.
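The monthly aggregation can be sketched as follows. The daily values are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption; the report does not state which form AAFC used.

```python
import statistics
from collections import defaultdict


def monthly_predictors(daily):
    """Aggregate a daily index into monthly sum, mean and standard deviation.

    daily: list of (month, value) pairs for one CAR and one index.
    Std is the sample standard deviation (an assumption, not from the report).
    """
    by_month = defaultdict(list)
    for month, value in daily:
        by_month[month].append(value)
    return {
        month: {
            "sum": sum(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),
        }
        for month, values in by_month.items()
    }


# Hypothetical daily values: variable in May (month 5), constant in June (month 6).
daily_index = [(5, 0.0), (5, 4.0), (5, 8.0), (6, 2.0), (6, 2.0), (6, 2.0)]
predictors = monthly_predictors(daily_index)
```

In this example May has a larger Std than June, reflecting the greater day-to-day variability of the index in that month.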

4. Modelling survey yields

The model was selected by first reviewing the existing models, then by assessing the models available in SAS. Modelling was done at the smallest geographic level for which historical survey data were available. Only the five main crop-producing provinces (Quebec, Ontario, Manitoba, Saskatchewan and Alberta) were modelled.

4.1 Review of existing models

A model must be created for each CAR (or for each province in the case of less abundant crops). Each region has 28 years of data (1987 to 2014) and 80 explanatory variables. For its preliminary evaluation, Statistics Canada used a stepwise multiple linear regression, and showed that the optimal number of explanatory variables to be selected for modelling was five (Bédard and Reichert 2013).

The approach used by AAFC combines Bayesian and non-Bayesian methods at different steps (Chipanshi et al. 2015). The variable selection step is non-Bayesian: it uses the least-angle robust regression algorithm with cross-validation and keeps the variables that minimize the median of absolute errors. Yields are then estimated using a Bayesian approach.

The Bayesian approach is used to estimate yields at the beginning of the season, when not all data for the current year are available. This is not the situation at Statistics Canada, where estimates are produced near the end of the growing season. As shown in Section 5, the methodology used by Statistics Canada generates results that are similar to those of the AAFC approach (identified as the least-angle robust model).

4.2 An alternative approach to SAS stepwise modelling

It is important to take outliers into account when selecting the explanatory variables (Khan et al. 2007) and performing estimation, and therefore to use robust modelling methods when possible. The objective was to find a robust alternative for both selecting the variables and estimating the yields. It was found that there was no robust selection procedure in the SAS software used at StatCan. An alternative was to use non-robust algorithms at the selection step and then to estimate the model in a robust way. The LASSO (Least Absolute Shrinkage and Selection Operator) method was selected from the five variable selection algorithms available in SAS. The MM method was chosen from the robust regression methods available in SAS, since it processes outliers at both the model and explanatory variable levels (Copt et al. 2006).
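How LASSO acts as a selection step can be illustrated with a small pure-Python coordinate descent. This is a teaching sketch only: it is not the SAS procedure used at Statistics Canada, the data are hypothetical, the inputs are assumed to be centred (no intercept), and the robust MM estimation step that follows selection is not shown.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of column j with the partial residual
            z = 0.0    # mean square of column j
            for i in range(n):
                fitted_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fitted_others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Hypothetical data: y = 2.0 * x0 + 0.1 * x1, so x1 is nearly irrelevant.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

In this example the weak variable receives a coefficient of exactly zero and is dropped, while the retained coefficient is shrunk (about 1.5 rather than 2.0). In a two-step approach such as the one described above, the selected variables would then be passed to a separate estimation step.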

In the rest of the document, the method retained by StatCan will be referred to as the LASSO robust model, and the method used by StatCan for preliminary evaluations will be referred to as the stepwise non-robust model.

4.3 Aggregating model yield estimates to the provincial and national levels

The yield model estimates are created at the CAR level for the majority of crops. The CAR level estimates are weighted based on seeded area and aggregated to produce a provincial estimate. For certain crops that are less common in a province, the model estimates are only created at the provincial level. A similar weighting approach was used to produce a national estimate from provincial estimates.
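The aggregation can be sketched as an area-weighted mean. The report does not give the exact formula, so the weighting below (each estimate weighted in proportion to its seeded area) is an assumption, and the yields and areas are hypothetical.

```python
def area_weighted_yield(estimates):
    """Combine yield estimates using seeded area as the weight.

    estimates: list of (yield_estimate, seeded_area) pairs, one per CAR
    (or one per province when aggregating to the national level).
    """
    total_area = sum(area for _, area in estimates)
    if total_area == 0:
        raise ValueError("total seeded area is zero")
    return sum(value * area for value, area in estimates) / total_area


# Hypothetical CAR-level estimates: (yield, seeded area).
car_estimates = [(3.0, 100.0), (4.0, 300.0)]
provincial_yield = area_weighted_yield(car_estimates)
```

The same function could be applied to provincial estimates and provincial seeded areas to obtain a national figure.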

4.4 Model evaluation method

The November Farm Survey estimates are considered the most accurate estimate of yield for a given year because the data are collected after the majority of harvesting is completed and the sample size is the largest of the six survey occasions. The results of the September Farm Survey can be considered a preliminary estimate of the November results. Therefore, the goal of the modelled yield is not to replicate the results of the September Farm Survey but rather to obtain a sufficiently accurate yield estimate in advance of the November survey results. Unless otherwise indicated, the analysis in the following sections used the November yield estimate as the benchmark for comparison.

The relative difference (presented as a percentage) between the yield estimate of a given method (i.e., September Farm Survey or the model) and the November survey yield estimate was the measure of quality. A negative relative difference indicated that the given yield estimate was smaller than the November survey estimate, while a positive relative difference indicated that the given yield estimate was larger than the November survey estimate.

$$\text{Relative difference} = 100 \times \frac{\text{Given estimate} - \text{November survey estimate}}{\text{November survey estimate}}$$
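As a small worked example of the relative difference, the following Python sketch uses hypothetical yield values (not survey results) to show one negative and one positive case.

```python
def relative_difference(given_estimate, november_estimate):
    """Relative difference (%) of an estimate from the November survey estimate."""
    return 100.0 * (given_estimate - november_estimate) / november_estimate


# Hypothetical yields, in any common unit.
below = relative_difference(45.0, 50.0)  # estimate smaller than November: negative
above = relative_difference(52.5, 50.0)  # estimate larger than November: positive
```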

Many of the summary tables were shown in terms of the absolute relative difference to demonstrate the magnitude of the difference between two estimates and did not take into account the direction of the difference. These absolute relative differences were summarized in terms of the median, 75th percentile, 90th percentile and maximum value calculated over the range of years for which estimates were compared.
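These summary statistics can be sketched with the Python standard library. The input values are the LASSO robust relative differences for barley from the data table for Figure 1a. The percentile rule is an assumption: the report does not state how percentiles were computed, and the sketch uses linear interpolation (the "inclusive" method of `statistics.quantiles`).

```python
import statistics

# LASSO robust relative differences (%) for barley, 1987 to 2014,
# copied from the data table for Figure 1a.
barley_lasso = [
    -0.007022293, -4.127229315, 4.062629525, -4.458154940,
    2.319516397, 1.216856405, 7.700899896, 6.317220039,
    -2.258416911, 2.087273987, 2.362656427, 1.442208318,
    -1.408978232, 0.837053541, 4.042157977, 5.868762080,
    -6.349190164, -2.400237686, -0.381758120, -2.887912767,
    4.678941003, -4.117890774, -7.629567818, 3.800574703,
    4.345646613, 17.817838787, -11.011339104, -0.992805729,
]


def summarize_absolute_differences(relative_differences):
    """Median, 75th and 90th percentiles and maximum of |relative difference|."""
    absolute = sorted(abs(d) for d in relative_differences)
    # Linear interpolation between order statistics (assumed rule).
    quartiles = statistics.quantiles(absolute, n=4, method="inclusive")
    deciles = statistics.quantiles(absolute, n=10, method="inclusive")
    return {
        "median": statistics.median(absolute),
        "p75": quartiles[2],
        "p90": deciles[8],
        "max": absolute[-1],
    }


summary = summarize_absolute_differences(barley_lasso)
rounded = {name: round(value, 1) for name, value in summary.items()}
```

Under this assumption, the rounded values are 3.9, 5.0, 7.7 and 17.8, which agree with the LASSO robust figures reported for barley in Table 4.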

5. Results of the model evaluations

Two studies were undertaken to evaluate the quality of the models. The first compared the LASSO robust model with the stepwise non-robust model while the second compared the LASSO robust model approach to the least-angle robust model.

5.1 Comparing results between the LASSO robust model and stepwise non-robust model

The LASSO robust model results were compared with results from the stepwise non-robust model by comparing the relative differences with the November survey yields. Results were generated for the seven major crops from 1987 to 2014 inclusive at the national level.

In general, the LASSO robust model produced smaller absolute relative differences from the November survey yields than the stepwise non-robust model did (Table 2). The LASSO robust model was therefore retained for the second study.

Table 2. Median, 75th and 90th percentiles of absolute relative difference of the LASSO robust and the stepwise non-robust models with November survey yields at the national level for 1987 to 2014 for seven major crops
Crop            Median (%)           75th percentile (%)  90th percentile (%)
                LASSO    Stepwise    LASSO    Stepwise    LASSO    Stepwise
                robust   non-robust  robust   non-robust  robust   non-robust
Barley          3.9      3.3         5.0      6.0         7.7      8.2
Canola          6.4      5.0         10.7     8.0         14.6     15.0
Corn for grain  4.3      6.6         6.2      8.5         8.8      11.6
Durum wheat     3.8      4.9         6.6      8.0         10.3     10.2
Oats            3.8      7.0         5.8      12.5        8.3      18.6
Soybeans        4.3      7.2         10.0     12.4        16.9     19.0
Spring wheat    3.1      4.0         7.0      6.2         9.1      8.8

5.2 Comparing results of the LASSO robust model with the least-angle robust models

For the yield model estimates, the LASSO robust model implemented in SAS was compared with the least-angle robust model implemented in the R statistical language. Statistics Canada, in collaboration with AAFC, determined that the LASSO robust model produced results comparable to those of the least-angle robust model. Table 3 indicates that the median absolute differences in yield between the two models over 28 years at the national level were close to 1% for six of the seven major crops analyzed, and 2.4% for soybeans.

Table 3. Median absolute difference between yields from the LASSO robust model and the least-angle robust model, national level, for seven major crops
Crop            Median absolute difference (%)
Barley          0.9
Canola          1.0
Corn for grain  1.4
Durum wheat     1.3
Oats            0.9
Soybeans        2.4
Spring wheat    0.9

Statistics Canada made the decision to adopt the SAS LASSO robust model not only because it produced similar results to the least-angle robust model, but also because SAS is the standard programming tool used at the agency.

6. Comparisons of modelled yields with September survey yield results

The yield estimates produced by the SAS LASSO robust model were compared with the September survey yield in terms of relative differences from the November survey yields. Multiple comparisons were completed to evaluate how the modelled and survey yields performed at national and provincial levels over the long term (1987 to 2014), in a year with normal conditions (2014), and in a year of record production (2013).

6.1 Comparing absolute relative differences with November survey yields at the national level (1987 to 2014)

The series of graphs in Figure 1 shows the relative difference of both the September survey estimates and the LASSO robust modelled estimates from the November survey yields, at the national level, for each of the seven major crops from 1987 to 2014.

As can be seen by comparing these seven graphs, there is no consistent pattern when the estimates of the two methods are compared. Neither method is consistently closer to the November survey estimates for any crop. For soybeans and corn for grain, the two methods follow a similar pattern of estimates over the 28 years with regard to how the estimates change from year to year. However, this pattern is not present for the other crops. Additionally, for any given year one method does not consistently perform better for all crops. In general, both methods have comparable relative differences from the November survey estimates. However, the modelled estimates tend to have larger relative differences in cases where an extreme relative difference is observed (e.g., the maximum and minimum relative differences are larger).

One visible pattern is that the September survey estimates fall below the November survey estimates (below the x-axis) more often than the model estimates do.

Figure 1. Relative difference from November survey yields at the national level, 1987 to 2014, seven major crops.

Figure 1a. Relative difference from November survey yields at the national level, 1987 to 2014, seven major crops - Barley

Line chart with two series, "September survey" and "LASSO robust". Horizontal axis: year, 1987 to 2014. Vertical axis: relative difference from November survey (%).

Data table for Figure 1a - Barley
Year September survey (%) LASSO robust (%)
1987 -0.657222747 -0.007022293
1988 -2.657886652 -4.127229315
1989 2.529410207 4.062629525
1990 -3.133665940 -4.458154940
1991 1.299204128 2.319516397
1992 -5.120392359 1.216856405
1993 -0.294141830 7.700899896
1994 0.916298603 6.317220039
1995 -0.391672364 -2.258416911
1996 -0.907261465 2.087273987
1997 0.175122672 2.362656427
1998 0.018781177 1.442208318
1999 -1.370081370 -1.408978232
2000 -0.491287422 0.837053541
2001 4.474420578 4.042157977
2002 -2.366187815 5.868762080
2003 -2.940716615 -6.349190164
2004 -4.172236773 -2.400237686
2005 -2.358809299 -0.381758120
2006 -1.552892957 -2.887912767
2007 6.138332474 4.678941003
2008 -3.338165964 -4.117890774
2009 -5.939484472 -7.629567818
2010 3.186219847 3.800574703
2011 2.397051497 4.345646613
2012 8.413556905 17.817838787
2013 -5.007044967 -11.011339104
2014 -0.488764940 -0.992805729

Figure 1b. Relative difference from November survey yields at the national level, 1987 to 2014, seven major crops - Canola

Line chart with two series, "September survey" and "LASSO robust". Horizontal axis: year, 1987 to 2014. Vertical axis: relative difference from November survey (%).

Data table for Figure 1b - Canola
Year September survey (%) LASSO robust (%)
1987 -1.815302973 -6.188729659
1988 1.778937896 -1.669973545
1989 -2.300620056 4.279096287
1990 -4.372016853 -3.18435676
1991 -5.488248924 -4.309587165
1992 -10.02046016 -6.872016226
1993 3.504175875 14.56962413
1994 1.936704224 14.64306917
1995 1.740564375 11.82582959
1996 -6.636743059 6.610025337
1997 -4.365309845 3.593502921
1998 -2.775068109 2.101951787
1999 -1.43914773 0.433288302
2000 -3.573216957 2.995683426
2001 -5.914594898 2.96223475
2002 -17.43219711 -12.96349565
2003 -6.399001273 -3.013788807
2004 -10.45617437 9.488752483
2005 -10.40464573 -8.26563344
2006 -5.563327755 -7.397672441
2007 0.549075653 13.22887041
2008 -11.73160576 -8.780009327
2009 -16.30524691 -15.94819082
2010 -11.40594687 -3.10402625
2011 -8.075060256 0.319817821
2012 0.339588606 26.13127244
2013 -7.660756337 -10.37879558
2014 -6.45174235 -0.86129483

Figure 1c. Relative difference from November survey yields at the national level, 1987 to 2014, seven major crops - Corn for grain

Line chart with two series, "September survey" and "LASSO robust". Horizontal axis: year, 1987 to 2014. Vertical axis: relative difference from November survey (%).

Data table for Figure 1c - Corn for grain
Year September survey (%) LASSO robust (%)
1987 -8.671828406 -4.679741077
1988 0.511716452 8.63994787
1989 -6.554459361 -3.207685593
1990 -1.549871899 5.050367212
1991 -6.610483949 -0.488994877
1992 16.09179465 22.61219224
1993 1.442430665 8.982220259
1994 -8.160223396 -1.34540209
1995 -2.958741373 6.121643726
1996 -3.424570346 3.447814591
1997 -0.997445365 7.049871548
1998 -12.75834026 -6.566274509
1999 -7.757630794 4.318137453
2000 10.47455392 20.81074797
2001 -5.707984033 5.404491238
2002 -3.819436821 8.651942832
2003 -5.538361517 1.15264252
2004 -8.316167059 -4.068773034
2005 -10.11529083 -3.050867421
2006 -5.565687396 1.816670428
2007 -9.079971325 -3.1113458
2008 -8.229783352 -0.092455981
2009 -1.535975046 4.284288215
2010 -6.585778363 3.483000082
2011 -5.888662677 -0.13813026
2012 -11.30510916 -1.757662926
2013 -6.213861053 5.479065959
2014 -1.214337689 5.172479903

Figure 1d. Relative difference from November survey yields at the national level, 1987 to 2014, seven major crops - Durum wheat

Line chart with two series, "September survey" and "LASSO robust". Horizontal axis: year, 1987 to 2014. Vertical axis: relative difference from November survey (%).

Data table for Figure 1d - Durum wheat
Year September survey (%) LASSO robust (%)
1987 0.615541 -4.69621
1988 -0.01519 9.62181
1989 -2.83925 -2.17119
1990 -4.34551 -7.09696
1991 -5.39904 4.238379
1992 -6.55259 -0.17011
1993 -6.64972 6.766385
1994 -1.7052 1.652586
1995 -3.79919 1.212121
1996 -5.9446 0.057774
1997 -2.07523 1.59485
1998 -2.03351 3.432569
1999 -6.36269 -0.54122
2000 -2.2188 -2.06485
2001 -4.28179 11.6726
2002 -5.14184 15.92199
2003 -6.06771 0.039046
2004 -4.08001 -2.92988
2005 -6.95849 -6.58277
2006 -5.10945 -4.80423
2007 -3.12297 4.640448
2008 -8.20229 -0.93006
2009 -5.7484 -9.72166
2010 -1.36524 6.218933
2011 -5.26638 0.198891
2012 -2.76639 6.045082
2013 -13.0334 -14.111
2014 -7.64554 3.170732

Figure 1e. Relative difference from November survey yields at the national level, 1987 to 2014, seven major crops - Oats

Line chart with two series, "September survey" and "LASSO robust". Horizontal axis: year, 1987 to 2014. Vertical axis: relative difference from November survey (%).

Data table for Figure 1e - Oats
Year September survey (%) LASSO robust (%)
1987 -3.573792868 -7.941096757
1988 -4.311855078 -8.99623586
1989 2.834808882 5.77805305
1990 -0.200096332 0.128144902
1991 14.80583742 14.07863242
1992 -3.830921889 0.262725956
1993 -5.259953851 -1.06464003
1994 -3.724484437 4.085965127
1995 -0.857117285 2.47688672
1996 -0.674024533 3.70589263
1997 -0.970886767 3.574453216
1998 -1.291998877 1.236853561
1999 -0.431501943 -1.088683761
2000 -1.769690927 0.561912394
2001 -1.444885132 5.634352269
2002 -4.164273879 1.671227146
2003 -3.744451945 -6.402085232
2004 -12.44373178 -4.455809506
2005 -4.616530553 1.206004176
2006 0.621992368 -3.561077406
2007 4.618098905 4.968902194
2008 -8.281906582 -5.930176325
2009 -5.556229055 -7.365405089
2010 1.193311686 2.27612373
2011 -0.534612999 -0.431212488
2012 1.80506142 3.931244589
2013 -10.84346125 -14.24123762
2014 -4.712562644 -3.816582526

Figure 1f. Relative difference from November survey yields at the national level, 1987 to 2014, seven major crops - Soybeans

Line chart with two series, "September survey" and "LASSO robust". Horizontal axis: year, 1987 to 2014. Vertical axis: relative difference from November survey (%).

Data table for Figure 1f - Soybeans
Year September survey (%) LASSO robust (%)
1987 -10.79142516 -15.92750624
1988 -2.616742538 9.972394633
1989 -4.771484673 3.688444446
1990 -4.200357936 -4.22515471
1991 -12.43500468 -19.27526423
1992 -0.804350519 -4.928637944
1993 -2.549697433 -1.920283667
1994 -2.809249346 -2.598524354
1995 -6.177835618 -3.23361798
1996 -1.40003895 -0.121077377
1997 -1.839609341 2.212303851
1998 -6.398276333 -4.715261948
1999 -3.014151773 0.565451678
2000 2.140796854 8.182956786
2001 31.47390995 39.38092709
2002 1.223961552 13.15859139
2003 18.71359907 24.46603172
2004 -5.482255251 -6.532680591
2005 -4.336856311 -0.0865886
2006 -7.237417347 -4.396426948
2007 3.681452865 9.950566762
2008 -3.873945782 -1.479388137
2009 1.676338539 3.145303769
2010 -6.801313083 -4.234246692
2011 -7.314676856 -4.208300513
2012 -17.65518992 -13.53097055
2013 -5.593830313 -0.426438233
2014 -0.569432156 6.173212476

Figure 1g. Relative difference from November survey yields at the national level, 1987 to 2014, seven major crops - Spring wheat

Line chart with two series, "September survey" and "LASSO robust". Horizontal axis: year, 1987 to 2014. Vertical axis: relative difference from November survey (%).

Data table for Figure 1g - Spring wheat
Year September survey (%) LASSO robust (%)
1987 0.438755149 -4.608113816
1988 -2.365294418 5.269373623
1989 -3.17182972 -0.719100407
1990 -2.986274922 -7.855321603
1991 -1.79163059 2.460371889
1992 -5.252555347 1.875678904
1993 1.968034242 13.79836794
1994 0.394758501 7.797438953
1995 -4.834423881 -0.901504508
1996 -5.858033463 -0.776666628
1997 -2.32497577 2.286423417
1998 -2.56565658 3.806477535
1999 -2.686920413 -0.99416514
2000 -3.903990529 -1.270051203
2001 -3.26484453 5.131900989
2002 -5.267697933 1.519039148
2003 -7.304765897 -7.552162541
2004 -7.344837843 -0.304415452
2005 -4.283750519 0.060340143
2006 -4.395798171 -3.008719214
2007 4.116861766 6.86110741
2008 -6.213653015 -6.852896882
2009 -12.60585523 -12.10758414
2010 -3.055186387 1.574718883
2011 -4.980634997 0.878794172
2012 -0.734908381 7.577868219
2013 -10.28038135 -16.83277596
2014 -4.97077838 -3.141680431

Table 4 summarizes the graphical information. At the national level, the median absolute relative differences from the November Farm Survey yields for the seven major crops modelled (barley, canola, corn for grain, durum wheat, oats, soybeans and spring wheat) were very similar to those from the September Farm Survey for the period from 1987 to 2014: in both cases, the overall median absolute relative difference was 4.1%. The results were comparable for some of the 12 additional crops, although larger relative differences were seen for crops with a limited amount of historical data. For the 12 additional crops, the overall median absolute relative difference of the modelled estimates (4.4%) was similar to the corresponding modelled median for the seven major crops (4.1%). For the September survey, however, the overall median for the 12 additional crops (3.0%) was much lower than its median for the seven major crops (4.1%).

In general, when larger relative differences were observed, the model's relative differences tended to be larger than those of the September survey. The maximum national absolute relative difference from the November Farm Survey yields for the 19 crops modelled was 39.4%, compared with 31.5% for the September Farm Survey.

Table 4. Median, 75th and 90th percentile and maximum of the absolute relative difference between the September survey yields and the LASSO robust modelled yields and the November survey yields at the national level for 19 crops
Crop Median 75th percentile 90th percentile Maximum Years of November historical data
LASSO robust (%) Sept. survey (%) LASSO robust (%) Sept. survey (%) LASSO robust (%) Sept. survey (%) LASSO robust (%) Sept. survey (%)
Barley 3.9 2.4 5.0 3.5 7.7 5.4 17.8 8.4 28
Canola 6.4 5.5 10.7 8.6 14.6 11.5 26.1 17.4 28
Corn for grain 4.3 6.4 6.2 8.4 8.8 10.7 22.6 16.1 28
Durum wheat 3.8 4.7 6.6 6.1 10.3 7.2 15.9 13.0 28
Oats 3.8 3.6 5.8 4.6 8.3 9.1 14.2 14.8 28
Soybeans 4.3 4.3 10.0 6.9 16.9 14.0 39.4 31.5 28
Spring wheat 3.1 4.0 7.0 5.3 9.1 7.3 16.8 12.6 28
Canary seed 7.2 5.6 14.3 13.3 19.2 19.9 20.6 27.6 16
Chickpeas 5.4 8.3 12.9 13.7 22.0 17.3 22.8 23.9 10
Coloured beans 7.9 5.4 11.9 6.8 13.5 11.2 13.9 16.9 7
Fall rye 4.5 2.6 7.6 4.4 10.0 8.3 27.7 10.4 27
Field peas 4.0 2.3 6.1 5.1 11.6 7.2 21.7 19.7 28
Flax 6.0 4.1 10.6 6.8 14.4 9.0 29.6 12.3 28
Lentils 2.8 3.2 6.9 5.0 12.3 6.6 15.4 11.7 22
Mixed grains 2.4 1.7 4.0 2.8 5.9 5.5 11.4 9.7 28
Mustard seed 3.4 4.6 8.8 8.3 13.6 11.4 21.3 13.5 11
Sunflower 15.9 7.7 25.5 16.7 29.9 22.6 35.5 31.1 10
White beans 11.9 5.0 12.9 6.1 15.4 7.3 19.1 8.8 7
Winter wheat 2.2 1.0 4.3 2.1 7.6 3.7 16.1 12.2 28
Overall (7 major crops) 4.1 4.1 6.9 6.4 13.1 10.3 39.4 31.5  
Overall (12 additional crops) 4.4 3.0 8.3 6.0 14.2 11.0 35.5 31.1  
Overall (all crops) 4.2 3.6 7.6 6.2 13.7 10.5 39.4 31.5  
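
The summary statistics reported in Table 4 can be sketched as follows. This is a minimal illustration, not the code used for the report: the yields are made up, the function names are hypothetical, and the percentile rule (linear interpolation between order statistics) is an assumption, since the report does not state which rule was used.

```python
from statistics import median

def abs_rel_diff(estimate, november):
    """Absolute relative difference (%) of an estimate from the November yield."""
    return abs(estimate - november) / november * 100.0

def percentile(values, q):
    """q-th percentile (0 to 100), linear interpolation (an assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

def summarize(diffs):
    """Median, 75th and 90th percentiles and maximum of a list of differences."""
    return {
        "median": median(diffs),
        "p75": percentile(diffs, 75),
        "p90": percentile(diffs, 90),
        "max": max(diffs),
    }

# Made-up yields for five years (illustrative only, not survey data).
november = [50.0, 40.0, 60.0, 55.0, 45.0]
modelled = [52.0, 38.0, 63.0, 55.0, 36.0]

diffs = [abs_rel_diff(m, n) for m, n in zip(modelled, november)]
stats = summarize(diffs)
```

The same `summarize` call would be applied once to the model's differences and once to the September survey's differences, which gives the paired columns of the table.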

6.2 Comparing absolute relative differences with November survey yields at the provincial level (1987 to 2014)

Similar comparisons were done at the provincial level. For each crop, only provinces that had at least 10% of the national total area for the crop were included in the summary statistics. The median absolute relative difference from the November Farm Survey yields for the seven major crops modelled was 5.1%, compared with 4.4% for the September Farm Survey; the maximum absolute relative differences were 44.5% and 35.5%, respectively (Table 5). For the 12 additional crops, the median absolute relative difference at the provincial level for the modelled estimates was 5.6%, compared with 3.7% for the September Farm Survey. The overall maximums for these crops were substantially larger: 112.2% for the model and 79.3% for the September survey.
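
The 10% inclusion rule for the provincial summaries can be sketched as below. The areas are made up and the function name is hypothetical. The sketch also assumes that the "national total area" is the sum over the provinces supplied, which the report does not spell out.

```python
def provinces_in_summary(area_by_province, threshold=0.10):
    """Return the provinces holding at least `threshold` of the total crop area."""
    total = sum(area_by_province.values())
    if total <= 0:
        return []
    return sorted(
        province
        for province, area in area_by_province.items()
        if area / total >= threshold
    )

# Made-up seeded areas for one crop (illustrative only).
areas = {
    "Quebec": 5.0,
    "Ontario": 8.0,
    "Manitoba": 12.0,
    "Saskatchewan": 50.0,
    "Alberta": 25.0,
}
included = provinces_in_summary(areas)
```

With these made-up areas, Quebec (5%) and Ontario (8%) fall below the threshold, so only the three Prairie provinces would contribute to the crop's provincial summary statistics.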

Table 5. Median, 75th and 90th percentiles and maximum of the absolute relative difference of the LASSO robust modelled yields and the September survey yields from the November survey yields at the provincial level for 19 crops
Columns, in order: Crop; Median, LASSO robust (%); Median, Sept. survey (%); 75th percentile, LASSO robust (%); 75th percentile, Sept. survey (%); 90th percentile, LASSO robust (%); 90th percentile, Sept. survey (%); Maximum, LASSO robust (%); Maximum, Sept. survey (%)
Barley 3.6 3.0 7.5 5.0 10.8 8.0 26.4 12.3
Canola 6.0 6.2 13.4 9.9 18.3 12.6 38.7 21.8
Corn for grain 5.5 5.8 7.7 8.9 9.7 12.5 33.1 26.1
Durum wheat 4.6 4.6 7.6 6.3 10.2 8.2 24.5 14.7
Oats 5.0 3.5 8.4 6.9 13.5 10.4 27.8 22.4
Soybeans 7.1 4.6 10.3 8.9 19.5 14.5 43.1 35.5
Spring wheat 5.0 4.2 8.7 6.5 14.0 9.7 44.5 15.9
Canary seed 7.2 5.6 14.3 13.3 19.2 19.9 20.6 27.6
Chickpeas 5.4 8.3 12.9 13.7 22.0 17.3 22.8 23.9
Coloured beans 8.8 7.3 17.2 13.7 24.0 17.8 112.2 21.1
Fall rye 5.9 4.3 13.2 10.5 21.8 14.5 74.8 79.3
Field peas 5.5 3.6 7.6 7.6 13.6 9.9 34.3 42.3
Flax 7.1 5.2 9.7 8.0 14.3 10.1 40.6 15.4
Lentils 2.8 3.2 6.9 5.0 12.3 6.6 15.4 11.7
Mixed grains 2.9 0.5 5.1 3.5 7.5 6.0 11.2 10.6
Mustard seed 8.1 4.7 14.5 11.4 20.8 17.0 33.8 23.5
Sunflower 15.9 7.7 25.5 16.7 29.9 22.6 35.5 31.1
White beans 12.6 5.9 18.8 8.9 21.0 12.3 31.9 18.6
Winter wheat 4.6 1.8 11.7 5.5 17.7 13.9 43.9 39.9
Overall (7 major crops) 5.1 4.4 8.9 7.4 14.7 11.5 44.5 35.5
Overall (12 additional crops) 5.6 3.7 11.6 8.1 18.5 14.1 112.2 79.3
Overall (all crops) 5.3 4.1 9.8 7.7 16.4 12.4 112.2 79.3

6.3 Comparing relative differences with November survey yields at the national level, 2014

There is nothing unique about the year 2014 in terms of the growing conditions throughout the year or the amount of each crop harvested. It is presented as a "typical" year. In 2014, at the national level, the model's estimate was closer to the November Farm Survey result than the September Farm Survey estimate was (that is, it had a smaller absolute relative difference) for four of the seven major crops modelled and four of the 12 additional crops (Table 6).

Table 6. 2014 national yield estimates for the LASSO robust model and survey (September and November), with relative differences with the November survey estimates for 19 crops
Columns, in order: Crop (yield unit); November survey yield; LASSO robust yield; LASSO robust relative difference (%); September survey yield; September survey relative difference (%)
Barley (bu/ac) 62.4 61.8 -1.0 62.1 -0.5
Canola (bu/ac) 34.4 34.1 -0.9 32.2 -6.4
Corn for grain (bu/ac) 149.2 156.9 5.2 147.4 -1.2
Durum wheat (bu/ac) 41.0 42.3 3.2 37.9 -7.6
Oats (bu/ac) 84.1 80.9 -3.8 80.1 -4.8
Soybeans (bu/ac) 41.2 43.8 6.3 41.0 -0.5
Spring wheat (bu/ac) 45.8 44.3 -3.1 43.5 -5.0
Canary seed (lb/ac) 1038.8 1034.8 -0.4 1074.0 3.4
Chickpeas (lb/ac) 1770.6 1833.1 3.5 1780.0 0.5
Coloured beans (bu/ac) 20.3 17.6 -13.3 19.3 -5.0
Fall rye (bu/ac) 38.2 38.0 -0.5 36.2 -5.2
Field peas (bu/ac) 34.9 36.8 5.4 35.0 0.4
Flax (bu/ac) 22.1 24.0 8.6 24.1 9.0
Lentils (lb/ac) 1373.1 1421.5 3.5 1324.0 -3.6
Mixed grains (bu/ac) 66.4 58.8 -11.4 64.8 -2.5
Mustard seed (lb/ac) 908.7 954.9 5.1 883.0 -2.8
Sunflower (lb/ac) 1775.2 1330.0 -25.1 1737.0 -2.2
White beans (bu/ac) 20.4 16.5 -19.1 19.4 -5.0
Winter wheat (bu/ac) 64.5 67.2 4.2 63.1 -2.2
Note: bu/ac: bushels per acre. lb/ac: pounds per acre.
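
As a worked illustration of how the relative differences in Table 6 are read, the sketch below recomputes them for the seven major crops from the yields shown in the table and counts the crops for which the model was closer to the November survey than the September survey was. The variable and function names are hypothetical. Because the table's yields are rounded, a recomputed difference can differ slightly from the published one (for spring wheat, about -3.3% rather than -3.1%).

```python
# (November, LASSO robust, September) yields in bushels per acre, from Table 6.
yields_2014 = {
    "Barley": (62.4, 61.8, 62.1),
    "Canola": (34.4, 34.1, 32.2),
    "Corn for grain": (149.2, 156.9, 147.4),
    "Durum wheat": (41.0, 42.3, 37.9),
    "Oats": (84.1, 80.9, 80.1),
    "Soybeans": (41.2, 43.8, 41.0),
    "Spring wheat": (45.8, 44.3, 43.5),
}

def rel_diff(estimate, november):
    """Signed relative difference (%) of an estimate from the November yield."""
    return (estimate - november) / november * 100.0

# Crops for which the model's absolute relative difference is the smaller one.
model_closer = [
    crop
    for crop, (nov, model, sept) in yields_2014.items()
    if abs(rel_diff(model, nov)) < abs(rel_diff(sept, nov))
]

barley_model_diff = round(rel_diff(61.8, 62.4), 1)
```

The comparison uses absolute values, so an overestimate of 3.2% (durum wheat, model) counts as closer than an underestimate of 7.6% (durum wheat, September survey).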

6.4 Comparing relative differences with November survey yields at the national level, 2013 (a year of record production)

In 2013, a year of record production for most crops, the model's estimate was closer to the November Farm Survey result than the September Farm Survey estimate was for two of the seven major crops analyzed and three of the nine additional crops for which comparable 2013 data were available (Table 7).

Table 7. 2013 national yield estimates for the LASSO robust model and survey (September and November), with relative differences with the November survey estimates for 19 crops
Columns, in order: Crop (yield unit); November survey yield; LASSO robust yield; LASSO robust relative difference (%); September survey yield; September survey relative difference (%)
Barley (bu/ac) 72.0 64.0 -11.0 68.4 -5.0
Canola (bu/ac) 40.0 35.9 -10.4 36.9 -7.7
Corn for grain (bu/ac) 146.9 154.9 5.5 137.7 -6.2
Durum wheat (bu/ac) 48.4 41.6 -14.1 42.1 -13.0
Oats (bu/ac) 92.8 79.6 -14.2 82.7 -10.8
Soybeans (bu/ac) 43.2 43.0 -0.4 40.7 -5.6
Spring wheat (bu/ac) 52.9 44.0 -16.8 47.5 -10.3
Canary seed (lb/ac) 1395.1 1108.3 -20.6 1103.0 -20.9
Chickpeas (lb/ac) 2093.1 1616.5 -22.8 1799.0 -14.1
Coloured beans -- -- -- -- --
Fall rye -- -- -- -- --
Field peas (bu/ac) 43.7 38.3 -12.4 43.0 -1.6
Flax (bu/ac) 27.6 23.7 -14.1 26.5 -3.8
Lentils (lb/ac) 1816.4 1536.7 -15.4 1604.0 -11.7
Mixed grains (bu/ac) 61.7 58.6 -5.0 62.2 0.8
Mustard seed (lb/ac) 950.1 934.2 -1.7 1013.0 6.6
Sunflower (lb/ac) 1660.5 1368.0 -17.6 1619.0 -2.5
White beans -- -- -- -- --
Winter wheat (bu/ac) 63.1 58.1 -7.9 55.4 -12.2
Note: bu/ac: bushels per acre. lb/ac: pounds per acre. --: comparable 2013 data not available.

7. Publishing provincial and national yield estimates

Modelled yield estimates are produced for crops at the provincial and national levels. A set of rules was established to determine which modelled yields are of an acceptable level of quality to publish. These rules are based on both data availability and the coefficient of variation (CV) calculated for each estimate at the provincial level. These rules are applied to each crop.

7.1 Publication rules for modelled yields

The first rule concerns data availability. A minimum of 12 years of historical survey yield data for both November and July must be available, as well as June survey area estimates and July survey yield estimates for the current year. If these conditions are not met, then a modelled yield estimate will not be produced for that region.

A second rule was established: the provincial estimate for a crop will not be published if the total cultivated area from suppressed regions (based on the previous set of conditions) exceeds 10% of the provincial area for the crop. Similarly, if provincial estimates for a crop were not published, the national-level estimate (of the five provinces considered) will not be published if the total cultivated area for the suppressed provinces exceeds 10% of the national area.

In cases where the estimates for certain provinces were suppressed due to quality, but a national level estimate was still produced, only provincial estimates that were of an acceptable level of quality were used.

Finally, if the CV of the provincial or national estimate from the model was greater than 10%, the estimate was not published at that level. Model-based CVs are calculated differently from those for survey estimates, and the thresholds used to determine quality differ from those used in the Field Crop Reporting Series.
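
The publication rules in this subsection can be sketched as a short decision function. This is an illustrative reading of the rules, not Statistics Canada's production code: the class, field and function names are hypothetical, the example regions are made up, and the same area test is assumed to apply to sub-provincial regions within a province and to provinces within the national total.

```python
from dataclasses import dataclass

MIN_YEARS = 12               # minimum years of November and July survey yields
MAX_SUPPRESSED_SHARE = 0.10  # maximum share of crop area from suppressed units
MAX_CV = 0.10                # maximum coefficient of variation of the estimate

@dataclass
class Region:
    area: float               # cultivated area of the crop in the region
    years_of_history: int     # years with both November and July survey yields
    has_current_inputs: bool  # June area and July yield available this year

def region_modelled(region):
    """Rule 1: a modelled yield is produced only if the input data are available."""
    return region.years_of_history >= MIN_YEARS and region.has_current_inputs

def suppressed_share(regions):
    """Share of the crop area that lies in regions without a modelled yield."""
    total = sum(r.area for r in regions)
    if total <= 0:
        return 1.0
    return sum(r.area for r in regions if not region_modelled(r)) / total

def publishable(regions, cv):
    """Rules 2 and 3: suppressed area at most 10% and CV at most 10%."""
    return suppressed_share(regions) <= MAX_SUPPRESSED_SHARE and cv <= MAX_CV

# Made-up province with three regions; the smallest lacks enough history.
province = [
    Region(area=90.0, years_of_history=28, has_current_inputs=True),
    Region(area=6.0, years_of_history=28, has_current_inputs=True),
    Region(area=4.0, years_of_history=7, has_current_inputs=True),
]
ok = publishable(province, cv=0.05)
```

In this made-up case only 4% of the area is suppressed, so the estimate passes the area rule and, with a CV of 5%, would be publishable. The same province with a CV of 12% would not.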

7.2 Publication simulation for 2014

The rules listed in the preceding subsection were applied during a simulation of the production of modelled yields for 2014. Table 8 shows which crops produced publishable results for each province and at the national level, as well as the percentage of the crop area from regions that were suppressed. The results for 2015 and the years to come may be different from this simulation, given that the application of the publication rules will be repeated each year.

Table 8. Crops with publishable yields during the 2014 simulation at the provincial and national levels
Columns, in order: Crop; then, for each of Quebec, Ontario, Manitoba, Saskatchewan, Alberta and National: Published; Supp. (%)
Barley Yes 0 Yes 0 Yes 0 Yes 0 Yes 0 Yes 0
Canola Yes 0 Yes 0 Yes 0 Yes 0 Yes 0 Yes 0
Canary seed Absent N/A Absent N/A Absent N/A Yes 0.8 Absent N/A Yes 0
Chickpeas Absent N/A Absent N/A Absent N/A No 100 Absent N/A No 100
Dry coloured beans Absent N/A No 100 No 100 Absent N/A No 100 No 100
Corn for grain Yes 0.5 Yes 0 Yes 0 Absent N/A No 100 Yes 1.1
Durum wheat Absent N/A Absent N/A Absent N/A Yes 0 Yes 0.5 Yes 0
Fall rye Absent N/A Yes 0 Yes 6.3 Yes 0 Yes 0 Yes 0
Dry peas Absent N/A Absent N/A Yes 0 Yes 0 Yes 0 Yes 0
Flaxseed Absent N/A Absent N/A Yes 0 Yes 0 Yes 5 Yes 0
Lentils Absent N/A Absent N/A Absent N/A Yes 0 Absent N/A 0 0
Mustard seed Absent N/A Absent N/A Absent N/A Yes 0 No 100 No 27.1
Mixed grains Yes 0 Yes 0 Absent N/A Absent N/A No 100 Yes 8.2
Oats Yes 0 Yes 0 Yes 0 Yes 0 Yes 0 Yes 0
Soybeans Yes 1.3 Yes 0 Yes 0 Absent N/A Absent N/A Yes 0
Spring wheat Yes 0 Yes 0 Yes 0 Yes 0 Yes 0 Yes 0
Sunflower seeds Absent N/A Absent N/A No 100 Absent N/A Absent N/A No 100
Dry white beans Absent N/A No 100 No 100 Absent N/A Absent N/A No 100
Winter wheat Yes 0 Yes 0 Yes 2.9 Yes 0 Yes 0 Yes 0
Number of crops published 8 N/A 9 N/A 10 N/A 12 N/A 9 N/A 13 N/A
Note: Supp (%): Percentage of the area for which modelled yields were suppressed. Absent: indicates that the crop is absent or largely absent in this province. N/A: means Not Applicable.

8. Summary

The estimates produced by the SAS LASSO robust model were comparable to those produced by the September survey in terms of relative difference from the November survey estimates for the seven major crops and many of the 12 additional crops published in September. On rare occasions, both the model and the September survey produced extreme relative differences from the November survey estimates, but not necessarily for the same crops/years. These extreme relative differences tended to be larger for the model than for the September survey.

Larger relative differences were observed in the model estimates for crops that have a limited amount of historical data available. Estimates derived from models that were constructed with only a limited number of data points were at risk of being statistically unreliable. Statistics Canada has established three criteria based on the availability of the input data, as well as quality indicators that must be met to ensure the statistical integrity of the estimates and to determine which of the modelled crop yields were of acceptable quality to be published at provincial and national levels. For each year, the yield model estimates for each crop would be evaluated to determine whether their quality is sufficient for publication.

In 2015, modelled yield estimates for crops deemed to have a sufficient level of quality were published as a preliminary estimate to the September Farm Survey estimates. In the longer term, survey managers must determine what level of risk is acceptable for the published September estimates, and whether the benefits of eventually replacing the September survey occasion outweigh the risk of the larger relative differences that the model produces in extreme cases.

9. References

AAFC (circa 2000). http://www.geobase.ca/geobase/en/data/landcover/index.html

Baier, W., Boisvert, J.B., Dyer, J.A., 2000. The versatile soil moisture budget (VSMB) reference manual [computer software], ECORC contribution no. 1553. In: Agriculture and Agri-Food Canada. Eastern Cereal and Oilseed Research Centre, Ottawa, ON, Canada, pp. A1–D4.

Bédard, F. and Reichert, G., 2013. Integrated Crop Yield and Production Forecasting using Remote Sensing and Agri-Climatic data. Analytical Projects Initiatives final report. Remote Sensing and Geospatial Analysis, Agriculture Division, Statistics Canada.

Chipanshi, A., Zhang, Y., Kouadio, L., Newlands, N., Davidson, A., Hill, H., Warren, R., Qian, B., Daneshfar, B., Bedard, F. and Reichert, G., 2015. Evaluation of the Integrated Canadian Crop Yield Forecaster (ICCYF) Model for In-season Prediction of Crop Yield across the Canadian Agricultural Landscape. Agricultural and Forest Meteorology, 206:137-150. DOI: http://dx.doi.org/10.1016/j.agrformet.2015.03.007

Copt, S., and Heritier, S., 2006. Robust MM-Estimation and Inference in Mixed Linear Models. Cahiers du département d'économétrie, Faculté des sciences économiques et sociales, Université de Genève.

Khan, J. A., Van Aelst, S., and Zamar, R. H., 2007. Robust Model Selection Based on Least Angle Regression. Journal of the American Statistical Association, Vol. 102, No. 480, pp. 1289-1299.

Latifovic, R., Trishchenko, A.P., Chen, J., Park, W.B., Khlopenkov, K.V., Fernandes, R., Pouliot, D., Ungureanu, C., Luo, Y., Wang, S., Davidson, A., Cihlar, J., 2005. Generating historical AVHRR 1 km baseline satellite data records over Canada suitable for climate change studies. Canadian Journal of Remote Sensing, vol. 31, no. 5, pp. 324-346.

Newlands, N.K., Zamar, D., Kouadio, L., Zhang, Y., Chipanshi, A., Potgieter, A., Toure, S., Hill, H.S.J., 2014. An integrated model for improved seasonal forecasting of agricultural crop yield under environmental uncertainty. Front. Environ. Sci. 2, 17, http://dx.doi.org/10.3389/fenvs.2014.00017.

*Reference documents are available in English only