An Integrated Crop Yield Model Using Remote Sensing, Agroclimatic Data and Crop Insurance Data

1. Introduction

This report provides the background, general methods and results of an extended approach that uses remote sensing, agroclimatic and provincial crop insurance data to model reliable early-season and mid-season crop yield estimates as part of Statistics Canada's Crop Reporting Series. This work builds on the original modelling process, which replaced the crop yield estimates from the September Farm Survey in 2016.

2. General methodology for crop yield modelling

The extended methodology for modelling crop yield was developed and tested on the crops grown in Manitoba. Statistics Canada was able to adapt the existing yield model through an agreement with the Manitoba Agricultural Services Corporation (MASC), under which historical and current-year crop insurance data at the parcel level were provided, in confidence, to Statistics Canada to assist in modelling current-year crop yields.

3. Data sources used in the model

The modelling methodology used three data sources: 1) the coarse-resolution satellite data used as part of Statistics Canada's Crop Condition Assessment Program; 2) agroclimatic data; and 3) MASC crop insurance data. The first two data sets, and the methods used to process and extract them, are described in greater detail in the report referenced earlier. This report highlights only the changes incorporated in the extended yield model.

3.1 Normalized Difference Vegetation Index

A spectral vegetation index, the Normalized Difference Vegetation Index (NDVI), was used as a surrogate for the photosynthetic potential of crops. One main difference between the original and the extended yield models is the use of Moderate Resolution Imaging Spectroradiometer (MODIS) imagery to calculate the NDVI values. The previous satellite dataset had a pixel resolution of 1 kilometer, whereas the MODIS data have a pixel resolution of 250 meters. Because each 1-kilometer pixel corresponds to a 4 × 4 block of 250-meter pixels, the finer resolution of the MODIS data provides a 16-fold increase in the number of image pixels compared with the previous imagery, which is essential for the geographic level required by this methodology. Both NDVI datasets are released weekly throughout the growing season (mid-April to mid-October) through Statistics Canada's Crop Condition Assessment Program (CCAP). MODIS data are available from 2000 onward.
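For readers unfamiliar with the index, the sketch below uses the standard NDVI definition, (NIR - Red) / (NIR + Red), computed from near-infrared and red reflectance, and checks the 16-fold pixel gain quoted above. The function names and reflectance values are hypothetical illustrations, not part of the CCAP processing chain.

```python
def ndvi(red: float, nir: float) -> float:
    """Standard NDVI: (NIR - Red) / (NIR + Red); values fall in [-1, 1]."""
    denominator = nir + red
    if denominator == 0:
        raise ValueError("NIR + Red reflectance must be non-zero")
    return (nir - red) / denominator


def pixel_count_factor(old_resolution_m: float, new_resolution_m: float) -> float:
    """Number of square pixels at the new resolution covering one old pixel."""
    return (old_resolution_m / new_resolution_m) ** 2


if __name__ == "__main__":
    # Hypothetical reflectances for a dense green canopy (high NIR, low red).
    print(round(ndvi(red=0.08, nir=0.40), 3))  # 0.667
    # 1 km pixels replaced by 250 m pixels: (1000 / 250) ** 2
    print(pixel_count_factor(1000, 250))       # 16.0
```

Healthy, photosynthetically active vegetation reflects strongly in the near-infrared and absorbs red light, which is why higher NDVI values are read as higher photosynthetic potential.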

3.2 Agroclimatic indexes

The station-based daily temperature and precipitation data used in the model were provided by Environment Canada and other partner institutions and were used to generate the climate-based predictors. To form a manageable array of potential crop yield predictors, Agriculture and Agri-Food Canada aggregated the daily agroclimatic indexes into monthly sums and means for the months of May to August and provided the aggregated data to Statistics Canada for use within the yield model.
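The aggregation step can be sketched as follows, assuming daily records of (date, mean temperature, precipitation) for one station. Summing precipitation and averaging temperature are illustrative choices: the report does not list which indexes were summed and which were averaged, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

GROWING_MONTHS = (5, 6, 7, 8)  # May to August


def monthly_aggregates(
    daily: Iterable[Tuple[date, float, float]],
    months: Tuple[int, ...] = GROWING_MONTHS,
) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Aggregate daily (date, mean_temp_c, precip_mm) records by (year, month).

    Precipitation is summed and temperature is averaged (illustrative choice).
    Days outside `months` are ignored.
    """
    precip_sum: Dict[Tuple[int, int], float] = defaultdict(float)
    temps: Dict[Tuple[int, int], list] = defaultdict(list)
    for day, mean_temp_c, precip_mm in daily:
        if day.month not in months:
            continue
        key = (day.year, day.month)
        precip_sum[key] += precip_mm
        temps[key].append(mean_temp_c)
    return {
        key: {
            "precip_sum_mm": precip_sum[key],
            "temp_mean_c": sum(values) / len(values),
        }
        for key, values in temps.items()
    }


if __name__ == "__main__":
    records = [
        (date(2018, 4, 30), 9.0, 5.0),  # April: outside the window
        (date(2018, 5, 1), 12.0, 3.0),
        (date(2018, 5, 2), 14.0, 0.0),
        (date(2018, 6, 1), 20.0, 1.5),
    ]
    print(monthly_aggregates(records))
```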

3.3 Crop Insurance Data

Historical crop insurance data from 2000 to 2018 were provided to Statistics Canada by MASC. This dataset included the seeded area, harvested area and yield at the parcel level for all crops insured by farm operators in Manitoba. The 2019 dataset included only the seeded area at the parcel level for all insured crops. These data are of excellent quality and aid substantially in the development of the extended yield model.

4. Modelling Survey Yields

4.1 Modelling Methods

The crop yields are estimated through the use of a robust multivariate linear model. The model is constructed using the historical relationships between the yields reported to MASC by the farm operator at the end of the growing season (the dependent variable in the model) and the NDVI and agroclimatic measurements taken at different times during the growing season as well as a temporal variable to account for overall changes in yield throughout the years. Data from the previous ten growing seasons are used in deriving the model.
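The report does not say which robust estimator was used. As one common possibility, the sketch below fits a robust line by iteratively reweighted least squares (IRLS) with Huber weights and a MAD-based scale, reduced to a single predictor for brevity, whereas the actual model is multivariate. The helper `training_years` shows one reading of "the previous ten growing seasons" (for 2018, the years 2008 to 2017). All names and data are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def training_years(target_year: int, window: int = 10) -> List[int]:
    """The `window` growing seasons before `target_year` (assumed reading)."""
    return list(range(target_year - window, target_year))


def weighted_line(x: Sequence[float], y: Sequence[float],
                  w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares intercept and slope for y = a + b * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(x: Sequence[float], y: Sequence[float], k: float = 1.345,
               iterations: int = 50) -> Tuple[float, float]:
    """Robust line fit by IRLS with Huber weights (a sketch, not the production model)."""
    weights = [1.0] * len(x)
    for _ in range(iterations):
        a, b = weighted_line(x, y, weights)
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        centre = median(residuals)
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = median([abs(r - centre) for r in residuals]) / 0.6745
        if scale == 0:
            break  # (near-)perfect fit: nothing left to downweight
        # Huber weights: 1 within k * scale, k * scale / |r| beyond it.
        weights = [min(1.0, k * scale / abs(r)) if r != 0 else 1.0
                   for r in residuals]
    return weighted_line(x, y, weights)


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [2.0 * xi + 1.0 for xi in xs]
    ys[-1] = 100.0  # one aberrant yield report
    print("OLS:   ", weighted_line(xs, ys, [1.0] * len(xs)))
    print("Huber: ", huber_line(xs, ys))
```

The point of a robust fit here is that a few unusual parcel yields (for example, hail or flood losses) pull the fitted relationship far less than they would under ordinary least squares.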

The unit used in the model is the individual parcel as reported to MASC; the largest parcels are at the quarter-section level, representing 160 acres of land. Only parcels with at least 145 acres of a single crop are used in the model. The model estimates the crop yield for each such parcel, and these yields are weighted by the seeded area of the parcel, as reported to MASC by the farm operator at the start of the growing season, to derive an initial estimate of yield at the provincial level. An additional adjustment is then made to the modelled estimate to account for smaller parcels. See section 4.2, Adjustment to Initial Estimates, for more details.
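The parcel filter and area weighting described above can be sketched as follows, assuming each parcel record carries its seeded area and its modelled yield; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_SINGLE_CROP_ACRES = 145.0  # smaller parcels are handled by the adjustment step


@dataclass
class Parcel:
    seeded_acres: float     # area of the single crop, as reported to MASC
    predicted_yield: float  # modelled yield for the parcel


def initial_provincial_yield(parcels: Iterable[Parcel]) -> float:
    """Seeded-area-weighted mean of modelled yields over parcels of at least 145 acres."""
    eligible = [p for p in parcels if p.seeded_acres >= MIN_SINGLE_CROP_ACRES]
    total_acres = sum(p.seeded_acres for p in eligible)
    if total_acres == 0:
        raise ValueError("no parcel has at least 145 acres of the crop")
    return sum(p.seeded_acres * p.predicted_yield for p in eligible) / total_acres


if __name__ == "__main__":
    sample = [Parcel(160.0, 50.0), Parcel(150.0, 40.0), Parcel(80.0, 10.0)]
    # Only the first two parcels qualify: (160 * 50 + 150 * 40) / 310
    print(round(initial_provincial_yield(sample), 2))  # 45.16
```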

Because of the vast number of NDVI and agroclimatic readings available, it was not feasible to include all of them as variables in the model. The variables retained in the model were selected using the GLMSELECT procedure in SAS with the LASSO (Least Absolute Shrinkage and Selection Operator) option. With a few exceptions, a minimum of five variables was required for the model, and the final set of variables to be retained was determined using the Schwarz Bayesian Information Criterion (SBC).
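The final selection step can be illustrated with the SBC as defined in the SAS GLMSELECT documentation, SBC = n ln(SSE/n) + p ln(n), where n is the number of observations, SSE the error sum of squares and p the number of parameters including the intercept. The sketch assumes that a list of candidate models (for example, the steps of a LASSO path) with their SSE values is already available; it does not reproduce the LASSO fit itself, and the variable names and SSE values are hypothetical.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    variables: Tuple[str, ...]  # predictors in the candidate model
    sse: float                  # error sum of squares of the fitted model


def sbc(n: int, sse: float, p: int) -> float:
    """Schwarz Bayesian criterion: n * ln(SSE / n) + p * ln(n). Smaller is better."""
    return n * math.log(sse / n) + p * math.log(n)


def choose_model(candidates: Sequence[Candidate], n: int,
                 min_vars: int = 5) -> Candidate:
    """Lowest-SBC candidate among those with at least `min_vars` predictors."""
    eligible = [c for c in candidates if len(c.variables) >= min_vars]
    if not eligible:
        raise ValueError("no candidate model has enough predictors")
    # p counts the intercept as well as the selected predictors.
    return min(eligible, key=lambda c: sbc(n, c.sse, len(c.variables) + 1))


if __name__ == "__main__":
    # Hypothetical LASSO path for n = 200 observations; names are placeholders.
    path = [
        Candidate(("v1", "v2", "v3"), 860.0),
        Candidate(("v1", "v2", "v3", "v4", "v5"), 850.0),
        Candidate(("v1", "v2", "v3", "v4", "v5", "v6"), 845.0),
        Candidate(("v1", "v2", "v3", "v4", "v5", "v6", "v7"), 840.0),
    ]
    print(len(choose_model(path, n=200).variables))              # 5
    print(len(choose_model(path, n=200, min_vars=1).variables))  # 3
```

In this hypothetical path, SBC alone would favour the three-variable model; the five-variable minimum overrides it, which mirrors the rule stated above.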

More data points are available for the more common crops. For the most common crops, individual models were constructed at the sub-provincial ecological level and then aggregated to the provincial level. There are seven such ecological regions in Manitoba, derived from clusters of ecodistricts (Terrestrial ecodistricts of Canada). These regions are well suited to characterizing the factors that influence crop yields, such as climate, physiography and soil type. For less common crops, the model was constructed at the provincial level. See Appendix A for a listing of each crop and the geographic level at which its model was constructed.

4.2 Adjustment to Initial Estimates

The model was constructed using insured parcels of land where a single crop of at least 145 acres was seeded. This allows an accurate assignment of NDVI to the crop being grown. The results coming directly from the model represent the estimated yield from these parcels. Thus, parcels of land which either were not insured or had less than 145 acres of a single crop were not directly included in the modelled estimates.

In the case of uninsured parcels of land, no adjustment was made to the estimates to account for them. This assumes that the yield from uninsured parcels of land is similar to that from the insured parcels with over 145 acres of a single crop. The amount of uninsured land has been estimated by comparing area information from 2016, 2017 and 2018 MASC files with area estimates from the Statistics Canada Field Crop Reporting Series by individual crop. See Appendix B for the estimates of the percentage of uninsured crops.
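The comparison can be sketched as follows. The report does not say how the three years were combined; this sketch assumes the annual percentages are simply averaged, and the function name and acreages are hypothetical.

```python
from typing import Mapping


def pct_uninsured(insured_acres: Mapping[int, float],
                  total_acres: Mapping[int, float]) -> float:
    """Average over years of 100 * (1 - insured area / Field Crop Reporting Series area)."""
    years = sorted(total_acres)
    shares = [100.0 * (1.0 - insured_acres[y] / total_acres[y]) for y in years]
    return sum(shares) / len(shares)


if __name__ == "__main__":
    insured = {2016: 900.0, 2017: 950.0, 2018: 880.0}    # hypothetical MASC acres
    survey = {2016: 1000.0, 2017: 1000.0, 2018: 1000.0}  # hypothetical survey acres
    print(round(pct_uninsured(insured, survey), 1))  # 9.0
```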

In the case of parcels of land with less than 145 acres of a single crop, the historical crop insurance information can be used to compare the yields by crop from parcels with at least 145 acres to those with less. The estimates coming directly from the model can then be adjusted by the ratio of these two values, weighted by the area seeded in the current year. The ratio was calculated using ten years of observations from historical crop insurance files. See Appendix C for the average ratio of the yields of the two sizes of fields by crop and the variance of these ratios over the past ten years.
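One reading of "adjusted by the ratio of these two values, weighted by the area seeded in the current year" is an area-weighted blend in which the small-parcel area receives the modelled yield scaled by the Appendix C ratio. The sketch below implements that reading; the formula and names are assumptions for illustration, not the documented production calculation.

```python
def adjust_for_small_parcels(model_yield: float, large_acres: float,
                             small_acres: float, ratio: float) -> float:
    """Area-weighted yield in which large parcels keep the modelled yield and
    small parcels get model_yield * ratio (assumed form of the adjustment)."""
    total_acres = large_acres + small_acres
    if total_acres <= 0:
        raise ValueError("total seeded area must be positive")
    return model_yield * (large_acres + ratio * small_acres) / total_acres


if __name__ == "__main__":
    # Hypothetical: 800 acres in large parcels, 200 in small ones, ratio 0.90.
    print(round(adjust_for_small_parcels(40.0, 800.0, 200.0, 0.90), 2))  # 39.2
```

With a ratio below one, the adjusted estimate falls below the modelled one, and the effect grows with the share of the crop seeded on small parcels.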

4.3 Comparison of Modelled Crop Yields with Other Yield Indicators

During the development cycle of the model, numerous model parameters were studied. These included the number of years of historical data to use, the level at which the model was constructed (provincial or ecological), the manner in which the predictor variables were selected, the definition of a parcel of land, the methods for adjusting the model outputs to better represent the entire population of crop producers, and other factors.

One way of evaluating the model was to compare its results with other statistics that measure crop yield. For the purpose of this study, two data sources were examined:

  • The crop yields reported to MASC by the farm operators at the end of the year
  • The results from Statistics Canada's Field Crop Reporting Series surveys.

Statistics Canada measures crop yields at three points in the growing season:

  • Early-season estimates, using a survey which takes place in July, referred to as the July survey
  • Mid-season estimates, using a survey which takes place in early September, referred to as the September survey
  • End-of-season estimates, using a survey which takes place in late October.

As part of its evaluation, the crop yield model was used to produce both early-season and mid-season crop yield estimates.

Nine sets of early-season and mid-season crop yield estimates (representing crop years 2010 to 2018) were produced using the model and compared by crop with the results from the other data sources where possible (see Footnote 1). For both the survey and the modelled values, a relative difference was calculated with respect to the yield estimate from the crop insurance files:

$$\text{relative difference}_{\text{method } i} = \frac{\text{yield estimate}_{\text{method } i} - \text{yield estimate}_{\text{crop insurance}}}{\text{yield estimate}_{\text{crop insurance}}}$$

where method i is either the modelled yield estimate or the survey yield estimate.
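As a small worked example with hypothetical numbers: a modelled yield of 45 bushels per acre against a crop insurance yield of 40 gives (45 - 40) / 40 = 0.125, which the tables below report as 12.5%. The same calculation in code, with a hypothetical function name:

```python
def relative_difference_pct(estimate: float, crop_insurance: float) -> float:
    """Relative difference of an estimate from the crop insurance yield, in percent."""
    if crop_insurance == 0:
        raise ValueError("crop insurance yield must be non-zero")
    return 100.0 * (estimate - crop_insurance) / crop_insurance


if __name__ == "__main__":
    print(relative_difference_pct(45.0, 40.0))  # 12.5
    print(relative_difference_pct(36.0, 40.0))  # -10.0
```

A positive value means the method overestimates relative to crop insurance; a negative value means it underestimates.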

Three statistics based on the relative difference were produced for each crop:

  • The average relative difference across the nine years
  • The average of the absolute value of the relative difference across the nine years
  • The maximum absolute relative difference across the nine years.
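The three statistics can be computed from a crop's yearly relative differences as below. The hypothetical inputs also illustrate a point made in the results: values of +10% and -10% cancel in the average, while the absolute measures do not. The function name is hypothetical.

```python
from typing import Dict, Sequence


def summarize(rel_diffs_pct: Sequence[float]) -> Dict[str, float]:
    """Average, average absolute and maximum absolute relative difference (%)."""
    if not rel_diffs_pct:
        raise ValueError("need at least one year of relative differences")
    n = len(rel_diffs_pct)
    return {
        "average": sum(rel_diffs_pct) / n,
        "average_absolute": sum(abs(d) for d in rel_diffs_pct) / n,
        "maximum_absolute": max(abs(d) for d in rel_diffs_pct),
    }


if __name__ == "__main__":
    print(summarize([10.0, -10.0, 4.0]))
    # average about 1.33, average_absolute 8.0, maximum_absolute 10.0
```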

5. Results

Results for the early-season estimates are presented in Table 1, and those for the mid-season estimates in Table 2.

The September survey estimates only cover the 2010 to 2015 period since the survey was cancelled in 2016.

Table 1: Comparison of the relative differences with the crop insurance yield estimates for early-season yield estimates from the July survey and the model (2010-2018)

| Crop | 2018 seeded area (acres) | Average relative difference: July survey (%) | Average relative difference: modelled values (%) | Average absolute relative difference: July survey (%) | Average absolute relative difference: modelled values (%) | Maximum absolute relative difference: July survey (%) | Maximum absolute relative difference: modelled values (%) |
|---|---|---|---|---|---|---|---|
| Barley | 324,000 | 2.0 | 1.6 | 12.5 | 10.8 | 35.6 | 25.9 |
| Buckwheat | 5,400 | 31.4 | -10.1 | 31.4 | 22.4 | 52.7 | 36.9 |
| Canary seed | | -12.4 | -1.0 | 28.9 | 18.0 | 77.1 | 67.2 |
| Canola (rapeseed) | 3,416,000 | -9.8 | 1.2 | 13.2 | 9.7 | 23.9 | 31.2 |
| Corn for grain | 421,000 | -12.7 | 5.5 | 13.2 | 10.1 | 22.0 | 30.6 |
| Peas, dry | 85,000 | -7.0 | -0.5 | 12.9 | 12.3 | 30.7 | 22.4 |
| Faba beans | | 7.7 | -1.7 | 24.7 | 11.6 | 51.1 | 36.2 |
| Flaxseed | 37,500 | 2.6 | 3.7 | 12.3 | 11.7 | 23.3 | 37.3 |
| Hemp | 11,300 | ... | -4.0 | ... | 14.4 | ... | 28.4 |
| Lentils | 2,000 | ... | 111.3 | ... | 111.3 | ... | 164.5 |
| Mustard seed | 4,900 | ... | 33.3 | ... | 52.2 | ... | 90.4 |
| Oats | 484,900 | -7.5 | -1.4 | 11.2 | 7.8 | 19.8 | 15.8 |
| Beans, dry coloured | 105,800 | -8.8 | 7.2 | 10.4 | 13.9 | 22.7 | 32.3 |
| Rye, fall remaining | 42,300 | 0.5 | -2.0 | 15.0 | 14.8 | 34.6 | 24.8 |
| Soybeans | 1,890,000 | -7.3 | 2.7 | 11.2 | 12.0 | 17.3 | 30.1 |
| Sunflower seed | 60,000 | -6.4 | 4.0 | 13.9 | 14.0 | 31.6 | 27.9 |
| Beans, dry white | 30,100 | -13.2 | 3.0 | 17.0 | 16.8 | 35.1 | 32.0 |
| Wheat, durum | | 1.7 | -14.4 | 38.0 | 27.0 | 110.0 | 47.0 |
| Wheat, Canada Western Red Spring | 2,590,000 | -12.6 | -6.3 | 12.6 | 9.7 | 28.1 | 27.2 |
| Wheat, other spring | 20,900 | -20.3 | -2.8 | 25.6 | 15.8 | 68.4 | 31.1 |
| Wheat, Canada Prairie Spring Red and Canada Prairie Spring White | 51,000 | -6.1 | -9.0 | 18.4 | 15.3 | 42.0 | 33.6 |
| Wheat, winter remaining | 41,000 | -1.9 | 2.5 | 6.1 | 7.2 | 10.2 | 19.8 |

Table 2: Comparison of the relative differences with the crop insurance yield estimates for mid-season yield estimates from the September survey and the model (2010-2018) (see Footnotes 2 and 3)

| Crop | 2018 seeded area (acres) | Average relative difference: September survey (%) | Average relative difference: modelled values (%) | Average absolute relative difference: September survey (%) | Average absolute relative difference: modelled values (%) | Maximum absolute relative difference: September survey (%) | Maximum absolute relative difference: modelled values (%) |
|---|---|---|---|---|---|---|---|
| Barley | 324,000 | 0.2 | -3.3 | 6.5 | 11.1 | 16.0 | 22.1 |
| Buckwheat | 5,400 | ... | -5.7 | ... | 18.5 | ... | 35.9 |
| Canary seed | | ... | -1.7 | ... | 17.3 | ... | 59.2 |
| Canola (rapeseed) | 3,416,000 | -11.4 | -3.3 | 11.4 | 10.3 | 16.3 | 22.0 |
| Corn for grain | 421,000 | -15.3 | 2.4 | 15.3 | 11.5 | 28.6 | 32.1 |
| Peas, dry | 85,000 | 0.9 | -1.2 | 5.9 | 12.6 | 17.1 | 26.3 |
| Faba beans | | ... | 1.1 | ... | 9.9 | ... | 31.7 |
| Flaxseed | 37,500 | -2.0 | 3.4 | 5.9 | 11.7 | 9.6 | 32.8 |
| Hemp | 11,300 | ... | -7.4 | ... | 11.6 | ... | 34.2 |
| Lentils | 2,000 | ... | 72.9 | ... | 89.4 | ... | 164.5 |
| Mustard seed | 4,900 | ... | 51.3 | ... | 59.7 | ... | 127.7 |
| Oats | 484,900 | -7.5 | 0.7 | 7.6 | 8.3 | 12.8 | 14.8 |
| Beans, dry coloured | 105,800 | ... | 4.9 | ... | 8.5 | ... | 22.5 |
| Rye, fall remaining | 42,300 | 1.9 | -2.3 | 11.7 | 17.4 | 20.3 | 27.0 |
| Soybeans | 1,890,000 | -8.4 | 0.6 | 8.4 | 11.9 | 19.3 | 24.7 |
| Sunflower seed | 60,000 | ... | 1.0 | ... | 22.1 | ... | 41.1 |
| Beans, dry white | 30,100 | ... | -4.9 | ... | 11.3 | ... | 33.0 |
| Wheat, durum | | ... | -15.8 | ... | 25.7 | ... | 47.0 |
| Wheat, Canada Western Red Spring | 2,590,000 | -9.3 | -8.3 | 9.3 | 8.6 | 17.1 | 25.2 |
| Wheat, other spring | 20,900 | -9.3 | -3.8 | 9.3 | 13.7 | 17.1 | 32.5 |
| Wheat, Canada Prairie Spring Red and Canada Prairie Spring White | 51,000 | -9.3 | -4.2 | 9.3 | 11.3 | 17.1 | 30.9 |
| Wheat, winter remaining | 41,000 | -0.9 | 1.6 | 2.9 | 9.7 | 5.9 | 23.9 |

Overall, the model compares favourably with the results of the July and September surveys, especially for the more common crops in Manitoba such as canola, spring wheat and soybeans. In general, the average relative differences from the model are closer to zero than those from the surveys, and the model's average absolute relative differences are generally lower than those of the July survey.

The relative difference itself shows the percentage difference between an estimate (modelled or survey) and the crop insurance value. In many cases the average relative differences are small, but a small average can result from a large positive value offset by a similarly large negative value. The average absolute relative differences show that there may be an important difference between the two estimates in any given year, and the maximum shows that this difference can be quite large in some situations.

6. Data Quality Indicator – the Coefficient of Variation

Since the parcel-level yield values are estimates from a model, they are subject to error. One indicator that can be used to measure the degree of possible error, and therefore the degree of uncertainty in the estimates, is the coefficient of variation (CV). For the yield model, the variability is measured using the standard error of the individual predicted values, that is, the error in the prediction at the parcel level. Note that the model CVs are calculated differently from survey CVs, so the two are not directly comparable. The model CVs can be considered conservative estimates of the true variability, that is, upper bounds; the true CV may be lower. Table 3 shows the average CVs for each crop from the 2010-2018 tests.
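For reference, the textbook standard error of an individual predicted value from an ordinary least-squares fit, and a CV built from it, are shown below. The report does not give its exact formula, and a robust fit would modify the first expression, so this illustrates the concept rather than the production calculation. Here $\hat{\sigma}$ is the residual standard error, $\mathbf{X}$ the design matrix of the training data, $\mathbf{x}_0$ the predictor vector for the parcel being predicted and $\hat{y}_0$ its predicted yield.

```latex
\widehat{\mathrm{SE}}(\hat{y}_0) = \hat{\sigma}\,\sqrt{1 + \mathbf{x}_0^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_0},
\qquad
\mathrm{CV}(\hat{y}_0) = 100 \times \frac{\widehat{\mathrm{SE}}(\hat{y}_0)}{\hat{y}_0}
```

The "1 +" term carries the parcel-to-parcel noise around the fitted relationship, which is why a CV based on individual predictions is typically larger than one based only on the uncertainty of the fitted mean.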

Table 3: Average estimated CVs from the model for different crops (2010-2018)

| Crop | Early-season estimate average CV (%) | Mid-season estimate average CV (%) |
|---|---|---|
| Barley | 20.0 | 21.0 |
| Buckwheat | 61.5 | 59.6 |
| Canary seed | 33.2 | 33.3 |
| Canola (rapeseed) | 19.4 | 20.0 |
| Corn for grain | 14.3 | 14.6 |
| Peas, dry | 26.0 | 26.1 |
| Faba beans | 35.3 | 31.3 |
| Flaxseed | 25.5 | 25.5 |
| Hemp | 44.5 | 46.4 |
| Lentils | 19.0 | 24.0 |
| Mustard seed | 59.6 | 48.5 |
| Oats | 20.9 | 20.3 |
| Beans, dry coloured | 24.1 | 24.4 |
| Rye, fall remaining | 27.9 | 27.5 |
| Soybeans | 18.5 | 18.7 |
| Sunflower seed | 26.0 | 27.4 |
| Beans, dry white | 21.9 | 23.0 |
| Wheat, durum | 33.3 | 33.4 |
| Wheat, Canada Western Red Spring | 17.8 | 18.1 |
| Wheat, other spring | 18.9 | 19.0 |
| Wheat, Canada Prairie Spring Red and Canada Prairie Spring White | 27.1 | 25.8 |
| Wheat, winter remaining | 17.4 | 17.6 |
Note that the CVs do not account for any variability related to the adjustment factor for smaller parcels.

7. Release Criteria

A set of rules was established to determine which modelled yields are of acceptable quality for publication. These rules are based on whether the robust multivariate linear model can be run successfully and on the resulting CV calculated for each modelled estimate. The rules are applied to each crop and differ slightly depending on whether the model was constructed at the provincial or ecological level.

Firstly, the model may not generate an estimate. This is most likely to occur with rare crops, either because there are fewer than 50 parcels of land in the historical database with which to build the model or because no mathematical solution can be found when running the model.

Secondly, if the CV of the provincial estimate from the model is greater than 35%, the estimate is not published.

Finally, there is an additional rule for crops modelled at the ecological level: the provincial estimate for a crop is not published if the total seeded area of the ecological regions that do not meet the previous conditions exceeds 10% of the provincial seeded area for the crop. In such cases, the model may be rerun at the provincial level to obtain a provincial estimate.
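The three rules can be expressed as a single check, sketched below under stated assumptions: a region is treated as failing when it has fewer than 50 historical parcels, no model solution, or a CV above 35%, that is, the first two provincial rules are assumed to apply at the regional level as well. The data structure and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MIN_HISTORICAL_PARCELS = 50   # rule 1: enough parcels to build a model
MAX_CV_PCT = 35.0             # rule 2: CV threshold
MAX_FAILED_AREA_SHARE = 0.10  # rule 3: ecological-level models only


@dataclass
class ModelResult:
    seeded_acres: float
    historical_parcels: int
    estimate: Optional[float]  # None when no mathematical solution was found
    cv_pct: Optional[float]


def meets_basic_rules(result: ModelResult) -> bool:
    """Rules 1 and 2: an estimate exists and its CV is at most 35%."""
    if result.historical_parcels < MIN_HISTORICAL_PARCELS or result.estimate is None:
        return False
    return result.cv_pct is not None and result.cv_pct <= MAX_CV_PCT


def publishable(provincial: ModelResult,
                regions: Optional[Sequence[ModelResult]] = None) -> bool:
    """Release decision for one crop; pass `regions` for ecological-level models."""
    if not meets_basic_rules(provincial):
        return False
    if regions:
        total = sum(r.seeded_acres for r in regions)
        failed = sum(r.seeded_acres for r in regions if not meets_basic_rules(r))
        if total > 0 and failed / total > MAX_FAILED_AREA_SHARE:
            return False
    return True


if __name__ == "__main__":
    province = ModelResult(1_050_000, 900, 42.0, 18.0)
    regions = [
        ModelResult(600_000, 500, 44.0, 20.0),
        ModelResult(300_000, 300, 40.0, 25.0),
        ModelResult(150_000, 100, 38.0, 40.0),  # CV above 35%: region fails
    ]
    # 150,000 of 1,050,000 acres (about 14%) come from a failing region.
    print(publishable(province, regions))  # False
```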

Appendix A: Geographic Level at which the Crop Yield Models were Constructed

The following table indicates the geographic level at which the crop yield models were constructed. In general, less common crops were modelled at the provincial level while more common crops were modelled at the ecological level.
| Crop | Geographic Level of Modelling |
|---|---|
| Barley | Ecological regions |
| Buckwheat | Provincial |
| Canary seed | Provincial |
| Canola (rapeseed) | Ecological regions |
| Corn for grain | Ecological regions |
| Peas, dry | Provincial |
| Faba beans | Provincial |
| Flaxseed | Ecological regions |
| Hemp | Provincial |
| Lentils | Provincial |
| Mustard seed | Provincial |
| Oats | Ecological regions |
| Beans, dry coloured | Ecological regions |
| Rye, fall remaining | Provincial |
| Soybeans | Ecological regions |
| Sunflower seed | Provincial |
| Beans, dry white | Ecological regions |
| Wheat, durum | Provincial |
| Wheat, Canada Western Red Spring | Ecological regions |
| Wheat, other spring | Ecological regions |
| Wheat, Canada Prairie Spring Red and Canada Prairie Spring White | Provincial |
| Wheat, winter remaining | Ecological regions |

Appendix B: Percentage of Crop Area not insured through Crop Insurance

The following table presents estimates by crop of the percentage of cropland that is not insured. No adjustments are made to the modelled estimates for these uninsured crops. It is assumed that their yield is similar to that coming from insured fields. The percentages were estimated by comparing the area of insured land by crop as reported on the 2016 through 2018 MASC files with the estimated crop area from Statistics Canada's Field Crop Reporting Series.
| Crop | Estimated Percentage of Uninsured Crop Area (%) (see Footnote 4) |
|---|---|
| Barley | 13.4 |
| Buckwheat | 10.2 |
| Canary seed | |
| Canola (rapeseed) | 3.1 |
| Corn for grain | 8.5 |
| Peas, dry | 1.8 |
| Faba beans | 9.0 |
| Flaxseed | 13.8 |
| Hemp | |
| Lentils | 15.8 |
| Mustard seed | |
| Oats | 14.8 |
| Beans, dry coloured | 8.8 |
| Rye, fall remaining | 9.5 |
| Soybeans | 1.3 |
| Sunflower seed | 10.0 |
| Beans, dry white | 8.8 |
| Wheat, durum | |
| Wheat, Canada Western Red Spring | 6.0 |
| Wheat, other spring | 6.0 |
| Wheat, Canada Prairie Spring Red and Canada Prairie Spring White | 6.0 |
| Wheat, winter remaining | 15.4 |

Appendix C: Ratio of the Yields between Small and Large Parcels of Insured Land

The following table presents estimates of the ratio of the yields from smaller fields (less than 145 acres of a single crop within a parcel of land) to those from larger fields (at least 145 acres of a single crop within a parcel of land). A value of less than one indicates that smaller fields have a lower yield than larger fields. These values were calculated using ten years of MASC information, and each ratio is the average of the annual ratios over the ten years. The variance of the estimate is also provided as an indication of the stability of this ratio over time.
| Crop | Estimated Ratio of Yield of Smaller to Larger Fields | Variance |
|---|---|---|
| Barley | 0.88 | 0.0007 |
| Buckwheat | 1.04 | 0.0430 |
| Canary seed | 0.94 | 0.0189 |
| Canola (rapeseed) | 0.96 | 0.0003 |
| Corn for grain | 0.99 | 0.0007 |
| Peas, dry | 0.90 | 0.0021 |
| Faba beans | 0.92 | 0.0403 |
| Flaxseed | 0.94 | 0.0042 |
| Hemp | 0.98 | 0.0147 |
| Lentils | 1.13 | 0.5358 |
| Mustard seed | 1.35 | 0.5556 |
| Oats | 0.83 | 0.0025 |
| Beans, dry coloured | 1.04 | 0.0035 |
| Rye, fall remaining | 0.92 | 0.0013 |
| Soybeans | 0.97 | 0.0003 |
| Sunflower seed | 0.95 | 0.0022 |
| Beans, dry white | 1.00 | 0.0045 |
| Wheat, durum | 1.01 | 0.0916 |
| Wheat, Canada Western Red Spring | 0.92 | 0.0003 |
| Wheat, other spring | 0.95 | 0.0053 |
| Wheat, Canada Prairie Spring Red and Canada Prairie Spring White | 1.05 | 0.0295 |
| Wheat, winter remaining | 0.93 | 0.0007 |
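A sketch of how such a ratio and its variance might be computed from annual figures follows. It assumes each year contributes one ratio of the small-field yield to the large-field yield and reports the sample variance of those annual ratios; the report does not specify whether its variance refers to the annual ratios or to their mean, and the function name and figures are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def small_to_large_ratio(
    yearly_yields: Sequence[Tuple[float, float]],
) -> Tuple[float, float]:
    """yearly_yields: one (small-field yield, large-field yield) pair per year.

    Returns the mean of the annual ratios and their sample variance.
    """
    ratios = [small / large for small, large in yearly_yields]
    if len(ratios) < 2:
        raise ValueError("need at least two years to estimate a variance")
    return mean(ratios), variance(ratios)


if __name__ == "__main__":
    # Three hypothetical years (the report uses ten).
    avg, var = small_to_large_ratio([(36.0, 40.0), (46.0, 50.0), (43.0, 50.0)])
    print(round(avg, 4), round(var, 6))
```

A small variance relative to the distance of the ratio from one (as for barley or oats above) suggests the small-field penalty is stable from year to year, whereas large variances (as for lentils or mustard seed) indicate an unstable ratio.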