Data quality
About Agriculture–National Household Survey linkage
An important benefit of conducting the Census of Agriculture at the same time as the Census of Population and the National Household Survey is that information from these sources can be linked by means of an automated matching process to create the Agriculture–National Household Survey Linkage database. This database contains all Census of Agriculture variables and most of the variables (such as income, education, occupation, etc.) included on the National Household Survey questionnaire. The Agriculture–National Household Survey Linkage database permits the cross-tabulation of socio-economic characteristics of farm operators and their families (for example, the age, education and income of operators) with the agricultural characteristics of farm operations (for example, farm area, number of animals, farm practices, and so on).
The 2011 Agriculture–National Household Survey Linkage database follows the Agriculture–Population Linkage databases, which were first created for the 1971 censuses and are also available for the 1981, 1986, 1991, 1996, 2001 and 2006 censuses. The 2011 database targets farm operators and their families who were identified on the 2011 Census of Agriculture, except those residing in Canada's three territories or in collective dwellings.
Because the Agriculture–National Household Survey Linkage database is an amalgamation of information from two data sources, users are encouraged to refer to the reference material from the National Household Survey and the Census of Agriculture for further information on the data collection, processing and dissemination methods used.
New for 2011
The Agriculture–National Household Survey Linkage database combines information from two data sources. Until 2006, the population information came from the Census of Population long-form questionnaire, a mandatory questionnaire distributed to 20% of Canadian households. In 2011, the data source was the voluntary National Household Survey, which was distributed to approximately 33% of Canadian households.
The population covered by the Agriculture–National Household Survey Linkage database, and the estimates derived from it, also changed in two ways in 2011. First, the definition of the farming population changed. In the years prior to 2011, only operators and their families who resided on the farm at any time in the previous 12 months were included in the farming population. In 2011, the on-farm restriction was removed, so operators and their families not residing on a farm are also included. Second, residents of collective dwellings were not eligible to receive the National Household Survey and, thus, are not represented in the Agriculture–National Household Survey Linkage database.
Users should be aware of these changes when comparing results from the 2011 Agriculture–National Household Survey Linkage database with those from previous Agriculture–Population Linkage databases.
Sources of Error
In a sample survey like the National Household Survey, there can be two types of errors: sampling errors and non-sampling errors. In a census like the Census of Agriculture, only non-sampling errors exist.
Sampling error arises from estimating a population characteristic by measuring only a portion of the population rather than the entire population. The error can be controlled by the sample size, sample design and the method of estimation.
Non-sampling errors are errors that are unrelated to sampling. They can include errors in the frame from which the sample is drawn, inadequate collection tools, survey non-response and errors in data capture, editing, coding and other processing steps. During the planning stages, steps were implemented to reduce non-sampling error through questionnaire testing, interviewer training, quality control of data capture and coding as well as many other approaches.
Response Rates
The National Household Survey was a voluntary survey, whereas response to the Census of Population long-form questionnaire used for previous databases was mandatory. As a result, there is an important difference in the response rates in 2011 compared to previous years. In 2006, the response rate to the Census of Population long-form questionnaire was approximately 97%. The table below presents the 2011 weighted response rates for the entire National Household Survey and for the subset of the population eligible for the Agriculture–National Household Survey Linkage database.
Provinces | National Household Survey weighted response rate (%) | Agriculture–National Household Survey weighted response rate (%) |
---|---|---|
Canada | 77.2 | 71.4 |
Newfoundland and Labrador | 72.5 | 78.7 |
Prince Edward Island | 70.0 | 70.0 |
Nova Scotia | 74.8 | 75.1 |
New Brunswick | 74.2 | 74.5 |
Quebec | 80.7 | 80.4 |
Ontario | 76.3 | 73.8 |
Manitoba | 76.3 | 63.9 |
Saskatchewan | 73.1 | 65.9 |
Alberta | 75.4 | 67.3 |
British Columbia | 77.1 | 74.8 |
Note: The National Household Survey Canada response rate includes respondents from Canada's three territories, while the Agriculture–National Household Survey rate does not.
There is non-response bias when a survey's non-respondents are different from its respondents. In that case, the higher a survey's non-response is, the greater the risk of non-response bias. The quality of the estimates can be affected if such a bias is present.
Automated matching process
The fundamentals of the Agriculture–National Household Survey automated matching process are simple. A farm operator completes a Census of Agriculture questionnaire as well as a Census of Population questionnaire. The operator may also be selected to complete a National Household Survey questionnaire, distributed to approximately one-third of all households. Data from the Census of Agriculture and Census of Population are linked using information which is common to both questionnaires such as name, sex, birth date and address. Using the link which already exists between the Census of Population and National Household Survey questionnaires, the Agriculture–National Household Survey Linkage database can be formed. The 1991 to 2011 Censuses of Agriculture allowed respondents to report up to three operators per farm, and all farm operators were included in the matching process. With this additional information, the relationship between family members living in the same household and operating the same farm can be analyzed. As well, operators in different households operating the same farm can be included in the analysis.
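The two-step linkage described above can be illustrated with a minimal sketch. It assumes exact matching on a normalized key built from name, sex, birth date and address; the record layout, the `normalize` rules and all identifiers are hypothetical, and the actual matching process may be considerably more tolerant of reporting differences.

```python
from dataclasses import dataclass
import unicodedata


@dataclass(frozen=True)
class PersonRecord:
    record_id: str
    name: str
    sex: str
    birth_date: str  # ISO format, e.g. "1958-03-14"
    address: str


def normalize(text: str) -> str:
    """Lower-case, strip accents and collapse whitespace (illustrative rules only)."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def match_key(rec: PersonRecord) -> tuple:
    """Key built from the fields common to both questionnaires."""
    return (normalize(rec.name), rec.sex.upper(), rec.birth_date, normalize(rec.address))


def link(agriculture, population, nhs_by_population_id):
    """Return {agriculture record_id: NHS record_id} for operators found in the NHS."""
    index = {}
    for person in population:
        index.setdefault(match_key(person), []).append(person)

    links = {}
    for operator in agriculture:
        candidates = index.get(match_key(operator), [])
        if len(candidates) != 1:
            continue  # no match or ambiguous match: left unlinked in this sketch
        # Step 2: reuse the existing Census of Population to NHS link.
        nhs_id = nhs_by_population_id.get(candidates[0].record_id)
        if nhs_id is not None:
            links[operator.record_id] = nhs_id
    return links


# Hypothetical records, for illustration only.
agriculture = [
    PersonRecord("AG-1", "Marie Côté", "F", "1958-03-14", "12 Rang  Nord"),
    PersonRecord("AG-2", "John Smith", "M", "1970-11-02", "4 Mill Road"),
]
population = [
    PersonRecord("POP-10", "marie cote", "f", "1958-03-14", "12 rang nord"),
    PersonRecord("POP-11", "John Smith", "M", "1970-11-02", "4 Mill Road"),
]
# Only some households were selected for the National Household Survey.
nhs_by_population_id = {"POP-10": "NHS-7"}

links = link(agriculture, population, nhs_by_population_id)
```

In this example, both operators are matched to a Census of Population record, but only the first one appears in `links` because the second operator's household was not selected for the National Household Survey.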
Sampling and weighting
Because only a sample of Canadian households was selected to receive the National Household Survey, weights were assigned to the records on the Agriculture–National Household Survey Linkage database in order to represent the entire farming population. The weights were calculated independently within each province. An initial weight was generated for most records (see the note at the end of this page) based on the number of households in the province and the number that responded to the National Household Survey. Then characteristics referred to as "constraints" were identified. These were agricultural and population characteristics of primary importance to data users that were fully enumerated on either the Census of Population or the Census of Agriculture. A method known as ridge regression, applied province by province, ensured that in most provinces the Agriculture–National Household Survey Linkage database estimates of most of these constraints would be very close to the known population counts. The number of constraints varied from 38 to 50, depending on the province. At the national level, all of the constraints had discrepancies between sample estimates and population counts of less than 1.0%, and 92% of the constraints had discrepancies of less than 0.5%. Similar values were observed at the provincial level, with the exception of Newfoundland and Labrador. Due to the small number of Agriculture–National Household Survey records in this province, it was not possible to respect the constraints to the same degree as in the other provinces.
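The arithmetic behind an initial weight and a constraint discrepancy can be sketched as follows. This is only an illustration under simplifying assumptions: the initial weight is taken as the plain ratio of households to responding households, all figures are hypothetical, and the ridge regression adjustment itself is not reproduced here.

```python
def initial_weight(households_in_province: int, responding_households: int) -> float:
    """Assumed form of the initial weight: households divided by respondents."""
    if responding_households <= 0:
        raise ValueError("at least one responding household is required")
    return households_in_province / responding_households


def relative_discrepancy(weighted_estimate: float, known_count: float) -> float:
    """Absolute difference as a percentage of the known population count."""
    return abs(weighted_estimate - known_count) / known_count * 100.0


# Hypothetical province: 1,000 eligible households, 250 of which responded.
w0 = initial_weight(1000, 250)

# Hypothetical constraint: 60 of the 250 responding records have the characteristic.
sample_flags = [1] * 60 + [0] * 190
estimate = sum(w0 * flag for flag in sample_flags)

# Hypothetical fully enumerated count for the same characteristic.
known = 245
discrepancy = relative_discrepancy(estimate, known)
```

With these made-up figures, the initial weights alone leave a discrepancy of about 2%, which is the kind of gap a calibration step such as ridge regression is meant to reduce.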
The Agriculture–National Household Survey Linkage database contains agricultural data (farm operations and farm operators) and population data (person, household, census family and economic family). Weights have been calculated at the person level, household level, census family level and economic family level.
For any given geographic area, the weighted population, household, family or farm totals or subtotals may differ from similar estimates presented in previous Census of Agriculture data releases. This is because the Census of Agriculture collected data from all farming operations whereas the estimates from the Agriculture–National Household Survey Linkage database came from a sample. The discrepancies for variables used to define the constraints in the ridge regression weight calculations were described above. The discrepancies for any variables highly correlated with at least one of the variables used to define a constraint will be similar to the discrepancy of that constraint. For other variables, discrepancies will depend on the relationship with the variable used to define a constraint, and could be large if no relationship exists.
Data suppression
Results from the Agriculture–National Household Survey Linkage database may be suppressed for two reasons: (1) to protect the confidentiality of individual respondent data, and (2) to limit the dissemination of data of poor quality (referred to below as data quality). The approaches used are similar to those used in previous Agriculture–Population Linkage databases, but two additional rules (one for confidentiality and one for data quality) have been added.
Confidentiality is controlled through two rules. Random rounding transforms all estimates of counts to random rounded counts at a base 5 level. Employing this technique, all figures in each table, including totals, are randomly rounded either up or down to a multiple of 5. While providing protection against disclosure, this procedure does not add significant error to the data. The random rounding algorithm uses a random seed value to initiate the rounding pattern for tables. In these routines, the method used to seed the pattern can result in the same count in the same table being rounded up in one execution and rounded down in the next.
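Random rounding to base 5 can be sketched in a few lines. The source does not state the probabilities used, so this sketch assumes the common unbiased scheme in which a count is rounded up with probability equal to its remainder divided by 5; the function name and the seed value are hypothetical.

```python
import random


def random_round(count: int, rng: random.Random, base: int = 5) -> int:
    """Randomly round a non-negative count up or down to a multiple of `base`."""
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base
    lower = count - remainder
    # Assumption: round up with probability remainder/base, so the expected
    # value of the rounded count equals the original count.
    if rng.random() < remainder / base:
        return lower + base
    return lower


counts = [12, 15, 3, 48]
rng = random.Random(2011)  # a different seed can round the same count differently
rounded = [random_round(c, rng) for c in counts]
```

Each rounded value is a multiple of 5 and lies within 4 of the original count, and rerunning with another seed can round the same count in the other direction, as described above.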
Some variables, such as those related to income, can have highly variable responses and carry a higher risk of revealing information about an individual respondent when certain statistics, such as averages, are calculated. For this reason, only medians are produced for these variables, not averages.
Data quality is controlled through the global non-response rate, an indicator that combines complete non-response and partial non-response to the survey. A smaller global non-response rate indicates a lower risk of non-response bias and, therefore, a lower risk of inaccuracy. Geographic areas with a global non-response rate of 50% or higher are suppressed. This is the same threshold used for the publication of National Household Survey data. In the Agriculture–National Household Survey Linkage database, all provinces have a global non-response rate below the 50% threshold.
Provinces | Global non-response rate (%) |
---|---|
Canada | 36.9 |
Newfoundland and Labrador | 35.7 |
Prince Edward Island | 38.1 |
Nova Scotia | 33.9 |
New Brunswick | 34.4 |
Quebec | 28.0 |
Ontario | 35.1 |
Manitoba | 42.8 |
Saskatchewan | 41.3 |
Alberta | 41.1 |
British Columbia | 36.6 |
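The data quality suppression rule can be expressed as a simple check. The sketch below uses the rates from the table above and the 50% threshold stated in the text; the function and variable names are hypothetical.

```python
SUPPRESSION_THRESHOLD = 50.0  # percent, as stated for the global non-response rate

# Global non-response rates (%) copied from the table above.
global_non_response = {
    "Canada": 36.9,
    "Newfoundland and Labrador": 35.7,
    "Prince Edward Island": 38.1,
    "Nova Scotia": 33.9,
    "New Brunswick": 34.4,
    "Quebec": 28.0,
    "Ontario": 35.1,
    "Manitoba": 42.8,
    "Saskatchewan": 41.3,
    "Alberta": 41.1,
    "British Columbia": 36.6,
}


def is_suppressed(rate_percent: float, threshold: float = SUPPRESSION_THRESHOLD) -> bool:
    """An area is suppressed when its rate is higher than or equal to the threshold."""
    return rate_percent >= threshold


suppressed_areas = [area for area, rate in global_non_response.items() if is_suppressed(rate)]
```

For the rates listed, `suppressed_areas` is empty, consistent with the statement that all provinces fall below the threshold.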
Note:
- A small number of records on the Agriculture–Population linkage databases were automatically assigned a weight of one and were not weighted as described here. These are households associated with operations with special characteristics.