Data quality, concepts and methodology: Methodology and data quality

11-526-S

Introduction

This section provides an overview of the underlying methodology of the Households and the Environment Survey (HES) Energy Use supplement, as well as key aspects of the data quality. It will also provide an understanding of the strengths and limitations of the data. The information may be of particular relevance when making comparisons with data from other surveys or sources of information and when drawing conclusions about the data.

Reference period

The reference period of the HES Energy Use supplement is the 2011 calendar year and collection was conducted between the months of January and March 2012. Some questions asked the respondent to respond with respect to “winter”, “summer”, “heating season” or “past 12 months”, while some others asked with respect to 2011.

Energy consumption data were collected for the fourteen months prior to the date on which a household completed the survey, and were processed to reflect the 2011 calendar year.

Target population

The target population consisted of households in Canada excluding households located in Yukon, Northwest Territories and Nunavut, households located on Indian reserves or Crown lands, and households consisting entirely of full-time members of the Canadian Armed Forces. Institutions and households of certain remote regions were also excluded.

Variables measured

The objectives of the Energy Use supplement were to collect data on the energy use characteristics and energy consumption of occupied dwellings in Canada. The energy use information, coupled with energy consumption data obtained from respondents’ energy bills or directly from energy suppliers, can be used to assess the effectiveness of energy efficiency programs. The survey content also covers the following themes:

  1. dwelling characteristics;
  2. household appliances;
  3. electrical devices; and
  4. heating and air conditioning.

Instrument design

Statistics Canada designed the questionnaire in collaboration with Natural Resources Canada and in accordance with standard practices. Content was developed considering the data needs of both the project and the larger research and policy communities.

Sampling

The Households and the Environment Survey - Energy Use is a supplement to the Households and the Environment Survey (HES). The HES was administered from October to November 2011 to a sub-sample of the dwellings that were part of the Canadian Community Health Survey (CCHS) Cycle 4.1 between January 1st and June 30th, 2011. Therefore the HES sample design is closely tied to that of the CCHS. All HES respondents were sent a paper questionnaire for the Energy Use supplemental survey.

The following table shows the number of responding dwellings for the 2011 HES – Energy Use supplement.

Data collection

Respondents were first contacted between the months of January and June 2011 and asked to complete the Canadian Community Health Survey, Cycle 4.1. They were then surveyed for the telephone portion of the HES between the months of October and November 2011. Finally, households responding to the telephone portion of the HES were asked to complete a paper questionnaire on energy use. Data collection for the HES - Energy Use supplement was carried out between January and March 2012.

The last step of the survey was to establish contact with the energy suppliers. Residential energy consumption for 2011 was collected directly from the suppliers in cases where the account holder had given their consent to do so.

Data processing

The data were captured using imaging and automated data entry technology. A small proportion of questionnaires, those that could not be read by the optical scanners, were captured using heads-down keying by experienced operators. The questionable zones method, together with standard quality control measures, was used to verify the error rate of the capture operations. For the HES, based on the quality control sample that was selected, it was determined that the overall data capture error rate did not exceed 0.1%.

Editing

The first type of error treated was related to the flow of the questionnaire, where questions which did not apply to the respondent (and should therefore not have been answered) were found to contain answers. In this case a computer edit automatically eliminated superfluous data by following the flow of the questionnaire implied by answers to previous, and in some cases, subsequent questions.

The second type of error treated involved a lack of information in questions which should have been answered. For this type of error, a non-response or “not-stated” code was assigned to the item.

The third type of error treated involved the identification of incoherent entries based on logical relationships between certain questions.
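
The three types of edit described above can be illustrated with a minimal sketch. The variable names (`has_air_conditioner`, `air_conditioner_type`, `heated_area`, `total_area`), the "not stated" code value and the consistency rule are hypothetical and chosen only for illustration; they are not the actual HES edit specifications.

```python
NOT_STATED = "not stated"  # hypothetical code value; the real code is not given in this note


def edit_record(record):
    """Apply one illustrative edit of each type to a questionnaire record (a dict).

    Returns a corrected copy of the record and a list of the edits triggered.
    """
    edited = dict(record)  # work on a copy so the raw record is preserved
    flags = []
    has_ac = edited.get("has_air_conditioner")
    ac_type = edited.get("air_conditioner_type")

    if has_ac == "no" and ac_type is not None:
        # Type 1 (flow): the follow-up question did not apply, so remove the answer.
        edited["air_conditioner_type"] = None
        flags.append("flow")
    elif has_ac == "yes" and ac_type is None:
        # Type 2 (missing): the question applied but was left blank.
        edited["air_conditioner_type"] = NOT_STATED
        flags.append("not_stated")

    heated = edited.get("heated_area")
    total = edited.get("total_area")
    if heated is not None and total is not None and heated > total:
        # Type 3 (consistency): flag the entry for review; no value is changed here.
        flags.append("inconsistent")

    return edited, flags
```

In this sketch the consistency edit only flags the record, since the note above describes identification of incoherent entries without stating how they were resolved.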

Coding of open-ended questions

A few data items on the questionnaire were reported in an open-ended format. These questions required coding for inclusion on the data file. The open-ended questions related to responses to “other” categories throughout the questionnaire.

Imputation

Imputation is the process that supplies valid values for those variables that have been identified for a change either because of invalid information or because of missing information. The new values are supplied in such a way as to preserve the underlying structure of the data and to ensure that the resulting records will pass all required edits. In other words, the objective is not to reproduce the true microdata values, but rather to establish internally consistent data records that yield good aggregate estimates.

There are three types of non-response. Complete non-response is when the respondent does not provide the minimum set of answers. These records are dropped and accounted for in the weighting process. Item non-response is when the respondent does not provide an answer to one question, but goes on to the next question. These are usually handled using the “not stated” code or are imputed. Finally, partial non-response is when the respondent provides the minimum set of answers but does not finish the interview. These records can be handled as either complete non-response or multiple item non-response.

In the case of the HES - Energy Use supplement, donor imputation was used to fill in missing data for some item non-response and partial non-response.
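
As an illustration only, donor imputation can be sketched as follows. This note does not state how donors were matched to recipients, so the sketch assumes a simple random choice of donor within an imputation class; the class variables, the field names and the `imputed` flag are hypothetical.

```python
import random


def donor_impute(records, target, class_vars, seed=0):
    """Fill missing values of `target` by copying from a donor in the same class.

    A donor is any record in the same imputation class (same values for
    `class_vars`) that has a reported value for `target`. Records with no
    available donor are left missing. Returns new records; inputs are unchanged.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible

    # Build the pool of donors for each imputation class.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[v] for v in class_vars)
            donors.setdefault(key, []).append(rec)

    imputed = []
    for rec in records:
        new = dict(rec)
        new["imputed"] = False
        if rec[target] is None:
            pool = donors.get(tuple(rec[v] for v in class_vars))
            if pool:
                new[target] = rng.choice(pool)[target]
                new["imputed"] = True  # keep a flag so imputed values can be identified
        imputed.append(new)
    return imputed
```

Because the donor's reported value is copied as is, the imputed record holds a value that actually occurred in the data, which is in keeping with the aim stated above of producing internally consistent records rather than reproducing true microdata values.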

Weighting and estimation

The principle behind estimation in a probability sample is that each unit in the sample “represents”, besides itself, several other units not in the sample.

The weighting phase is a step which calculates, for each record, what this number is. This weight appears on the microdata file, and must be used to derive meaningful estimates from the survey.

The initial sampling weight was provided to the Households and the Environment Survey by the CCHS and incorporated the probability of selecting the unit in their sample, as well as other adjustments such as the treatment of non-response to the CCHS.

In order to produce the HES Energy Use supplement weights, adjustments to the HES weights were made to account for non-response to the HES Energy Use supplement.
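
The exact adjustment used is not described in this note. A minimal sketch of one common approach, a weighting-class non-response adjustment, is given below; the adjustment class (`province`) and the field names `hes_weight`, `responded` and `energy_weight` are assumptions made for illustration.

```python
from collections import defaultdict


def adjust_for_nonresponse(units, class_var="province"):
    """Redistribute the weight of non-respondents to respondents in the same class.

    Each respondent's weight is multiplied by
    (sum of weights of all sampled units in the class) /
    (sum of weights of responding units in the class).
    Non-respondents are dropped from the output.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for u in units:
        total[u[class_var]] += u["hes_weight"]
        if u["responded"]:
            responding[u[class_var]] += u["hes_weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u[class_var]] / responding[u[class_var]]
            adjusted.append({**u, "energy_weight": u["hes_weight"] * factor})
    return adjusted
```

Under this approach the adjusted weights of respondents in a class sum to the original weighted total for that class, so the respondents continue to represent the full sample.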

The accuracy of the estimates was assessed using the ratio of the standard error of the survey estimate to the value of the estimate itself. This measure is called the coefficient of variation (CV). This relative measure of sampling error is usually expressed as a percentage (10% instead of 0.1).
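
Written as a formula, with a worked example, the CV is as follows. The symbols and the numbers (an estimate of 40.0% with a standard error of 2.0 percentage points) are hypothetical and are not survey results.

```latex
\mathrm{CV}(\hat{\theta}) = \frac{\mathrm{SE}(\hat{\theta})}{\hat{\theta}} \times 100\%,
\qquad \text{for example} \qquad
\frac{2.0}{40.0} \times 100\% = 5.0\%
```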

Given the complexity of the HES multi-stage survey design and calibration, there is no simple formula that can be used to calculate variance estimates. Therefore, an approximate method was needed. The bootstrap method is used because the sample design and calibration need to be taken into account when calculating variance estimates.
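
A minimal sketch of how a bootstrap variance estimate might be computed for a weighted proportion is given below. It assumes that a set of bootstrap replicate weights is available for each record and uses one common form of the estimator, the average squared deviation of the replicate estimates from the full-sample estimate. The function and argument names are hypothetical, and the actual HES procedure may differ.

```python
import math


def weighted_proportion(values, weights):
    """Weighted proportion of records with value 1 (values are 0 or 1)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def bootstrap_cv(values, final_weights, bootstrap_weights):
    """Return (estimate, standard error, CV in percent) for a weighted proportion.

    `bootstrap_weights` is a list of replicate weight vectors, one per replicate,
    each the same length as `values`. The estimate must be non-zero for the CV
    to be defined.
    """
    estimate = weighted_proportion(values, final_weights)
    # Recompute the estimate once with each set of replicate weights.
    replicates = [weighted_proportion(values, bw) for bw in bootstrap_weights]
    variance = sum((r - estimate) ** 2 for r in replicates) / len(replicates)
    standard_error = math.sqrt(variance)
    return estimate, standard_error, 100.0 * standard_error / estimate
```

Because each replicate weight vector is assumed to already reflect the sample design and calibration, the spread of the replicate estimates carries that information into the variance estimate without requiring an explicit formula.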

Quality evaluation

Data were compared to similar HES or Survey of Household Energy Use (SHEU) data from previous surveys to ensure consistency. Household energy use data were also compared to residential energy use data from Manufacturing and Energy Division.

Subject-matter experts confronted the data using other sources as well as by identifying and researching any values that were not consistent with others in the same domain.

Disclosure control

Statistics Canada is prohibited by law from releasing any data that would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

Coverage

The coverage error of the CCHS, of which the HES is a subsample, is estimated at less than 2%.

Response rates and sampling error

The response rate for this survey was 57.0%. Provincial response rates ranged from 48.1% to 64.8%.

The results estimated from the HES Energy Use Supplement are based on a sample of households in Canada. The results obtained from asking the same questions to all Canadian households would differ to some known extent.

The extent of this sampling error is quantified by the coefficient of variation (CV) with the following guidelines:

  1. 16.5% and below: acceptable estimate;
  2. more than 16.5% to 33.3%: marginal estimate requiring cautionary note to users; and
  3. more than 33.3%: unacceptable estimate.

Estimates that do not meet an acceptable level of quality are either flagged for caution or suppressed. CV tables are prepared by Statistics Canada and made available to help users understand the quality of individual estimates.
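
The guidelines above can be expressed as a small helper. The thresholds are those listed above; the function name and the short category labels are hypothetical.

```python
def classify_cv(cv_percent):
    """Map a coefficient of variation (in percent) to a quality category."""
    if cv_percent <= 16.5:
        return "acceptable"
    if cv_percent <= 33.3:
        return "marginal"  # publish with a cautionary note to users
    return "unacceptable"
```

For instance, under these guidelines an estimate with a CV of 20.0% would be treated as marginal and released with a cautionary note.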

For example, CVs for the estimates of the proportion of households that had a forced air furnace in 2011 for Canada and the provinces are as follows:

Data comparability to the Households and the Environment Survey

Some data that were collected through the Households and the Environment Survey were included in the Energy Use supplement and are included in this report. However, a household’s response was only included in this report if it had also completed the HES Energy Use supplement. For this reason, the HES had a larger sample than the HES Energy Use supplement. Estimates may therefore differ slightly.

Data presented in this report on some energy-saving practices (for example, programmable thermostat use) collected during the HES Energy Use supplement may differ slightly from data presented in the 2011 Households and the Environment report, 11-526-X, released in March 2013.

Data comparability to Natural Resources Canada’s Survey of Household Energy Use

Natural Resources Canada’s Survey of Household Energy Use is based on those respondents to the HES Energy Use supplement who agreed to share their responses with Natural Resources Canada. As not all respondents agreed to share their responses, there may be some differences between the results of the HES Energy Use supplement and the Survey of Household Energy Use.
