Research Data Centre proposal guide for Survey of Household Spending users
This document is intended to help researchers prepare their proposals to use Survey of Household Spending (SHS) microdata. It gives an overview of the considerations that researchers should take into account when formulating their proposals, based on common reasons that proposals require revision.
In addition to this document, researchers are encouraged to consult the User Guide for the Survey of Household Spending for more information on concepts and methods used in the SHS.
1. Sample size
The most common reason researchers are asked to revise their proposals is that the explicit or implicit sample size or sub-population of interest identified in the proposal is too small to address the objectives of the research. This may occur when the research project proposed involves one or more of the following:
- looking at detailed expenditure categories using geographic areas that are smaller than the design level of the survey (i.e. province or Census Metropolitan Area);
- analyzing an expenditure category for which the number of reporting households is low (e.g. child care);
- examining a population sub-group (e.g. low-income households);
- creating sub-groups/classifying data using other variables (such as demographic or household characteristics) in a sample that is already small;
- carrying out regression analysis using expenditure variables available only in the diary file (applies to 2012 and later), because the diary sample has been 50% of the total sample since 2012 (see point 3 below).
Users may wish to pool (combine) multiple years of data in order to obtain a larger sample size.
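As a rough illustration of pooling, the sketch below stacks household records from several survey years and counts how many households report a given expenditure. The record layout, the field names (`weight`, `child_care`, `pooled_weight`) and the choice to divide each weight by the number of pooled years (so that weighted totals describe an average year rather than a sum of years) are assumptions made for illustration, not SHS specifications. The caution in point 2 about combining pre-2010 data with later years still applies.

```python
def pool_years(files_by_year):
    """Stack records from several survey years and rescale the weights.

    files_by_year: dict mapping survey year -> list of household dicts,
    each carrying a hypothetical 'weight' field.
    """
    n_years = len(files_by_year)
    pooled = []
    for year, records in sorted(files_by_year.items()):
        for rec in records:
            row = dict(rec)  # copy so the input records are not mutated
            row["year"] = year
            # Assumption: divide by the number of years so that weighted
            # totals describe an average year, not the sum over years.
            row["pooled_weight"] = rec["weight"] / n_years
            pooled.append(row)
    return pooled


def count_reporting(records, variable):
    """Number of households reporting a positive amount for `variable`."""
    return sum(1 for rec in records if rec.get(variable, 0) > 0)


# Hypothetical toy data: in each year, few households report child care.
files = {
    2012: [{"weight": 100.0, "child_care": 0.0},
           {"weight": 150.0, "child_care": 4200.0}],
    2013: [{"weight": 120.0, "child_care": 3900.0},
           {"weight": 130.0, "child_care": 0.0},
           {"weight": 110.0, "child_care": 5100.0}],
}

pooled = pool_years(files)
reporting = count_reporting(pooled, "child_care")
print(f"{len(pooled)} pooled households, {reporting} reporting child care")
```

Counting reporting households before and after pooling is a quick way to check whether the pooled sample is large enough for the intended analysis.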
2. SHS Redesign in 2010
The SHS underwent a major redesign in 2010. Expense categories in the redesigned SHS are similar to those of previous years. However, there are substantial differences between pre-2010 data and those from 2010 onward due to changes in data collection, processing and estimation methods. As a result, caution should be used when comparing SHS data from 2010 and later to those from 2009 and earlier.
In particular, researchers should note that two modes of collection have been used for the SHS since 2010: a questionnaire (administered as a computer-assisted personal interview) and an expenditure diary. The questionnaire collects information on expenditures with recall periods that depend on the type of expenditure (last month, last three months, last 12 months, last four weeks, or last payment). Selected households complete the expenditure diary for the two weeks following the interview.
As well, since 2010, data collection has been continuous from January to December of the survey year; that is, data are collected from 1/12th of the sample each month. As a result, the reference periods of reported amounts are not the same for every household. For example, for households in the January 2013 sample, “the past 12 months” signifies the period from January 2012 to December 2012, while for households in the December 2013 sample, it refers to the period from December 2012 to November 2013.
Under the previous SHS design (prior to 2010), collection took place from January to March and covered all expenditures made during the previous calendar year.
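The shifting reference period can be sketched as a small helper. It assumes, based only on the two examples above, that “the past 12 months” means the 12 complete calendar months preceding the collection month; the function name and return format are hypothetical.

```python
def past_12_months(survey_year, collection_month):
    """Return ((start_year, start_month), (end_year, end_month)).

    Assumption: the window is the 12 complete calendar months that
    precede the month in which the household is interviewed.
    """
    if not 1 <= collection_month <= 12:
        raise ValueError("collection_month must be between 1 and 12")
    start = (survey_year - 1, collection_month)
    if collection_month == 1:
        end = (survey_year - 1, 12)  # January sample: previous calendar year
    else:
        end = (survey_year, collection_month - 1)
    return start, end


january_window = past_12_months(2013, 1)    # January 2012 to December 2012
december_window = past_12_months(2013, 12)  # December 2012 to November 2013
print(january_window, december_window)
```

Under this assumption, two households in the same survey year can have reference periods that overlap by as little as one month.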
Comparing data from 2010 onward to those from previous years is more problematic for expenditures collected with a sub-annual recall period under the new design (last month, last three months, last four weeks, or last payment). The issue of comparability is less significant for expenditures collected with a 12-month recall period, since expenditures were reported for a 12-month period under the old survey design. However, caution should still be used when comparing data for these expenditures from 2010 and later to those from 2009 and earlier, due to the move to continuous collection under the new design.
More information on the survey redesign can be found in the Note to Users of Data from the 2010 Survey of Household Spending.
3. Regression analysis involving diary-collected expenditure items
Since the 2010 redesign, some expenditure information is collected through a questionnaire, and other information is collected through a diary that selected households are asked to fill out during a two-week period following the interview. Since 2012, the diary sample has been 50% of the total sample.
Since only a sub-sample of the interview respondents is selected to complete the diary, two sets of weights are calculated: one for the interview, and another for the diary. A micro-level analysis that requires variables for expenditures collected through the diary (e.g. gasoline), or variables for spending categories that include sub-categories collected through the diary (e.g. transportation), must be limited to households that responded to both the interview and the diary (i.e. the records in the “diary file” in the RDC). Otherwise, two sets of weights would need to be applied simultaneously, which is not possible using standard statistical software.
Researchers should determine whether their analysis must be restricted to the households that responded to the diary, since this restriction reduces the sample size.
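The restriction described above can be sketched as follows. All field names (`interview_weight`, `diary_weight`, `rent`, `gasoline`) and values are hypothetical and do not correspond to actual SHS variable names; the point is only that any estimate involving a diary-collected item uses the diary records together with the diary weight.

```python
def weighted_mean(records, variable, weight):
    """Weighted mean of `variable` using the weight field named `weight`."""
    total_weight = sum(rec[weight] for rec in records)
    if total_weight == 0:
        raise ValueError("no weighted records")
    return sum(rec[variable] * rec[weight] for rec in records) / total_weight


# Hypothetical households: household 2 was not selected for the diary,
# so it has no diary weight and no diary-collected amounts.
households = [
    {"id": 1, "interview_weight": 100.0, "diary_weight": 210.0,
     "rent": 12000.0, "gasoline": 2600.0},
    {"id": 2, "interview_weight": 120.0, "diary_weight": None,
     "rent": 9000.0, "gasoline": None},
    {"id": 3, "interview_weight": 90.0, "diary_weight": 190.0,
     "rent": 15000.0, "gasoline": 1300.0},
]

# Interview-only items: all respondents, interview weight.
mean_rent = weighted_mean(households, "rent", "interview_weight")

# Anything involving a diary item: diary respondents only, diary weight.
diary_file = [h for h in households if h["diary_weight"] is not None]
mean_gasoline = weighted_mean(diary_file, "gasoline", "diary_weight")

print(len(households), len(diary_file), round(mean_gasoline, 1))
```

In this toy example the diary file holds two of the three households, which mirrors the loss of sample size that researchers need to plan for.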
Data dictionaries indicating which variables are available in the diary file and in the interview file are available upon request.
4. Annualization of amounts reported for a period of less than 12 months
Collected expenditures with a recall period of less than 12 months are annualized so that all expenditure amounts cover a period of 12 months. Annualized values for categories with sub-annual reference periods (last month, last three months, last four weeks, or two weeks) are not meant to be used for analysis at the micro level.
For example, households that are selected to complete the diary are asked to record their expenses for vehicle fuel during the two weeks following the interview. The total amount spent on vehicle fuel reported by the household over the two-week period is multiplied by an annualization factor, as all reference periods are standardized to a 12-month period. The base annualization factor is 26 (52 weeks/2), and additional adjustments are made to account for any non-responded days in the diary as well as for influential (extreme) cases. The annual expenditure amount is not intended to be (and may not be) representative of an individual household’s expenditure on vehicle fuel over a 12-month period, since the annualization factor does not account for seasonal variation in the household’s spending on vehicle fuel. Annualized expenditure on vehicle fuel may be over-estimated for some households, and under-estimated for others.
Estimated annual vehicle fuel expenditures for a group of households (as opposed to an individual household) account for seasonal variability in spending, because expenditures are collected from 1/12th of the sample each month (that is, since households report their expenditures in different months of the year).
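The difference between household-level and group-level annualized amounts can be illustrated with a small numerical sketch. The spending figures are invented, every household is assumed to follow the same seasonal pattern, and only the base factor of 26 is applied (the additional adjustments for non-responded days and influential cases are left out).

```python
BASE_FACTOR = 52 / 2  # 26 two-week periods per year, the base factor above

# Hypothetical two-week vehicle fuel spending by month, January to December
# (higher in summer). Every household is assumed to follow this pattern.
two_week_spending = [60, 60, 70, 80, 90, 110, 120, 120, 90, 80, 70, 60]

# Assumption: each month contains 26/12 two-week periods, so the true annual
# amount is the sum of the monthly two-week rates times 26/12.
true_annual = sum(two_week_spending) * BASE_FACTOR / 12

# One household is sampled in each month; its diary amount is annualized.
annualized = [BASE_FACTOR * amount for amount in two_week_spending]

# Household level: the annualized amount depends on the collection month.
lowest, highest = min(annualized), max(annualized)

# Group level: averaging over households collected in every month of the
# year recovers the true annual amount.
group_mean = sum(annualized) / len(annualized)

print(round(true_annual, 2), lowest, highest, round(group_mean, 2))
```

In this sketch the household sampled in a winter month is under-estimated and the one sampled in a summer month is over-estimated, while the group mean matches the true annual amount.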
5. Household-level vs. person-level variables
Most expenditure variables are available only at the household level (with the exception of clothing expenditures, income taxes, personal insurance payments, and gifts of money, which are collected at the individual level).
Some demographic variables are not available for each individual in the household (e.g. educational attainment is collected only for the household reference person and for their spouse).
Researchers should review the SHS questionnaire to ensure that the survey collects the variables required for their analysis.
6. Reference period of income variables
Starting in 2010, the reference period for income and income tax is the year prior to the survey year. For example, the reference year for income is 2012 for the 2013 SHS. This timing reflects the fact that these values now come mainly from the T1 (Individual Tax Return) administrative data files from the Canada Revenue Agency, which are generally available nine to twelve months after the end of the calendar year.
7. Money flow variables
Since 2010, the SHS has not collected money flow variables (e.g. savings). As well, savings rates cannot be calculated under the new design, since the reference period for income is the year prior to the survey reference year.
8. Medians and Gini coefficients
Since 2010, medians and Gini coefficients cannot be calculated for expenditures collected using a recall period of less than 12 months. This is because the use of sub-annual recall periods affects the distribution of expenditure variables across households (see point 4 above).
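A minimal sketch of why the distribution is affected, under invented assumptions: twelve households have identical true annual spending on an item, but each reports a two-week amount in a different month and that amount is annualized with the base factor of 26. The spending pattern and the `gini` helper are illustrative only and are not part of any SHS processing.

```python
from statistics import median


def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    ranked_sum = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked_sum / (n * total) - (n + 1) / n


BASE_FACTOR = 52 / 2  # base annualization factor for a two-week diary

# Hypothetical two-week amounts by collection month for identical households.
two_week_amounts = [60, 60, 70, 80, 90, 110, 120, 120, 90, 80, 70, 60]

# Assumption: each month contains 26/12 two-week periods.
true_annual_value = sum(two_week_amounts) * BASE_FACTOR / 12
true_annual = [true_annual_value] * len(two_week_amounts)
annualized = [BASE_FACTOR * amount for amount in two_week_amounts]

gini_true = gini(true_annual)        # households are identical
gini_annualized = gini(annualized)   # spread created by annualization alone
median_true = median(true_annual)
median_annualized = median(annualized)

print(round(gini_true, 3), round(gini_annualized, 3))
print(round(median_true, 2), median_annualized)
```

Although the households are identical by construction, the annualized amounts show inequality and a median that differs from the true annual amount, while their mean is unchanged.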
Reference documents:
- Definitions, data sources and methods
- User Guide for the Survey of Household Spending, 2013
- Note to Users of Data from the 2010 Survey of Household Spending (applies to SHS data from 2010 onward)
- SHS questionnaire, 2013
- Data dictionaries are available upon request.