1. Objective, Uses and Users
1.1. Objectives
The Monthly Wholesale Trade Survey (MWTS) provides information on the performance of the wholesale trade sector and is an important indicator of the health of the Canadian economy. In addition, the business community uses the data to analyse market performance.
1.2. Use
The estimates provide a measure of the health and performance of the wholesale trade sector. Information collected is used to estimate level and monthly trend for wholesale sales and inventories. At the end of each year, the estimates provide a preliminary look at annual wholesale sales and performance.
1.3. Users
A variety of organizations, sector associations, and levels of government make use of the information. Wholesalers can use the survey results to compare their performance against similar types of businesses, as well as for marketing purposes. Wholesale associations are able to monitor industry performance and promote their wholesale industries. Investors can monitor industry growth, which can result in better access to investment capital by wholesalers. Governments are able to understand the role of wholesalers in the economy, which aids in the development of policies and tax incentives. Because wholesale trade is an important industry in the Canadian economy (5 to 6% of the Gross Domestic Product (GDP), depending on the year), governments are also able to better determine the overall health of the economy through the use of the estimates in the calculation of the nation’s GDP.
2. Concepts, Variables and Classifications
2.1. Concepts
Wholesale trade is generally the intermediate step in the distribution of merchandise. The sector comprises establishments primarily engaged in the buying and selling of merchandise and providing logistics, marketing and support services.
Wholesalers are organized to sell merchandise in large quantities to retailers, business and institutional clients. However, some wholesalers, in particular those that supply non-consumer capital goods, sell merchandise in single units to final users. The sector recognizes two main types of wholesalers: wholesale merchants and wholesale agents and brokers.
Wholesale merchants buy and sell merchandise on their own account, that is, they take title to the goods they sell. They generally operate from warehouse or office locations and they may ship from their own inventory or arrange for the shipment of goods directly from the supplier to the client. In addition to the sales of goods, they may provide, or arrange for the provision of, logistics, marketing and support services, such as packaging and labelling, inventory management, shipping, handling of warranty claims, in-store or co-op promotions, and product training. Dealers of machinery and equipment, such as dealers of farm machinery and heavy-duty trucks, also fall within this category. They are known by a variety of trade designations depending on their relationship with suppliers or customers, or the distribution method they employ.
Examples include wholesale merchant, wholesale distributor, drop shipper, rack-jobbers, import-export merchants, buying groups, dealer-owned cooperatives and banner wholesalers. For purposes of industrial classification, wholesale merchants are classified by industry according to the principal lines of commodities sold. A description of each industrial group included in the accompanying statistical data is shown in Appendix IV. As most businesses sell several kinds of commodities, the classification assigned to a business generally reflects either the individual commodity or the commodity group which is the primary source of the establishment’s receipts, or some mixture of commodities which characterizes the establishment’s business.
Wholesale agents and brokers buy and sell merchandise owned by others on a fee or commission basis. They do not take title to the goods they buy or sell, and they generally operate at or from an office location. Wholesale agents and brokers are known by a variety of trade designations including import-export agents, wholesale commission agents, wholesale brokers, and manufacturers’ representatives and agents.
2.2. Variables
Sales are defined as the sales of all goods purchased for resale, net of returns and discounts. This includes parts used in generating repair and maintenance revenue, labour revenue from repair and maintenance, sales of goods manufactured as a secondary activity by the wholesaler, and revenue from rental and leasing of office space, other real estate, and goods and equipment. Any commission revenue and fees earned by wholesale merchants from buying and selling merchandise on account of others are also included. Other operating revenue, such as operating subsidies and grants and revenue from shipping, handling, and storing goods for others, is excluded.
Inventories are defined as the book value, i.e., the value maintained in the accounting records, of all stock owned at month end and intended for resale. This includes stock in selling outlets, in warehouses, in transit, or on consignment to others. It also includes stock owned within and outside Canada. Inventories held on consignment from others (not owned), and store and office supplies and any other supplies not to be sold are excluded.
Trading Location is the physical location(s) in which business activity is conducted in each province and territory, and for which sales are credited or recognized in the financial records of the company. For wholesalers, this would normally be a distribution centre.
Sales in volume: The value of wholesale trade is measured in two ways: including the effects of price change on sales, and net of the effects of price change. The first measure is referred to as wholesale trade in current dollars and the latter as wholesale trade in volume. The method of calculating the current dollar estimate is to aggregate the weighted value of sales for all wholesale outlets. The method of calculating the volume estimate is to first adjust the sales values to a base year, using the price indexes, and then sum up the resulting values.
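The two measures can be illustrated with a minimal sketch. The record layout, the commodity names, and the convention that a price index equals 100 in the base year are assumptions made for the example; they are not taken from the MWTS production system.

```python
def current_dollar_total(records):
    """Aggregate the weighted value of sales (price effects included)."""
    return sum(r["weight"] * r["sales"] for r in records)


def volume_total(records, price_index, base=100.0):
    """Adjust each weighted sales value to base-year prices, then sum.

    price_index maps a (hypothetical) commodity name to its index value,
    with the base year set to `base`.
    """
    return sum(
        r["weight"] * r["sales"] * base / price_index[r["commodity"]]
        for r in records
    )


# Illustrative data only.
records = [
    {"commodity": "food", "weight": 2.0, "sales": 110.0},
    {"commodity": "machinery", "weight": 1.0, "sales": 250.0},
]
price_index = {"food": 110.0, "machinery": 125.0}

current = current_dollar_total(records)      # 2*110 + 1*250 = 470
volume = volume_total(records, price_index)  # 200 + 200 = 400
```

In this example prices have risen since the base year, so the volume estimate is lower than the current dollar estimate.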
2.3. Classifications
The Monthly Wholesale Trade Survey is based on the definition of wholesale trade under the NAICS (North American Industry Classification System). NAICS is the agreed upon common framework for the production of comparable statistics by the statistical agencies of Canada, Mexico and the United States. The agreement defines the boundaries of twenty sectors. NAICS is based on a production-oriented, or supply-based, conceptual framework in that establishments are grouped into industries according to similarity in the production processes used to produce goods and services.
Estimates appear for 24 industries based on the 2012 North American Industry Classification System (NAICS) industries. The 24 industries are further aggregated to 7 sub-sectors which correspond exactly to the 3-digit NAICS codes for wholesale trade industries, with the exception of the following: wholesale agents and brokers; and petroleum and oilseed and grain wholesaler-distributors.
Geographically, sales estimates are produced for Canada and each province and territory. Inventory estimates are produced only for Canada as a whole.
3. Coverage and Frames
Statistics Canada’s Business Register (BR) provides the frame for the Monthly Wholesale Trade Survey. The BR is a structured list of businesses engaged in the production of goods and services in Canada. It is a centrally maintained database containing detailed descriptions of most business entities operating within Canada. The BR includes all incorporated businesses, with or without employees. For unincorporated businesses, the BR includes all employer businesses, as well as businesses with no employees that have annualized sales reported through a Goods and Services Tax (GST) account or annual revenue reported through individual income tax.
The businesses on the BR are represented by a hierarchical structure with four levels, with the statistical enterprise at the top, followed by the statistical company, the statistical establishment and the statistical location. An enterprise can be linked to one or more statistical companies, a statistical company can be linked to one or more statistical establishments, and a statistical establishment to one or more statistical locations.
The target population for the MWTS consists of all statistical establishments on the BR, excluding unincorporated businesses with no employees and with annual sales less than $30,000, that are classified to the wholesale sector using the North American Industry Classification System (NAICS) (approximately 90,000 establishments). The NAICS code range for the wholesale sector is 410000 to 419999. A statistical establishment is the production entity or the smallest grouping of production entities which: produces a homogeneous set of goods or services; does not cross provincial/territorial boundaries; and provides data on the value of output together with the cost of principal intermediate inputs used along with the cost and quantity of labour used to produce the output. The production entity is the physical unit where the business operations are carried out. It must have a civic address and dedicated labour.
The exclusions to the target population are ancillary establishments (producers of services in support of the activity of producing goods and services for the market of more than one establishment within the enterprise, and which serve as a cost centre or a discretionary expense centre for which data on all costs, including labour and depreciation, can be reported by the business), future establishments, establishments for which economic signals indicate a null or missing revenue, and establishments in the following non-covered NAICS codes:
- 41112 (oilseed and grain)
- 412 (petroleum products)
- 419 (agents and brokers)
4. Sampling
The MWTS sample consists of 7,500 groups of establishments (clusters) classified to the Wholesale Trade sector selected from the Statistics Canada Business Register. A cluster of establishments is defined as all establishments belonging to a statistical enterprise that are in the same industrial group and geographical region. The MWTS uses a stratified design with simple random sample selection in each stratum. The stratification is done by industrial group (mainly, but not only, at the four-digit NAICS level) and by geographical region, the regions being the provinces and territories. The population is further stratified by size. The size measure is created using a combination of independent survey data and three administrative variables: the annual profiled revenue, the GST sales expressed on an annual basis, and the declared tax revenue (T1 or T2).
The size strata consist of one take-all (census), at most two take-some (partially sampled) strata, and one take-none (non-sampled) stratum. Take-none strata serve to reduce respondent burden by excluding the smaller businesses from the surveyed population. These businesses should represent at most ten percent of total sales. Instead of sending questionnaires to these businesses, the estimates are produced through the use of administrative data.
The sample was allocated optimally in order to reach target coefficients of variation at the national, provincial/territorial, industrial, and industrial groups by province/territory levels. The sample was also inflated to compensate for dead, non-responding, and misclassified units.
MWTS is a repeated survey with maximization of monthly sample overlap. The sample is kept month after month, and every month new units are added (births) to the sample. MWTS births, i.e., new clusters of establishment(s), are identified every month via the BR’s latest universe. They are stratified according to the same criteria as the initial population. A sample of these births is selected according to the sampling fraction of the stratum to which they belong and is added to the monthly sample. Deaths also occur on a monthly basis. A death can be a cluster of establishment(s) that have ceased their activities (out-of-business) or whose major activities are no longer in wholesale trade (out-of-scope). The status of these businesses is updated on the BR using administrative sources and survey feedback, including feedback from the MWTS. Methods to treat dead units and misclassified units are part of the sample and population update procedures.
5. Questionnaire Design
The questionnaire collects monthly data on wholesale sales and the number of trading locations by province or territory, and on inventories of goods owned and intended for resale, from a sample of wholesalers. For the 2004 redesign, most questionnaires were subject to cosmetic changes only, with the exception of the inclusion of Nunavut. The modifications were discussed with stakeholders and the respondents were given an opportunity to comment before the new questionnaire was finalized. If further changes are needed to any of the questionnaires, proposed changes would go through a review committee and a field test with respondents and data users to ensure their relevance.
6. Response and Non-response
6.1. Response and Non-response
Despite the best efforts of survey managers and operations staff to maximize response in the MWTS, some non-response will occur.
For statistical establishments to be classified as responding, the degree of partial response (where an accurate response is obtained for only some of the questions asked of a respondent) must meet a minimum threshold level. A response below this threshold is rejected and considered a unit non-response. In such an instance, the business is classified as not having responded at all.
Non-response has two effects on data: first it introduces bias in estimates when non-respondents differ from respondents in the characteristics measured; and second, it contributes to an increase in the sampling variance of estimates because the effective sample size is reduced from that originally sought.
The degree to which efforts are made to get a response from a non-respondent is based on budget and time constraints, its impact on the overall quality and the risk of non-response bias.
The main method to reduce the impact of non-response at sampling is to inflate the sample size through the use of over-sampling rates that have been determined from similar surveys.
Besides the methods to reduce the impact of non-response at sampling and collection, the non-responses to the survey that do occur are treated through imputation.
In order to measure the amount of non-response that occurs each month various response rates are calculated. For a given reference month, the estimation process is run at least twice (a preliminary and a revised run). Between each run, respondent data can be identified as unusable and imputed values can be corrected through respondent data. As a consequence, response rates are computed following each run of the estimation process.
For the MWTS, two types of rates are calculated (unweighted and weighted). In order to assess the efficiency of the collection process, unweighted response rates are calculated. Weighted rates, using the estimation weight and the value for the variable of interest, assess the quality of estimation. Within each of these types of rates, there are distinct rates for units that are surveyed and for units that are only modeled from administrative data that has been extracted from GST files.
To get a better picture of the success of the collection process, two unweighted rates called the ‘collection results rate’ and the ‘extraction results rate’ are computed. They are computed by dividing the number of respondents by the number of units that we tried to contact or for which we tried to extract data. Non-monthly reporters (respondents with special reporting arrangements where they do not report every month but for whom actual data is available in subsequent revisions) are excluded from both the numerator and denominator for the months where no contact is performed.
In summary, the various response rates are calculated as follows:
Weighted rates:
- Survey Response rate (estimation) = Sum of weighted sales of units with response status i / Sum of survey weighted sales
where i = units that have either reported data that will be used in estimation or are converted refusals, or have reported data that has not yet been resolved for estimation.
- Admin Response rate (estimation) = Sum of weighted sales of units with response status ii / Sum of administrative weighted sales
where ii = units that have data that was extracted from administrative files and are usable for estimation.
- Total Response rate (estimation) = Sum of weighted sales of units with response status i or response status ii / Sum of all weighted sales
Unweighted rates:
- Survey Response rate (collection) = Number of questionnaires with response status iii / Number of questionnaires with response status iv
where iii = units that have either reported data (unresolved, used or not used for estimation) or are converted refusals.
where iv = all of the above plus units that have refused to respond, units that were not contacted and other types of non-respondent units.
- Admin Response rate (extraction) = Number of questionnaires with response status vi / Number of questionnaires with response status vii
where vi = in-scope units that have data (either usable or non-usable) that was extracted from administrative files
where vii = all of the above plus units that have refused to report to the administrative data source, units that were not contacted and other types of non-respondent units.
(% of questionnaires collected over all in-scope questionnaires)
- Collection Results Rate = Number of questionnaires with response status iii / Number of questionnaires with response status viii
where iii = same as iii defined above
where viii = same as iv except for excluded units that were contacted because their response is unavailable for a particular month since they are non-monthly reporters.
- Extraction Results Rate = Number of questionnaires with response status ix / Number of questionnaires with response status vii
where ix = same as vi with the addition of extracted units that have been imputed or were out of scope
where vii = same as vii defined above
(% of questionnaires collected over all in-scope questionnaires we tried to collect)
All the above weighted and unweighted rates are provided at the industrial group, geography and size group level or for any combination of these levels.
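As an illustration of the difference between the weighted and unweighted rates, the sketch below computes the survey response rate (estimation) and the survey response rate (collection) for a handful of units. The status labels are hypothetical stand-ins for response statuses i and iii defined above; the survey's actual status codes are not given in this document. The sales value for a non-responding unit is assumed to be whatever value is used in estimation for it (for example, an imputed value).

```python
# Hypothetical status labels (assumptions for this example only).
STATUS_I = {"reported_used", "reported_unresolved", "converted_refusal"}
STATUS_III = STATUS_I | {"reported_not_used"}


def weighted_response_rate(units, responding=STATUS_I):
    """Weighted sales of responding units over all weighted sales."""
    total = sum(u["weight"] * u["sales"] for u in units)
    responded = sum(
        u["weight"] * u["sales"] for u in units if u["status"] in responding
    )
    return responded / total if total else float("nan")


def unweighted_response_rate(units, responding=STATUS_III):
    """Count of responding questionnaires over all questionnaires."""
    if not units:
        return float("nan")
    return sum(u["status"] in responding for u in units) / len(units)


# Illustrative data only.
units = [
    {"weight": 1.0, "sales": 100.0, "status": "reported_used"},
    {"weight": 2.0, "sales": 50.0, "status": "reported_not_used"},
    {"weight": 4.0, "sales": 25.0, "status": "refusal"},
    {"weight": 1.0, "sales": 100.0, "status": "converted_refusal"},
]

weighted = weighted_response_rate(units)      # 200 / 400 = 0.5
unweighted = unweighted_response_rate(units)  # 3 / 4 = 0.75
```

The two rates differ here because a unit whose reported data are not used for estimation still counts as a collection success, but does not contribute to the weighted estimation rate.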
Use of Administrative Data:
Managing response burden is an ongoing challenge for Statistics Canada. In an attempt to alleviate response burden and survey costs, especially for smaller businesses, the MWTS has reduced the number of simple establishments in the sample that are surveyed directly and instead derives sales data for these establishments from Goods and Services Tax (GST) files using a statistical model. The model accounts for differences between sales and revenue (reported for GST purposes) as well as for the time lag between the survey reference period and the reference period of the GST file.
Inventories for establishments where sales are GST-based are derived using the MWTS imputation system. The imputation system uses the previous month’s values and the month-to-month and year-to-year changes in similar-sized establishments which are surveyed.
For more information on the methodology used for modeling sales from administrative data sources, refer to ‘Monthly Wholesale Trade Survey: Use of Administrative Data’ under ‘Documentation’ of the IMDB.
6.2. Methods used to reduce non-response at collection
Significant effort is spent trying to minimize non-response during collection. Methods used, among others, are interviewer techniques such as probing and persuasion, repeated re-scheduling and call-backs to obtain the information, and procedures dealing with how to handle non-compliant (refusal) respondents.
If data are unavailable at the time of collection, a respondent's best estimates are also accepted, and are subsequently revised once the actual data become available. To minimize total non-response for all variables, partial responses are accepted. In addition, questionnaires are customized for the collection of certain variables, such as inventory, so that collection is timed for those months when the data are available.
Finally, to build trust and rapport between the interviewers and respondents, cases are generally assigned to the same interviewer each month. This action establishes a personal relationship between interviewer and respondent, and builds respondent trust.
7. Data Collection and Capture Operations
Collection of the data is performed by Statistics Canada’s Regional Offices. Respondents are sent a questionnaire or are contacted by telephone to obtain their sales and inventory values, as well as to confirm the opening or closing of business trading locations. There is also follow-up of non-response. Collection of the data begins approximately 7 working days after the end of the reference month and continues for the duration of that month.
New entrants to the survey are introduced to the survey via an introductory letter that informs the respondent that a representative of Statistics Canada will be calling. This call is to introduce the respondent to the survey, confirm the respondent's business activity, establish and begin data collection, as well as to answer any questions that the respondent may have.
8. Editing
Data editing is the application of checks to detect missing, invalid or inconsistent entries or to point to data records that are potentially in error. In the survey process for the MWTS, data editing is done at two different time periods.
First of all, editing is done during data collection. Once data are collected via the telephone, or via the receipt of completed mail-in questionnaires, the data are captured using customized data capture applications. All data are subjected to data editing. Edits during data collection are referred to as field edits and generally consist of validity and some simple consistency edits. They are also used to detect mistakes made during the interview by the respondent or the interviewer and to identify missing information during collection in order to reduce the need for follow-up later on. Another purpose of the field edits is to clean up responses. In the MWTS, the current month’s responses are edited against the respondent’s previous month’s responses and/or the previous year’s responses for the current month. Field edits are used to identify problems with data collection procedures and the design of the questionnaire, as well as the need for more interviewer training.
Follow-up with respondents occurs to validate potentially erroneous data following any failed preliminary edit check of the data. Once validated, the collected data are regularly transmitted to the head office in Ottawa.
Secondly, editing known as statistical editing is also done after data collection and this is more empirical in nature. Statistical editing is run prior to imputation in order to identify the data that will be used as a basis to impute non-respondents. Large outliers that could disrupt a monthly trend are excluded from trend calculations by the statistical edits. It should be noted that adjustments are not made at this stage to correct the reported outliers.
The first step in the statistical editing is to identify which responses will be subjected to the statistical edit rules. Reported data for the current reference month will go through various edit checks.
The first set of edit checks is based on the Hidiroglou-Berthelot method whereby a ratio of the respondent’s current month data over historical (i.e. last month, or same month last year) or administrative data is analyzed. When the respondent’s ratio differs significantly from ratios of respondents who are similar in terms of industrial group and/or geography group, the response is deemed an outlier.
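A minimal sketch of a Hidiroglou-Berthelot style ratio edit is given below for a single group of similar respondents. It follows the commonly published form of the method (a symmetric transformation of the ratios, a size adjustment, and bounds built from the quartiles). The tuning constants `u`, `a` and `c`, their values, and the function name are assumptions for the example; the settings and grouping used by the MWTS are not stated in this document. All values are assumed to be strictly positive.

```python
from statistics import median, quantiles


def hb_outlier_flags(current, previous, u=0.5, a=0.05, c=4.0):
    """Flag units whose current/previous ratio is unusual within the group.

    u: weight given to unit size, a: guard against overly narrow bounds,
    c: width of the acceptance interval (all illustrative values).
    """
    ratios = [x / y for x, y in zip(current, previous)]
    r_med = median(ratios)

    # Symmetric transformation so increases and decreases are treated alike.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]

    # Size effect: the same ratio matters more for a larger unit.
    e = [si * max(x, y) ** u for si, x, y in zip(s, current, previous)]

    e_q1, e_med, e_q3 = quantiles(e, n=4, method="inclusive")
    d_q1 = max(e_med - e_q1, abs(a * e_med))
    d_q3 = max(e_q3 - e_med, abs(a * e_med))
    lower, upper = e_med - c * d_q1, e_med + c * d_q3

    return [ei < lower or ei > upper for ei in e]


# Illustrative data: the last unit reports five times last month's sales.
previous = [100, 200, 150, 120, 300, 250, 180, 90, 110, 130]
current = [100, 204, 147, 121.2, 297, 257.5, 174.6, 90, 112.2, 650]

flags = hb_outlier_flags(current, previous)
```

With these values only the last unit is flagged. The other ratios lie between 0.97 and 1.03 and fall inside the acceptance interval.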
The second set of edits consists of an edit known as the share of market edit. With this method, one is able to edit all respondents even those where historical and auxiliary data is unavailable. The method relies on current month data only. Therefore, within a group of respondents that are similar in terms of industrial group and/or geography, if the weighted contribution of a respondent to the group’s total is too large, it will be flagged as an outlier.
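The share of market edit can be sketched as follows for one group of similar respondents. The threshold `max_share` and the record layout are assumptions for the example; the document does not state what share is considered too large.

```python
def share_of_market_flags(units, max_share=0.5):
    """Flag units whose weighted contribution to the group total is too large.

    Uses current month data only, so it applies even to units with no
    historical or auxiliary data. max_share is an illustrative threshold.
    """
    total = sum(u["weight"] * u["sales"] for u in units)
    if total <= 0:
        return [False] * len(units)
    return [u["weight"] * u["sales"] / total > max_share for u in units]


# Illustrative data only: weighted contributions are 100, 100 and 900.
group = [
    {"weight": 1.0, "sales": 100.0},
    {"weight": 2.0, "sales": 50.0},
    {"weight": 1.0, "sales": 900.0},
]

share_flags = share_of_market_flags(group)  # [False, False, True]
```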
For edit checks based on the Hidiroglou-Berthelot method, data that are flagged as an outlier will not be included in the imputation models (those based on ratios). Also, data that are flagged as outliers in the share of market edit will not be included in the imputation models where means and medians are calculated to impute for responses that have no historical responses.
In conjunction with the statistical editing after data collection of reported data, there is also error detection done on the extracted GST data. Modeled data based on the GST are also subject to an extensive series of processing steps which thoroughly verify each record that is the basis for the model as well as the record being modeled. Edits are performed at a more aggregate level (industry by geography level) to detect records which deviate from the expected range, either by exhibiting large month-to-month change, or differing significantly from the remaining units. All data which fail these edits are subject to manual inspection and possible corrective action.
9. Imputation
Imputation in the MWTS is the process used to assign replacement values for missing data. This is done by assigning values when they are missing on the record being edited to ensure that estimates are of high quality and that a plausible, internal consistency is created. Due to concerns of response burden, cost and timeliness, it is generally impossible to do all follow-ups with the respondents in order to resolve missing responses. Since it is desirable to produce a complete and consistent micro data file, imputation is used to handle the remaining missing cases.
In the MWTS, imputation for missing values can be based on either historical or administrative data. The appropriate method is selected according to a strategy that is based on whether historical data is available, administrative data is available and/or which reference month is being processed.
There are three types of historical imputation methods. The first type is a general trend that uses one historical data source (previous month, data from next month or data from same month previous year). The second type is a regression model where data from previous month and same month previous year are used simultaneously. The third type uses the historical data as a direct replacement value for a non-respondent.
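The first and third historical methods can be illustrated with a short sketch. Here the trend is taken as the ratio of current month to previous month totals among similar respondents that passed the statistical edits; whether the production system weights these totals, and how it forms the groups, is not stated in this document, so the function below is an assumption-laden illustration only.

```python
def trend_impute(previous_value, respondents):
    """Impute a current month value from one historical data source.

    respondents: (previous, current) pairs for similar, non-outlier units.
    Falls back to direct replacement (the third type of method) when no
    trend can be computed.
    """
    sum_previous = sum(p for p, _ in respondents)
    sum_current = sum(c for _, c in respondents)
    if sum_previous == 0:
        return previous_value  # direct replacement with the historical value
    return previous_value * sum_current / sum_previous


# Illustrative data: similar respondents grew by 10% month over month.
similar = [(100.0, 110.0), (200.0, 220.0)]

imputed = trend_impute(50.0, similar)  # 50 * 330 / 300 = 55
fallback = trend_impute(50.0, [])      # no respondents: returns 50
```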
Depending upon the particular reference month, there is an order of preference that exists so that a top quality imputation can result. The historical imputation method that was labelled as the third type above is always the last option in the order for each reference month.
The imputation methods using administrative data are automatically selected when historical information is unavailable for a non-respondent. The administrative data source (annual GST sales) is the basis of these methods. The annual GST sales are used for two types of methods. One is a general trend method that is used for units with a simple structure (e.g., enterprises with only one establishment); the second, called median-average, is used for units with a more complex structure.
Finally, it should be noted that inventories in the MWTS where sales are derived from monthly GST data are also imputed by the MWTS imputation systems. The imputed values are calculated using the same imputation methods that are in place for missing data from non-respondents.
10. Estimation
Estimation is a process that approximates unknown population parameters using only the part of the population that is included in a sample. Inferences about these unknown parameters are then made, using the sample data and associated survey design. This stage uses Statistics Canada's Generalized Estimation System (GES).
For wholesale sales, the population is divided into a survey portion (take-all and take-some strata) and a non-survey portion (take-none stratum). From the sample that is drawn from the survey portion, an estimate for the population is determined through the use of a Horvitz-Thompson estimator where responses for sales are weighted by using the inverses of the inclusion probabilities of the sampled units. Such weights (called sampling weights) can be interpreted as the number of times that each sampled unit should be replicated to represent the entire population. The calculated weighted sales values are summed by domain, to produce the total sales estimates by each industrial group / geographic area combination. A domain is defined as the most recent classification values available from the BR for the unit and the survey reference period. These domains may differ from the original sampling strata because units may have changed size, industrial group or location. Changes in classification are reflected immediately in the estimates and do not accumulate over time. For the non-survey portion, the sales are estimated with statistical models using monthly GST sales.
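The weighting and domain aggregation for the survey portion can be sketched as follows. The record layout and the domain labels are assumptions for the example; the production estimates are computed with the Generalized Estimation System, which this sketch does not reproduce.

```python
from collections import defaultdict


def horvitz_thompson_totals(sample):
    """Sum weighted sales by domain, using inverse inclusion probabilities."""
    totals = defaultdict(float)
    for unit in sample:
        sampling_weight = 1.0 / unit["inclusion_prob"]
        totals[unit["domain"]] += sampling_weight * unit["sales"]
    return dict(totals)


# Illustrative data: domain = (industrial group, province), both hypothetical.
sample = [
    {"domain": ("413", "ON"), "inclusion_prob": 1.0, "sales": 500.0},   # take-all
    {"domain": ("413", "ON"), "inclusion_prob": 0.25, "sales": 40.0},   # take-some
    {"domain": ("413", "ON"), "inclusion_prob": 0.25, "sales": 60.0},   # take-some
    {"domain": ("413", "QC"), "inclusion_prob": 0.5, "sales": 100.0},   # take-some
]

estimates = horvitz_thompson_totals(sample)
# ("413", "ON"): 500 + 4*40 + 4*60 = 900; ("413", "QC"): 2*100 = 200
```

A take-all unit has an inclusion probability of 1 and represents only itself, while each take-some unit sampled at a rate of one in four represents four units in the population.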
For wholesale inventories, the sample selected for estimating sales is used to derive an estimate through the use of a Horvitz-Thompson estimator for the survey portion. A sample-based ratio is then used to produce the estimate for the non-survey portion, and the estimate of the total is derived as the sum of the survey and non-survey portion estimates.
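The document does not define the sample-based ratio. One plausible reading, used in the sketch below purely as an assumption, is the ratio of estimated inventories to estimated sales in the survey portion, applied to the modeled sales of the non-survey portion.

```python
def inventory_total(ht_inventories, ht_sales, non_survey_sales):
    """Total inventories = survey portion + ratio-based non-survey portion.

    Assumes the ratio is inventories to sales in the survey portion; this
    is an illustrative choice, not a documented MWTS formula.
    """
    if ht_sales <= 0:
        raise ValueError("survey-portion sales must be positive")
    ratio = ht_inventories / ht_sales
    non_survey_inventories = ratio * non_survey_sales
    return ht_inventories + non_survey_inventories


# Illustrative values: ratio = 1800 / 900 = 2, non-survey portion = 2 * 100.
total_inventories = inventory_total(1800.0, 900.0, 100.0)  # 2000
```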
For more information on the methodology for modeling sales from administrative data sources (i.e. GST data) which also contributes to the estimates of the survey portion, refer to ‘Monthly Wholesale Trade Survey: Use of Administrative Data’ under ‘Documentation’ of the IMDB.
The measure of precision used for the MWTS to evaluate the quality of a population parameter estimate and to obtain valid inferences is the variance. The variance from the survey portion is derived directly from a stratified simple random sample without replacement.
Sample estimates may differ from the expected value of the estimates. However, since the estimate is based on a probability sample, the variability of the sample estimate with respect to its expected value can be measured. The variance of an estimate is a measure of the precision of the sample estimate and is defined as the average, over all possible samples, of the squared difference of the estimate from its expected value.
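For a stratified simple random sample without replacement, the textbook variance estimator of an estimated total sums, over strata, the squared population size times the finite population correction times the sample variance divided by the sample size. The sketch below applies that standard formula to establishment-level values; it ignores the clustering, imputation and modeled components of the MWTS, and the data layout is an assumption for the example.

```python
from math import sqrt
from statistics import variance


def stratified_total_variance(strata):
    """Variance of an estimated total under stratified SRS without replacement.

    Each stratum is a dict with population size "N" and sampled values "sample".
    """
    total_variance = 0.0
    for stratum in strata:
        N, y = stratum["N"], stratum["sample"]
        n = len(y)
        if n == N:
            continue  # take-all stratum: no sampling variance
        if n < 2:
            raise ValueError("need at least two sampled units per stratum")
        total_variance += N**2 * (1 - n / N) * variance(y) / n
    return total_variance


# Illustrative data: one take-all stratum and one take-some stratum.
strata = [
    {"N": 3, "sample": [500.0, 700.0, 900.0]},
    {"N": 10, "sample": [10.0, 20.0, 30.0, 40.0]},
]

var_total = stratified_total_variance(strata)  # 100 * 0.6 * (500/3) / 4 = 2500
estimate = sum(s["N"] / len(s["sample"]) * sum(s["sample"]) for s in strata)
cv = sqrt(var_total) / estimate  # coefficient of variation, 50 / 2350
```

The coefficient of variation computed at the end expresses the standard error relative to the estimate, which is the form in which precision targets are stated in the sampling section.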
11. Revisions and seasonal adjustment
Revisions in the raw data are required to correct known non-sampling errors. These normally include replacing imputed data with reported data, corrections to previously reported data, and estimates for new births that were not known at the time of the original estimates.
Raw data are revised, on a monthly basis, for the month immediately prior to the current reference month being published. That is, when data for December are being published for the first time, there will also be revisions, if necessary, to the raw data for November. In addition, revisions are made once a year, with the initial release of the February data, for all months in the previous year. The purpose is to correct any significant problems that have been found that apply for an extended period. The actual period of revision depends on the nature of the problem identified, but rarely exceeds three years.
Time series contain the elements essential to the description, explanation and forecasting of the behaviour of an economic phenomenon: "They are statistical records of the evolution of economic processes through time."¹ Economic time series such as the Monthly Wholesale Trade Survey can be broken down into five main components: the trend-cycle, seasonality, the trading-day effect, the Easter holiday effect and the irregular component.
The trend represents the long-term change in the series, whereas the cycle represents a smooth, quasi-periodical movement about the trend, showing a succession of growth and decline phases (e.g., the business cycle). These two components—the trend and the cycle—are estimated together, and the trend-cycle reflects the fundamental evolution of the series. The other components reflect short-term transient movements.
The seasonal component represents sub-annual, monthly or quarterly fluctuations that recur more or less regularly from one year to the next. Seasonal variations are caused by the direct and indirect effects of the climatic seasons and institutional factors (attributable to social conventions or administrative rules; e.g., Christmas).
The trading-day component originates from the fact that the relative importance of the days varies systematically within the week and that the number of each day of the week in a given month varies from year to year. This effect is present when activity varies with the day of the week. For instance, Sunday is typically less active than the other days, and the number of Sundays, Mondays, etc., in a given month changes from year to year.
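To see how the weekday composition of a month changes from year to year, the days can simply be counted. The sketch below uses only the Python standard library; the function name is an assumption made for the example, and it illustrates the calendar effect itself, not how the trading-day regressors are built in production.

```python
import calendar

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def weekday_counts(year, month):
    """Count how many times each day of the week occurs in a month."""
    # monthrange returns the weekday of the 1st (Monday == 0)
    # and the number of days in the month.
    first_weekday, n_days = calendar.monthrange(year, month)
    counts = [0] * 7
    for offset in range(n_days):
        counts[(first_weekday + offset) % 7] += 1
    return dict(zip(DAY_NAMES, counts))


march_2023 = weekday_counts(2023, 3)
march_2024 = weekday_counts(2024, 3)
```

Comparing the two dictionaries shows that the same calendar month can contain a different number of Sundays in different years, which is the source of the trading-day effect.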
The Easter holiday effect is the variation due to the shift of part of April’s activity to March when Easter falls in March rather than April.
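Whether a given year is affected can be checked by computing the date of Easter. The sketch below uses the well-known anonymous Gregorian algorithm (often attributed to Meeus, Jones and Butcher). It is not taken from the survey documentation and only identifies the month in which Easter Sunday falls; it does not reproduce the Easter regressor used in the seasonal adjustment software.

```python
from datetime import date


def easter_sunday(year):
    """Date of Easter Sunday in the Gregorian calendar
    (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def easter_falls_in_march(year):
    """True when Easter Sunday is in March rather than April."""
    return easter_sunday(year).month == 3
```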
Lastly, the irregular component includes all other more or less erratic fluctuations not taken into account in the preceding components. It is a residual that includes errors of measurement on the variable itself as well as unusual events (e.g., strikes, drought, floods, major power blackout or other unexpected events causing variations in respondents’ activities).
Thus, the latter four components (seasonal, irregular, trading-day and Easter holiday effect) all conceal the fundamental trend-cycle component of the series. Seasonal adjustment (correction of seasonal variation) consists of removing the seasonal, trading-day and Easter holiday effect components from the series, and it thus helps reveal the trend-cycle. While seasonal adjustment permits a better understanding of the underlying trend-cycle of a series, the seasonally adjusted series still contains an irregular component. Slight month-to-month variations in the seasonally adjusted series may be simple irregular movements. To get a better idea of the underlying trend, users should examine several months of the seasonally adjusted series.
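The relationship between the components can be written compactly. The notation below is introduced only for illustration and assumes an additive decomposition; the text does not state whether an additive or a multiplicative model is used (in a multiplicative model, the sums become products and the subtractions become divisions).

```latex
\begin{align*}
O_t  &= TC_t + S_t + TD_t + E_t + I_t \\
SA_t &= O_t - S_t - TD_t - E_t = TC_t + I_t
\end{align*}
```

Here O_t is the original series, TC_t the trend-cycle, S_t the seasonal component, TD_t the trading-day effect, E_t the Easter holiday effect, I_t the irregular component and SA_t the seasonally adjusted series. The second line shows why the seasonally adjusted series still contains the irregular component.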
Since April 2008, Monthly Wholesale Trade Survey data have been seasonally adjusted using the X-12-ARIMA software [2]. The technique essentially consists of first correcting the initial series for undesirable effects, such as the trading-day and Easter holiday effects, using a module called regARIMA. These effects are estimated using regression models with ARIMA (auto-regressive integrated moving average) errors. The model can also be used to extrapolate the series by at least one year. Subsequently, the raw series, pre-adjusted and extrapolated if applicable, is seasonally adjusted by the X-11 method.
The X-11 method is used for analysing monthly and quarterly series. It is based on an iterative principle applied in estimating the different components, with estimation being done at each stage using appropriate moving averages [3]. The moving averages used to estimate the main components, the trend and seasonality, are primarily smoothing tools designed to eliminate an undesirable component from the series. Since moving averages react poorly to the presence of atypical values, the X-11 method includes a tool for detecting and correcting atypical points. This tool is used to clean up the series during the seasonal adjustment. Outlying data points can also be detected and corrected in advance, within the regARIMA module.
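To give a feel for how a moving average removes one component while keeping another, the sketch below implements a centered 12-term ("2x12") moving average, a filter commonly used for a first trend-cycle estimate of a monthly series. It is a single simplified step, not the X-11 method itself, which iterates over several filters and corrects atypical values; the function name is an assumption made for the example.

```python
def centered_2x12_ma(series):
    """Centered 12-term (2x12) moving average of a monthly series.

    Returns a list of the same length as the input; the first and last
    six positions are None because the 13-month window does not fit there.
    """
    n = len(series)
    smoothed = [None] * n
    for t in range(6, n - 6):
        window = series[t - 6:t + 7]  # 13 consecutive months
        # Half weight on the two end months, full weight on the 11
        # middle months, so every calendar month counts equally.
        weighted_sum = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        smoothed[t] = weighted_sum / 12
    return smoothed


# A linear trend passes through unchanged, while a pattern that repeats
# every 12 months and sums to zero over a year is removed.
trend = [float(t) for t in range(24)]
seasonal = [1.0, -1.0] * 12
smoothed = centered_2x12_ma([a + b for a, b in zip(trend, seasonal)])
```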
Lastly, the annual totals of the seasonally adjusted series are forced to the annual totals of the original series. Unfortunately, seasonal adjustment removes the sub-annual additivity of a system of series; small discrepancies can be observed between the sum of the seasonally adjusted series and the direct seasonal adjustment of their total. To ensure or restore additivity in a system of series, a reconciliation process is applied or indirect seasonal adjustment is used; that is, the seasonally adjusted total is derived by summing the individually seasonally adjusted series.
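The two ideas in this paragraph can be illustrated with a deliberately simple sketch. The first function forces an annual total by pro-rata scaling; this is a stand-in chosen for brevity, since the text does not say which benchmarking method is used, and pro-rata scaling can introduce breaks between years that smoother benchmarking methods avoid. The second function shows indirect seasonal adjustment of a total. Both function names are hypothetical.

```python
def force_annual_total(sa_year, raw_year):
    """Scale 12 seasonally adjusted monthly values so that their sum
    equals the annual total of the raw series (pro-rata scaling;
    an illustrative simplification, not the production method)."""
    if len(sa_year) != 12 or len(raw_year) != 12:
        raise ValueError("expected 12 monthly values for each series")
    factor = sum(raw_year) / sum(sa_year)
    return [value * factor for value in sa_year]


def indirect_total(sa_components):
    """Indirect seasonal adjustment of a total: add up the individually
    seasonally adjusted component series, period by period."""
    return [sum(values) for values in zip(*sa_components)]


forced = force_annual_total([99.0] * 12, [100.0] * 12)
total_sa = indirect_total([[1.0, 2.0], [3.0, 4.0]])
```

With indirect adjustment the total is additive by construction, at the cost of not adjusting the total series directly.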
12. Data Quality Evaluation
The methodology of this survey has been designed to control errors and to reduce their potential effects on estimates. However, the survey results remain subject to errors, of which sampling error is only one component of the total survey error.
Sampling error results when observations are made only on a sample and not on the entire population. All other errors arising from the various phases of a survey are referred to as non-sampling errors. For example, these types of errors can occur when a respondent provides incorrect information or does not answer certain questions; when a unit in the target population is omitted or covered more than once; when GST data for records being modeled for a particular month are not representative of the actual record for various reasons; when a unit that is out of scope for the survey is included by mistake or when errors occur in data processing, such as coding or capture errors.
Prior to publication, combined survey results are analyzed for comparability; in general, this includes a detailed review of individual responses (especially for large businesses), general economic conditions and historical trends.
A common measure of data quality for surveys is the coefficient of variation (CV). The coefficient of variation, defined as the standard error divided by the sample estimate, is a measure of precision in relative terms. Since the coefficient of variation is calculated from responses of individual units, it also measures some non-sampling errors.
The formula used to calculate coefficients of variation (CV) as percentages is:
CV(X) = (S(X) / X) × 100%
where X denotes the estimate and S(X) denotes the standard error of X.
Confidence intervals can be constructed around the estimates using the estimate and the CV. Thus, for our sample, it is possible to state with a given level of confidence that the expected value will fall within the confidence interval constructed around the estimate. For example, if an estimate of $12,000,000 has a CV of 2%, the standard error will be $240,000 (the estimate multiplied by the CV). It can be stated with 68% confidence that the expected value will fall within one standard error of the estimate, i.e. between $11,760,000 and $12,240,000. Alternatively, it can be stated with 95% confidence that the expected value will fall within two standard errors of the estimate, i.e. between $11,520,000 and $12,480,000.
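The arithmetic in this example can be reproduced with a few lines of code. The function names are assumptions made for the illustration; the multipliers 1 and 2 correspond to the approximate 68% and 95% confidence levels quoted in the text, which rely on a normal approximation.

```python
def coefficient_of_variation(estimate, standard_error):
    """CV as a percentage: standard error divided by the estimate."""
    return standard_error / estimate * 100


def confidence_interval(estimate, cv_percent, n_std_errors=1):
    """Interval of n_std_errors standard errors on either side of the
    estimate, with the standard error recovered from the CV."""
    standard_error = estimate * cv_percent / 100
    half_width = n_std_errors * standard_error
    return estimate - half_width, estimate + half_width


interval_68 = confidence_interval(12_000_000, 2, n_std_errors=1)
interval_95 = confidence_interval(12_000_000, 2, n_std_errors=2)
```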
Finally, due to the small contribution of the non-survey portion to the total estimates, bias in the non-survey portion has a negligible impact on the CVs. Therefore, the CV from the survey portion is used for the total estimate that is the summation of estimates from the surveyed and non-surveyed portions.
13. Disclosure Control
Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
Confidentiality analysis includes the detection of possible “direct disclosure”, which occurs when the value in a tabulation cell is composed of a few respondents or when the cell is dominated by a few companies.
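A direct-disclosure check of this kind can be sketched as follows. The actual rules and thresholds applied by Statistics Canada are not given in the text, so both thresholds below are hypothetical parameters that the caller must supply, and the function name is likewise an assumption.

```python
def cell_is_sensitive(contributions, min_respondents, top_k, max_top_share):
    """Flag a tabulation cell for possible direct disclosure.

    contributions: values reported by each respondent in the cell.
    min_respondents, top_k and max_top_share are hypothetical thresholds,
    not the ones used in production.
    """
    # Rule 1: the cell is made up of too few respondents.
    if len(contributions) < min_respondents:
        return True
    # Rule 2: the largest top_k respondents dominate the cell total.
    total = sum(contributions)
    if total > 0:
        top_sum = sum(sorted(contributions, reverse=True)[:top_k])
        if top_sum / total > max_top_share:
            return True
    return False


# Illustrative thresholds only: at least 3 respondents, and the single
# largest respondent may not exceed 80% of the cell total.
flag = cell_is_sensitive([900.0, 50.0, 50.0], min_respondents=3,
                         top_k=1, max_top_share=0.8)
```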
Notes
[1] "A Note on the Seasonal Adjustment of Economic Time Series", Canadian Statistical Review, August 1974.
[2] For more information, see X-12-ARIMA Reference Manual, Version 0.3 (2007), U.S. Census Bureau.
[3] Ladiray, D. and Quenneville, B. (2001). Seasonal Adjustment with the X-11 Method. New York: Springer-Verlag, Lecture Notes in Statistics no. 158.