- Introduction
- Policy
- Scope
- The standards and guidelines on the documentation of data quality and methodology
- Appendix: Examples of mandatory summary documentation in standardized form
Introduction
Statistics Canada, as a professional agency in charge of producing official statistics, has the responsibility to inform users of the concepts and methodology used in collecting, processing and analysing its data, of the accuracy of these data, and of any other features that affect their quality or "fitness for use".
Data users must first be able to verify that the conceptual framework and definitions that would satisfy their particular data needs are the same as, or sufficiently close to, those employed in collecting and processing the data. Users then need to be able to assess the degree to which the accuracy of the data and other quality factors are consistent with their intended use or interpretation.
There are several dimensions to the concept of quality, and the assessment of data quality or "fitness for use" is a complex undertaking. The full scope of potential uses of the data cannot always be anticipated and not every aspect of quality can be assessed in every context. In particular, data are subject to many potential sources of error and, under the present state of knowledge, comprehensive measurement of data accuracy is rarely possible. Thus there are clear limitations to the provision of measures of accuracy to users, and a rigid requirement for comprehensive measurement and assessment of data quality for all Bureau products would not be achievable. Rather, emphasis must be placed on describing and quantifying the major quality features of the data.
Policy
- Statistics Canada will make available to users indicators of the quality of data it disseminates and descriptions of the underlying concepts and methodology.
- Statistical products will be accompanied by or make explicit reference to documentation on quality and methodology.
- Documentation on quality and methodology will conform to such standards and guidelines as shall from time to time be issued under this Policy.
- Exemption from the requirements of this policy may be sought in special circumstances.
- Sponsors of cost recovery surveys and statistical consultation work, for which no data will be disseminated by Statistics Canada, are to be made aware of and encouraged to conform to the applicable elements of the standards and guidelines issued under this policy.
Scope
This policy applies to all statistical data and analytical results disseminated by Statistics Canada however collected, derived or assembled, and irrespective of the medium of dissemination or the source of funding.
The standards and guidelines on the documentation of data quality and methodology
(Revised November 25, 2002)
A. Introduction
The Policy on Informing Users of Data Quality and Methodology requires that all statistical products include or refer to documentation on data quality and methodology. These standards and guidelines describe the kind of documentation that is expected. The Standards detail the mandatory requirements for documentation on data quality and methodology for all products under this Policy. For certain programs and their products, a broader and more detailed range of methodology and data quality documentation is desirable. The Guidelines outline the types of information to be included in such additional documentation.
The standards and guidelines are intended for use in planning or reviewing the documentation and dissemination for a statistical program. They should be taken into consideration in scheduling and budgeting program activities.
To set the stage for the standards and guidelines, Section B describes the elements of data quality as identified in Statistics Canada's framework on data quality, while Section C provides some basic definitions. The general principles that guide the implementation of the standards and guidelines are presented in Section D. The standards and guidelines themselves appear in Section E. An Appendix provides examples that illustrate their application.
B. The elements of quality
Among statistical agencies there is no commonly accepted definition of data quality for official statistics. Statistics Canada has defined data quality in terms of "fitness for use". Whether data and statistical information are fit for use depends on the intended uses and on intrinsic characteristics of the data or information. The essence of this Policy is that users must be provided with the information necessary to judge the fitness of the data for their intended use.
Six dimensions of quality have been identified within the concept of "fitness for use".
- The relevance of statistical information reflects the degree to which it meets the real needs of users. It is concerned with whether the available information sheds light on the issues of most importance to users. The assessment of relevance needs to take into account the varying needs of users.
- The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure. It is usually characterized in terms of error in statistical estimates and is traditionally decomposed into bias (systematic error) and variance (random error) components. It may also be described in terms of the major sources of error that potentially cause inaccuracy (e.g., coverage, sampling, nonresponse, response).
- The timeliness of statistical information refers to the delay between the reference point (or the end of the reference period) to which the information pertains, and the date on which the information becomes available. It is typically involved in a tradeoff against accuracy. The timeliness of information will influence its relevance.
- The accessibility of statistical information refers to the ease with which it can be obtained by users. This includes the ease with which the existence of information can be ascertained, as well as the suitability of the form or medium through which the information can be accessed. The cost of the information may also be an aspect of accessibility for some users.
- The interpretability of statistical information reflects the availability of the supplementary information and metadata necessary to interpret and utilize it appropriately. This information normally covers the underlying concepts, variables and classifications used, the methodology of collection, and indicators of the accuracy of the statistical information. This Policy aims to ensure the interpretability of our information.
- The coherence of statistical information reflects the degree to which it can be successfully brought together with other statistical information within a broad analytic framework and over time. The use of standard concepts, classifications and target populations promotes coherence, as does the use of common methodology across surveys. Coherence does not necessarily imply full numerical consistency.
Documentation on data quality and methodology is an integral component of statistical data and analytical results based on these data. Such documentation provides the means of assessing fitness for use and contributes directly to their interpretability.
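The decomposition of accuracy into bias and variance mentioned under the accuracy dimension is commonly summarized by the mean squared error. As a sketch only, with \hat{\theta} denoting an estimate and \theta the quantity the statistical activity was designed to measure (both symbols are notation introduced here for illustration and are not part of the Policy):

```latex
\mathrm{MSE}(\hat{\theta})
  = E\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2,
\qquad
\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta .
```

The variance term corresponds to random error and the bias term to systematic error, so an estimate can have a small sampling variance and still be inaccurate if the bias is large.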
C. Definitions
For purposes of these standards and guidelines the following definitions are used.
Data accuracy measure: a numeric value, or symbol corresponding to numeric values, which quantifies or summarizes the likely magnitude and important sources of differences between the published data and the quantities that the statistical activity was designed to estimate.
Data accuracy rating: a categorization or quantification of the accuracy of data based on expert judgement or analysis. It summarizes the accuracy of the data, or indicates the level of confidence with which the data may be used. Data accuracy ratings are appropriate when, by the nature of the data product, or for reasons of timeliness, cost or technical feasibility, data accuracy measures could not be given. Data accuracy ratings need to be based on sound evidence and good judgement. They may assess the effect of a single source of error, or the overall accuracy. They may be based on macro comparisons with data from other sources, or on conclusions drawn from a review of "data accuracy measures". They may be simply statements or numeric rankings based on an expert's assessment of data sources or of a methodology.
Indicators of data accuracy: data accuracy measures or data accuracy ratings. These may also be termed "accuracy indicators".
Documentation on methodology: the description of the underlying concepts and methodology used in the implementation of a statistical program, including detailed definitions of the variables, terminology, indices, models, and estimators used. It also includes descriptions of changes affecting comparability of data over time, or other features of methodology affecting data quality.
Statistical statement or analytical result: any statement or result that explicitly or implicitly indicates the underlying meaning or statistical significance of an estimate or finding. These include highlights, interpretations, statistical test results, and statements of trend, change or significance.
D. General principles
The following general principles should govern the implementation of the standards and guidelines.
- Users must be provided with the information necessary to understand both the strengths and limitations of the data being disseminated.
- The documentation provided to users on data quality should engender an awareness of quality as an issue in the proper use of the data.
- The documentation on methodology must permit users to assess whether the data adequately approximate what they wish to measure, and whether the estimates were produced with tolerances acceptable for their intended purpose.
- The documentation provided should be clear, well organized and accessible. Accuracy indicators should not be technically difficult for the intended clientele to understand or use.
- The descriptions of methodology and the indicators of data accuracy should be carefully integrated whenever this will benefit the user's understanding.
- Specific standards for the level of detail to be provided in documentation on data quality or methodology are given in Section E. These are mandatory but minimum requirements. The need to go beyond these standards will depend on the benefit to users or more specifically on:
- the type of data collection, data sources, and analysis;
- the nature and purpose of the product;
- the range and impact of uses of the data;
- the medium of dissemination; and
- the total budget of the statistical program.
- The level of detail of the documentation on data quality for the purposes of the Policy, and the frequency with which it is updated, should take into account:
- the intended uses of the data;
- the potential for error and its significance to the use of the data;
- variation in accuracy and coherence over time;
- cost of the evaluation of data quality relative to the overall cost of the statistical program;
- potential for subsequent improvement of quality and efficiency;
- applicability and utility of the indicators of accuracy to users.
E. The standards and guidelines
The nature and complexity of the information on data quality and methodology that should be provided for data users will depend on the statistical program and the nature of the data in the product. The medium of dissemination will have some bearing on how, and how much, documentation on data quality and methodology can be presented or readily accessed. Despite differences in these regards, there is a minimum set of documentation on data quality and methodology to which all users of any data must be able to refer. The standards (E.1) detail these minimum mandatory requirements under the Policy. The subsequent section, The guidelines (E.2), describes additional documentation on methodology and data accuracy that should be provided for major surveys, censuses or programs. Options or additional requirements for specific types of data are described in section E.3.
For all programs and products the Integrated Metadata Base (IMDB) will play an important role in satisfying the requirements of the Policy. As the repository of information about Statistics Canada's surveys and programs, the IMDB is expected to contain most of the information on methodology that might be provided to users. The IMDB will also contain the information on data accuracy that is mandatory under the Policy. This information can be copied from the IMDB to the product or, for some media, accessed through an electronic link to the IMDB. The headings under which the mandatory requirements are to be covered are aligned with the headings on the IMDB in the expectation that the same text will serve both purposes for most statistical products. A consistent set of headings will also better serve users who are accessing a range of different products.
The Policy specifically requires that statistical products include or make reference to the documentation on data quality and methodology. The mandatory information is expected either to be part of the statistical product, to be embedded through electronic links, or to otherwise accompany the product. In all cases it should appear as a contiguous whole and not be scattered through the product. If necessary, specific elements of this documentation may be included or referenced elsewhere in the product, but duplication should be minimized.
All references or links directing users to information on data quality and methodology should include the words "quality" (in most cases this should be "data quality") and "methodology".
E.1. The standards: Mandatory documentation
A specific set of summary information on data quality and methodology must be presented or made available to users for each statistical product. The information should reflect the individual product, but it is anticipated that much of this summary documentation will be common to many products from the same statistical program.
Topics to be included in documentation
The summary documentation required by the Policy is to be organized according to the structure below. The bullets under each heading indicate the information that should be included (wherever applicable) in summary documentation. They are not intended to exclude any other information necessary for the proper interpretation or use of a particular information product, nor to disallow variation in the placement of material among these headings in the interests of clarity to the user. The exact content under each heading will depend on the individual program, on the types of data or results included in the product, and on whether there are important accuracy issues to describe. The numbering system used below is not part of the standard; it is provided here only for clarity.
1. Note(s) to users (if applicable)
(Explanatory note: This item is to be included only if applicable. This topic may consist of highlights of information provided in one or more of the sections listed below, or particular explanations or warnings of which users should be aware.)
2. Name of survey or program - concepts, methodology and data quality
- a standardized message introducing the information on data quality and methodology and emphasizing the importance of taking it into account. A basic standardized text is provided in the examples in the Appendix to these Standards and Guidelines, though some variation from this text may be necessary.
2.1 Data sources and methodology
- an introductory paragraph (e.g., purpose, objectives and general nature, subject matter or content of the survey);
- a description of the survey or program population, and of any differences between the population surveyed (or described) and the conceptual universe or the target population (e.g., differences due to population or frame exclusions or limitations; differences with what is "commonly" or ideally to be measured, described, or analysed, or with what is generally understood by the subject matter community; differences between what ideally should be measured and what can be measured);
- a statement on the time frame or reference period of the data;
- general methodology;
- a statement on the data source(s) and the sampling and collection methodology;
- a statement on the processing and estimation methodology;
- revisions and adjustments (if applicable);
- a statement advising what data are subject to revision and why, and an indication of what the size of the revision might be - for example, a measure based on past revisions;
- a description of benchmarking, calendarization or seasonal adjustments made to the data and their impact.
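One possible "measure based on past revisions" is the mean absolute percent revision between preliminary and revised estimates. The sketch below is illustrative only: the function name, the choice of measure and the figures are assumptions made here, not something prescribed by the Standards.

```python
from statistics import mean


def mean_absolute_percent_revision(preliminary, revised):
    """Average size of past revisions, as a percentage of the preliminary value.

    Assumes both series cover the same reference periods, in the same order,
    and that no preliminary value is zero.
    """
    if len(preliminary) != len(revised):
        raise ValueError("series must have the same length")
    percent_revisions = [
        abs(r - p) / abs(p) * 100.0
        for p, r in zip(preliminary, revised)
    ]
    return mean(percent_revisions)


# Hypothetical quarterly estimates, for illustration only.
preliminary = [200.0, 250.0, 400.0, 500.0]
revised = [202.0, 245.0, 400.0, 510.0]

mapr = mean_absolute_percent_revision(preliminary, revised)
print(f"Mean absolute percent revision: {mapr:.2f}%")
```

A figure of this kind gives users a rough sense of how far a preliminary estimate might move once revised, under the assumption that future revisions resemble past ones.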
2.2 Concepts and variables measured
- key concepts, variables (or characteristics) and classifications used;
- key indicators, indices, or other key data or analytical results being disseminated.
2.3 Data accuracy
- a statement of the key data accuracy issues, as well as an acknowledgement that the data are subject to error, and that the level of error may vary across geography and by characteristic (as applicable, such statements may emphasize the presence of coverage error, sampling error, error due to non-response, response error, and processing error, and may be incorporated in text with measures of accuracy);
- for census, survey or administrative data, a data accuracy measure of coverage, or a coverage rating (see Section C), and an assessment or commentary on any accuracy issues related to coverage error;
- for sample survey data (or data from the sample component), estimates of sampling error for key characteristics, and a brief summary of accuracy issues and adjustments related to the sample design and estimation;
- for census, survey or administrative data, a response rate (see the Standards and guidelines for reporting of nonresponse rates), a statement on how non-response and response error are handled, an imputation rate or other measure of the extent of imputation and its contribution to the estimates, and an assessment or commentary on any significant accuracy issues related to non-response, response error or imputation;
- if applicable, descriptions and accuracy indicators for important residual errors identified (e.g., response or measurement errors that could not be effectively dealt with by imputation);
- if applicable, a statement advising that the data are not or may not be comparable over time and why (including any significant change in data accuracy from one reference period to another);
- if applicable, an explanation of the similarities and differences between related data sources (usually other Statistics Canada programs) and the results of data "confrontation", or of comparisons with other sources or a data series;
- for analytical results, a summary of the analytic methods, assumptions and caveats, as well as a brief description and discussion of the possible effects of data accuracy, the survey concepts and the analytical assumptions on the results - especially on the validity or the statistical significance of these results - (see also E.3.5 below);
- a description of any other important issues or events (e.g., a strike) influencing the accuracy, interpretation or use of the data.
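Several of the accuracy indicators listed above are simple ratios. The sketch below shows how a response rate, an imputation rate and a coefficient of variation (one common way of expressing sampling error) might be computed. The definitions used are deliberately simplified assumptions; the Standards and guidelines for reporting of nonresponse rates give the authoritative definitions, and all function names and figures here are hypothetical.

```python
def response_rate(responding_units, in_scope_units):
    """Unweighted response rate, as a percentage of in-scope units."""
    return 100.0 * responding_units / in_scope_units


def imputation_rate(imputed_total, estimated_total):
    """Share of an estimated total that comes from imputed values, in percent."""
    return 100.0 * imputed_total / estimated_total


def coefficient_of_variation(estimate, standard_error):
    """Sampling error expressed relative to the estimate, in percent."""
    return 100.0 * standard_error / abs(estimate)


# Hypothetical figures, for illustration only.
rr = response_rate(responding_units=850, in_scope_units=1000)
ir = imputation_rate(imputed_total=120.0, estimated_total=2400.0)
cv = coefficient_of_variation(estimate=2400.0, standard_error=60.0)

print(f"Response rate: {rr:.1f}%")
print(f"Imputation rate: {ir:.1f}%")
print(f"Coefficient of variation: {cv:.1f}%")
```

Expressing the imputation rate as a share of the estimate, rather than as a count of imputed records, reflects the requirement to indicate the contribution of imputation to the estimates.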
3. Appendices (as necessary) and/or references or links
If applicable, appendices may be added. References or links to additional or related information on data accuracy or methodology should be added or embedded. These references or links may be added at the end or throughout the above documentation.
E.2. The guidelines: Additional documentation
For major surveys and statistical programs there is good reason to provide users with more detailed or more specialized data quality and methodology information than that required by the Standards of the previous section. The supplementary documentation might cover topics specified in Section E.1 in more detail, or might address topics not covered by the summary documentation.
The supplementary documentation might include "technical" documentation to afford analysts a greater understanding of accuracy issues and a fuller appreciation of the methodology. Such documentation may take many forms, from a comprehensive report, to separate reports, appendices or chapters on specific aspects of methodology or data quality evaluation.
Potential topics or documentation to include in this supplementary documentation are:
- topics covered in the standards (Section E.1);
- historical quality trend or record - for any category or indicator of accuracy the long term record or trend;
- the questionnaire(s) used;
- the sampling frame - creation, updating, and quality assurance;
- the detailed sample design and estimation procedures;
- other processing - description of methods and indicators of the extent of coding errors, data capture errors, impact of edits, etc.;
- a description of the imputation approach and examples of key imputation rules;
- quality control procedures used;
- the form in which the final data are stored and the tabulation or retrieval system, including confidentiality protection requirements and procedures;
- any special procedures or other steps that might be relevant to the particular content of the product;
- total variance (or total standard error) or its components by source - the overall variability of the statistics, including the effect of sampling error, response error, and processing error;
- non-response bias - an assessment of the effect of non-response on the results;
- response bias - evidence of response bias problems stemming from respondent misunderstanding, questionnaire problems, or other sources;
- seasonal adjustment - description of the methodology and measures of the impact and significance of the adjustment together with an explanation of how these measures should be interpreted (for example, the mean absolute percent change of the last year's revisions of the seasonal factor, or the MCD - months for cyclical dominance - statistics);
- data quality validation and evaluation - results and descriptions of the methodologies of the studies, processes and methods used to assess, measure or evaluate the accuracy of the data.
Statistics Canada's Quality guidelines is a useful source of information to help identify what may be the important quality issues to be considered for inclusion in the supplementary documentation, as well as the potentially significant sources of error that might be examined in greater depth.
Electronic products for which additional documentation on data quality and methodology exists will normally have links to the additional documentation either embedded into the product or accompanying the product. Other products will contain an explicit reference to such additional documentation whenever it exists.
E.3. Special requirements based on the type of data
The standards of section E.1 apply to all forms of disseminated statistical data and analytical results. However, there are special requirements pertinent to specific types of products. As a supplement to the general standards, the following items should be included, as applicable, in documentation for the corresponding types of products.
- For index numbers of prices or quantities, the conceptual basis presents an additional dimension in describing the data quality and methodology.
Particular attention might be given to any substitutions made in developing the estimates, with special reference to product changes and changes in product quality.
In addition, particular attention should be paid to specific conceptual and methodological aspects of the indices. Their proper description, in many cases, may be more important for users than a strict assessment of the quality of input data. The following elements should be developed:
- definitions - precise definitions of the underlying economic concepts that the index numbers are intended to measure. Reference should be made to any application or class of application (e.g., deflation of macro-economic aggregates) for which the index numbers are not suitable.
- the methodology adopted - documentation should cover topics such as the index formula, weighting system, computation of the index at various aggregation levels, basing, re-basing, linking of indices, treatment of changes in the varieties or qualities of goods available on the market. The adopted methodology should be compared with the underlying index concepts and possible distortions discussed.
- In the case of National Accounts and data resulting from other data integration activities, both the impact of quality problems in the source data, and the impact of the methods of analysis, integration, benchmarking and adjustments used, have to be taken into account. Given the multiplicity of data sources and the complexity of methods, it may be necessary to use data accuracy ratings. In particular, it may be necessary and desirable to consolidate the ratings for all major and assessable components or sources of error into a single set of data accuracy ratings.
Documentation for data and analytical results based on data integration activities (including the System of National Accounts) should, in particular, cover the following topics:
- the conceptual framework for the analysis and integration;
- the major definitions and concepts used and how they are defined operationally;
- the data sources used, and the extent to which they measure the target concepts, as well as gaps and deficiencies in these data sources. Non-comparability of data elements available from different sources should be noted. Reference should be made to the quality of the primary data underlying the analysis;
- the methods used in integrating and analysing the data from feeder sources including, where relevant, the adjustments made to data from different sources, the methods used for price deflation, the methods used for seasonal adjustment and benchmarking, and a description of the revision process; and
- any discrepancy arising in the integration or analysis of data from different sources, and the procedures by which these discrepancies were handled (e.g., the statistical discrepancy arising in the estimation of income and expenditure accounts).
- For statistics derived from administrative data or from data not collected by Statistics Canada, the topics listed in E.1 should be covered to the extent applicable. However, since these statistics may be based on data not originally collected for statistical purposes, the following topics take on particular importance and should be covered:
- the data sources;
- the purposes for which the data were originally collected;
- the merits and shortcomings of the data for the statistical purpose for which they are being used (e.g., in terms of conceptual and coverage biases);
- how the data are processed after being received and what, if anything, is done to correct problems in the original data set; and
- the reliability of the estimates, including caveats where necessary.
- Documentation for geographic and cartographic data products should include descriptions of the data sources and transformations, along with descriptions or references to the methodology and indicators of data accuracy corresponding to these sources. Documentation should also include descriptions or indicators of the positional accuracy, logical consistency and completeness of the product data.
- For products that include primarily or only analytical results, documentation should be provided on both the source data and the method of analysis. The requirements for documentation on the source data are similar to those for other products and can be met by including, linking to, or referring to the corresponding information for the data source(s). The documentation of the methods of analysis may be incorporated into the product either as part of the presentation of the analytical results in the body of the report, or in separate "text boxes". Such "text boxes" might also include summary information on the data source (in addition to links or references to the source documentation). Documentation on the analysis should also note the use of the Policy on review of information products as a quality assurance methodology.
For products that consist of a series of analytic reports in the same broad subject area, it may be possible to present or embed the mandatory information common to all or most of the individual reports at the beginning of the product. Information specific to the individual reports would then be included in those individual reports.
Specifically, for products that present analytical results, the documentation of the analysis should cover the following:
- data source(s) used;
- key features of the methodology and accuracy of the source data pertinent to the analysis;
- analytical objectives, concepts and variables;
- analytic methods used, and their assumptions and caveats;
- statistical significance of the results, and any relevant conflicting or corroborating results; and
- appropriate use of the results.
- For building-block data products (i.e., products consisting of microdata or low level aggregations of data intended to be used for aggregation or analysis rather than for direct use), documentation needed to facilitate appropriate use of the product should be provided. For example, documentation should include an explanation of the proper use of weights and of the potential accuracy consequences of doing otherwise. This should include an explanation of the use of weights according to the nature of the estimation or analysis (e.g., totals, rates/percentages, regression, variance or coefficient of variation) and of the related requirements of client software.
- For products from a longitudinal survey or a supplementary/secondary (or similar) survey, data accuracy and methodology issues specific to this type of survey should be assessed, described and commented on accordingly. Accuracy indicators should include both cross-sectional and cumulative indicators, where applicable. For example, the response rates for the current cycle of a longitudinal survey and cumulative rates for the longitudinal sample should be defined and presented (see the Standards and guidelines for reporting of nonresponse rates).
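For building-block data products, the requirement to explain the proper use of weights and "the potential accuracy consequences of doing otherwise" can be illustrated with a small example. The sketch below compares a weighted and an unweighted estimate of a mean. The microdata, the weights and the function names are hypothetical and chosen only to make the difference visible; they do not represent any actual survey design.

```python
def weighted_mean(values, weights):
    """Estimate a population mean using survey weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def unweighted_mean(values):
    """Naive sample mean that ignores the sample design."""
    return sum(values) / len(values)


# Hypothetical microdata: two records from a lightly sampled group (each
# representing 100 units) and two from a heavily sampled group (each
# representing 10 units).
values = [10.0, 12.0, 50.0, 54.0]
weights = [100.0, 100.0, 10.0, 10.0]

with_weights = weighted_mean(values, weights)
without_weights = unweighted_mean(values)

print(f"Weighted estimate of the mean:   {with_weights:.2f}")
print(f"Unweighted estimate of the mean: {without_weights:.2f}")
```

In this made-up case the unweighted figure is roughly twice the weighted one, because the heavily sampled group is over-represented among the records. Documentation for a microdata product would explain this kind of effect for each type of estimate the product is intended to support.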
Appendix: Examples of mandatory summary documentation in standardized form
About the examples
The fictitious examples presented in this appendix are intended to be illustrative of the application of the summary documentation requirements presented in section E of the Standards and Guidelines. The examples do not explicitly demonstrate the application of the standards for all types of statistical programs, methodologies, and quality outcomes and issues. However, it is expected that the examples will be useful as illustrations of the type of information and the level of detail required.
First example: The first example is of a relatively straightforward survey for which the quality objectives have been met.
Second example: The second example is of a statistical program for which there are significant errors in some key data and ongoing quality issues. Due to the nature of the program and the unmet quality objectives, there is a need for more discussion of concepts, methodology, data accuracy, and matters of comparability with related data sources.
Third example: The third example is of a sample survey. The major quality objectives have been met. Due to the nature of the survey, there is a need for discussion of the sample design, of sampling error and of comparability with related data sources.
Limitations of the examples: The examples are intended to illustrate documentation requirements and nothing further. They are otherwise fictitious. However, given the broad range of subject matters, methodologies and outputs of Statistics Canada, it is unavoidable that there are parallels with existing programs or areas of endeavour. Nothing is intended or should be inferred beyond the purpose of illustration.
Example 1: A fictitious report for purposes of illustration only
Survey of copper ingots
Concepts, methodology and data quality
The following information should be used to ensure a clear understanding of the basic concepts that define the data provided in this product, of the underlying methodology of the survey, and of key aspects of the data quality. This information will provide you with a better understanding of the strengths and limitations of the data, and of how they can be effectively used and analysed. The information may be of particular importance to you when making comparisons with data from other surveys or sources of information, and in drawing conclusions regarding change over time.
Data sources and methodology
The Survey of copper ingots measures, on a quarterly basis, the production and shipment of copper ingots. The target population is all producers in Canada of ingots of refined copper, regardless of the primary activity of the company or establishment (footnote, reference or link to reference). Information is provided by all identified producers.
Reference period
The information contained in this data product reflects production completed and shipments made in the period January 1, 2000 to March 31, 2000.
General methodology
Data are submitted electronically by producers, in a common format and consistent with industry accounting practices, within 30 days of the close of the quarter. Received data are subject to editing for errors and inconsistency, and in turn to follow-up with respondents. Follow-up is also carried out for missing data.
Revisions
Data are subject to revision in the event of late receipt of initial or revised information from respondent organizations, and if new producers are identified. Revisions occur only rarely and are disseminated in the subsequent quarter.
For data confidentiality reasons, only national estimates are disseminated.
Concepts and variables measured
The statistical data presented in this product refer to commodity XYZW.PR.GH as per the Standard Classification of Goods (SCG) (footnote or reference).
Production quantities (in metric tonnes): refer to production completed in the reference period, regardless of when started. All re-refined or recycled copper is excluded as are all ingots purchased or transferred to, but not refined by the producer.
Shipment quantities (in metric tonnes): refer to shipments out of plant made in the reference period, regardless of whether received at destination. All re-refined or recycled copper is excluded as are all ingots not refined by the producer.
Imported and domestic copper: Production quantities of imported copper and of domestic copper refer to ingots refined, respectively, from foreign and domestic ore bodies.
Value of shipments: the gross revenue from sales, plus the "current market value" of copper (based on average gross revenues from sales), if transferred or consumed within the company or enterprise, without sale, or otherwise not directly sold in the form of ingots.
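The value of shipments definition can be illustrated with a minimal sketch. It assumes that the "current market value" of unsold copper is the producer's own average gross revenue per tonne sold, applied to the tonnes transferred or consumed without sale. That reading, the function name and the figures are illustrative assumptions, not part of the survey specification.

```python
def value_of_shipments(sales_revenue, tonnes_sold, tonnes_not_sold):
    """Gross sales revenue plus an imputed market value for copper not sold.

    Assumption: the market value of unsold copper is the average gross
    revenue per tonne sold, multiplied by the tonnes not sold.
    """
    if tonnes_sold <= 0:
        raise ValueError("tonnes_sold must be positive to derive an average price")
    average_price = sales_revenue / tonnes_sold  # average gross revenue per tonne
    return sales_revenue + average_price * tonnes_not_sold


# Hypothetical producer: 800 t sold for $4,000,000 and 200 t transferred
# within the company without sale.
print(value_of_shipments(4_000_000, 800, 200))
```

In this hypothetical case the transferred copper is valued at the same $5,000 per tonne realized on sales, so it adds $1,000,000 to the value of shipments.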
Data accuracy
The methodology of this survey has been designed to control errors and to reduce the potential effects of these. However, the results of the survey remain subject to error - e.g., coverage, response and processing error, and errors as a result of non-response.
The target population is identified from Statistics Canada's Central Frame Data Base (CFDB) (reference, footnote or link to reference regarding the CFDB methodology and quality). This business register is kept up to date using administrative information on businesses received monthly from Canada Customs and Revenue Agency, as well as information from other Statistics Canada surveys and business profiling activities. Any existing companies commencing or ceasing production will likely be identified through feedback within one quarter of this change. In the past five years this has occurred on six occasions. In all cases data were revised and disseminated in the subsequent quarter. In the unlikely event the update comes more than one quarter after production has commenced or ceased, data are revised only for the previous quarter. This has happened once in the past five years, prior to the introduction of the more rigorous CFDB system. The resultant error amounted to less than 1% of production quantities for a period of four months, and thus is not considered to significantly affect comparisons over time. The CFDB is considered to be 100% complete for this element of industry (footnote, reference or link to reference regarding industry classification), a conclusion supported by feedback from producers.
With respondent co-operation, follow-up, and editing and imputation procedures, the level of response and processing error, and the effects of non-response are controlled. All members of the identified target population routinely provide information. Ninety-eight percent of that information is complete and consistent after follow-up. Adjustments using past information and trend data from other respondents of similar size are made for the residual errors. These "adjusted" data represent less than 0.25% of final quantities and dollar values.
For respondents producing ingots from mixed sources, the respondent is asked to estimate the amounts based on relative assayed values (as applicable) and volumes. Information from the Annual Survey of Manufactures indicates that the accuracy of this estimation varies greatly but involves few producers and has negligible impact on the statistical data.
Research has indicated that the estimation of inventories from these data will provide viable analytical results. These results also support the accuracy of reporting. Users should note, however, that by design this survey does not include re-refined or recycled copper. The total production and stock of refined copper cannot be estimated solely from these data.
As these data are collected from all producers, they are not subject to sampling error, as is sometimes the case for monthly and quarterly manufacturing and industry surveys.
Comparability of data and related sources
These data conform to the definitions of the Annual Survey of Manufactures (ASM) and the Monthly Survey of Manufacturing. As these surveys are the source frames of the target population, improvements made to them have benefited the Survey of copper ingots, largely in terms of timeliness and completeness of coverage. Improvements in coverage may have some impact on the comparability of results from this survey with results prior to 1994.
Some related sources of statistical data and products:
For additional information on data quality and methodology please contact (footnote, reference or link to reference).
Example 2: A fictitious report for purposes of illustration only
Survey of health care costs
Notes to users:
- Health care costs and component indices: December 1999 cost totals, indices and component indices for surgical medical services, at the national level and for all geographic levels within Nova Scotia, have been revised to account for an error in the reporting and processing of some data. The changes reflected by the revisions at the national level are small. The corresponding rate of revision to the data for Nova Scotia varies; for most less than 1%, but in some geographic areas as high as 5% for "Other Medical Services". The revised totals and indices for December are presented.
The cause of the error was also present for the January preliminary data, but not beyond. However, the effect of the error on the indices has extended through the first three months of this year, ending with the preliminary results for March, 2000. Corrections to these results will be reflected in the final results for these three months, starting with the final January results, presented herein. The April preliminary results and the 1999 annual results, also presented herein, are unaffected.
The errors had little impact on the preliminary results for February and March at the national level. However, the results for Nova Scotia should be used with some caution. For these months the preliminary data on Other Medical Services for Nova Scotia should not be used.
- Increase in health care costs: For 1999, changes in the schedules for fees of medical services under provincial/territorial Health Acts have accounted for much of the increase in overall annual expenditures and, in particular, under various "non-elective surgery" components.
Concepts, methodology and data quality
The following information should be used to ensure a clear understanding of the basic concepts that define the data provided in this product, of the underlying methodology of the survey, and of key aspects of the data quality. This information will provide you with a better understanding of the strengths and limitations of the data, and of how they can be effectively used and analysed. The information may be of particular importance to you when making comparisons with data from other surveys or sources of information, and in drawing conclusions regarding change over time, differences between geographic areas and differences among sub-groups of the target population.
Data sources and methodology
The Survey of health care costs (SHCC) measures, on a monthly basis, health care expenditures under the Canadian health care system. The data provided are for in- and out-patient hospital services, and for insured medical services. The survey results include a variety of cost indices for the provinces and territories, and for population age-sex groups, which track cost changes over time.
The target population is the set of all expenditures by all health care providers delivering insured medical services, directly or indirectly, to individuals in Canada under a provincial or territorial government health care program. Excluded are expenditures for services provided by the facilities of Veterans Affairs, National Defence and the Solicitor General of Canada, as well as expenditures for medical services provided to members of the Canadian Forces and the RCMP, and to Federal prisoners. The data on health care costs are obtained from administrative data provided by these government health care programs, through the Canadian Institute for Health Information (CIHI) (footnote, reference or link to reference).
Reference period
The reference period for the data collection and compilation is the calendar month. Preliminary results are released approximately 150 days (5 months) from the end of the reference month. Final data are released three months later. Annual data for the preceding calendar year are released the following October.
General methodology
On a monthly basis aggregate data on government payments to hospitals and fee-for-service claims are provided by the provincial and territorial governments under the auspices of the National Health Statistics Council. These data are extracted and compiled, by agreement, from the respective health information data bases maintained by the provinces and territories. The data are received in electronic medium, in a pre-specified format (detailed data requirements are given in Appendix 1 (or link to reference)). Received data are edited for consistency and completeness. Potential errors and inconsistencies related to hospital payments and to fee-for-service payments (if critical) are resolved through follow-up with the respective provincial or territorial agency. Residual errors or inconsistencies related to fee-for-service payments are resolved by statistical adjustments using historical or administrative data. Statistical adjustments are then made to ensure hospital payment data reflect monthly payment periods.
Revisions and seasonal adjustment
Data for the most recent month are preliminary. These data are subject to updates and corrections from source. The final data for the month reflect such changes and are disseminated in the subsequent quarter along with the preliminary results for that quarter. The overall monthly rate of revision (preliminary to final) has been approximately +5% over the past three years. Higher rates sometimes occur at the province/territory level and for lower levels of aggregation.
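The rate of revision from preliminary to final can be sketched as follows. The text does not define the rate; this sketch assumes it is the change from the preliminary to the final value, expressed as a percentage of the preliminary value. The function name and figures are hypothetical.

```python
def revision_rate(preliminary, final):
    """Percentage revision from the preliminary to the final value.

    Assumption: the rate is measured relative to the preliminary value.
    A positive result means the final value was revised upward.
    """
    if preliminary == 0:
        raise ValueError("preliminary value must be non-zero")
    return 100 * (final - preliminary) / preliminary


# Hypothetical monthly cost total (in thousands of dollars).
print(revision_rate(preliminary=200_000, final=210_000))
```

Under this assumed definition, a preliminary total of 200,000 that becomes 210,000 in the final data corresponds to a revision of +5%.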
Annual data will differ from a simple compilation and application of the data for the associated 12 months. These data are subject to adjustments from source, including the application of negotiated budgetary limitations on fee-for-service payments and supplementary payments to hospitals. The aggregate revisions may occasionally exceed ±5% nationally and are usually in the range of -10% to +15% for the provinces/territories. (See Appendix 1 for details (or link to reference).)
Regular (annual) seasonal events and cycles (e.g., seasonal changes in weather and statutory holidays) cause predictable fluctuations in the data. The data series disseminated includes the seasonally adjusted data (i.e., excludes predictable annual influences) and the unadjusted data.
Concepts and variables measured
Characteristics: The major concepts and variables measured are Costs and Medical Services.
Costs collected or derived are:
- Hospital services costs - payments made by the provincial/territorial government to hospitals to provide medically necessary in- and out-patient services, consistent with the requirements of the Canada Health Act, and covering operating and capital costs (excludes donations and capital grants) to provide these services
- Fee-for-service costs - payments to physicians on a fee-for-service basis, or the equivalent, made for insured medical services, including such fees for medical services provided in-hospital (excludes payments under supplementary coverage and fees paid by private health care plans)
Out of province/territory and out of Canada payments for insured medical services are presented separately from the above costs. All costs are presented in dollar values rounded to the nearest thousand.
Medical services are categorized according to Surgical Classification (elective and non-elective), Diagnostic Service and Other Medical Services. These include only insured services based on the requirements of the Canada Health Act. Supplementary services and services under private health care plans are not included.
In addition, cost and medical service information is provided by age and sex of patients, and geographic areas (Canada, Province/Territory, Health Region and Health District, Census Metropolitan Area and Urban Size Group/Rural Area) based on site of service delivery (derivation is outlined in Appendix 1 (or link to reference)).
More detailed classifications under Medical Services, lists of the geographic classifications used and additional explanatory documentation are provided in Appendix 1 (or link to reference).
Indices: In addition to totals, cost indices are presented for monthly and annual Fee-for-Service costs and for annual Hospital Services costs. These indices, referred to as the Health Care Component Indices (HCCI), are of the following basic form:
HCC Index = (Current Total Cost for Category / Reference Base Total Cost for Category) × 100
The reference base for all indices is the corresponding 1991 cost for the category and reporting period (month or year). For major geographic levels, separate indices are also reported using Adjusted Current Cost which is equal to the Current Cost adjusted to constant 1991 dollars. The indices and associated methodology are explained in more detail in Appendix 1 (or link to reference).
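The basic form of the index, the current total cost for a category divided by its 1991 reference base cost and multiplied by 100, can be sketched as follows. The function name and cost figures are hypothetical. The adjustment to constant 1991 dollars is not shown because its method is described only in Appendix 1.

```python
def hcc_index(current_total_cost, reference_base_total_cost):
    """Health Care Component Index for one category and reporting period.

    Both costs must refer to the same category and the same kind of
    reporting period (month or year); the reference base is the 1991 cost.
    """
    if reference_base_total_cost <= 0:
        raise ValueError("reference base cost must be positive")
    return 100 * current_total_cost / reference_base_total_cost


# Hypothetical category: $1,250,000 in the current period against
# $1,000,000 in the corresponding 1991 period.
print(hcc_index(1_250_000, 1_000_000))
```

An index of 125 in this hypothetical case means that the total cost for the category is 25% higher than in the corresponding 1991 period. An index of 100 means no change from the reference base.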
Data accuracy
The methodology of this survey has been designed to control errors and to reduce the potential effects of these. However, the results of the survey remain subject to error - e.g., coverage, measurement and processing error.
For the survey, the accuracy or completeness of coverage of the population of interest is a question of the timeliness of updates to the provincial and territorial data bases. Some degree of lag does occur, almost exclusively affecting preliminary data, and rarely the final data. The size of this error in preliminary data is reflected in the rate of revision to monthly data. Revisions made as part of the derivation of annual data are not usually due to matters of coverage, but to end of year budgetary or contractual adjustments.
The survey does rely on the cooperation of the provinces and territories, and on the accuracy of their data. These jurisdictions have a consistent record of providing data for all of the components of the survey; nonresponse is not an issue. The statistical data will include the effects of any coding, reporting and processing errors that cannot be detected and are not corrected at source. Data misclassified at source can also lead to coverage errors; that is, if these errors result in misclassification of medical services as in or out of scope for the survey. Errors corrected at source are reflected in the revisions. In addition to monthly checks and comparisons, survey procedures include year end comparisons and adjustments to ensure the survey data are consistent in aggregate with those of provincial/territorial sources.
On December 1, 1999 a revised structure for the classification of medical services for Nova Scotia and, in turn, a revised extraction system to provide the aggregate data for this survey were implemented for this province. Perhaps due to the timing of this change close to the holiday period, the changes in the classification were not consistently implemented. Although additional data were available for the province to make payments, this information was not utilized in the data extraction system. As a result the extraction system transformed many of the erroneous classifications from insured medical services to services covered only by private health care plans. This, in effect, created a coverage problem for the survey. The problem was identified in the year end consistency checks. The impact of these errors is outlined in the Notes to Users above. (Also note, the increased rate of revision for affected months is, or will be a direct result of these errors.)
Modifications to the system and procedures have been made to reduce the possibility of similar types of problems and outcomes in the future.
Comparability of data and related sources
These data are consistent with the general structure of the regulations for medical coverage and fee structure under provincial Health Acts (footnote, reference or link to reference), although as broad aggregations across classifications within these regulations. The data similarly conform to the relevant diagnostic and surgical classifications of the Hospital Utilization Survey (HUS) (footnote, reference or link to reference) and the Survey of Emergency Room Utilization (SERU) (footnote, reference or link to reference). The data are not readily comparable to surveys that are based on data provided by households or individuals (e.g., the National Population Health Survey (footnote, reference or link to reference).) The cost information from the HUS and the SERU surveys can be integrated with that of the SHCC to provide a more complete analysis of cost breakdowns. However, since target populations and cost inputs differ this must be done with full knowledge of the appropriate methods. This type of analysis is published annually in .... (reference, or reference and link or footnote), along with analysis based on the SHCC cost indices.
The indices presented are not typical of or comparable to price indices produced by Statistics Canada (e.g., the Consumer Price Index, the Farm Product Price Index, the Industrial Product Price Index (footnotes, references or link to references)). The HCCI are based on costs for all in scope items, rather than on a sample of costs for a "basket" of services (in this case). Thus changes in the mix and frequency of use of medical and hospital services can more readily bring about a change in an index than can changes in the cost or price of services or in hospital budgets (which are usually set annually). The HCCI, therefore, are not direct indicators of price change. They are not direct indications of inflation in health care costs, or of changes in the need for health care services. They are one of many information components used to assess the health status of Canadians and economic issues related to the health care system.
The errors in the December 1999 data for Nova Scotia have not affected the comparability of final data as corrections have been applied.
Changes under the regulations for medical coverage and fee structures under provincial/territorial Health Acts have accounted for much of the increase in the annual (1999 versus 1998) expenditures (dollar value and indices) overall and, in particular, under several of the "non-elective surgery" components. Little of the change is due to a change in the mix and volume of services provided, as indeed the volume has decreased.
Some related sources of statistical data and products:
For additional information on data quality and methodology please refer to Appendix 1: Supplementary information on survey methodology and data quality (or link to reference).
Example 3: A fictitious report for purposes of illustration only
Monthly survey of employment insurance recipients and transitions
Concepts, methodology and data quality
The following information should be used to ensure a clear understanding of the basic concepts that define the data provided in this product, of the underlying methodology of the survey, and of key aspects of the data quality. This information will provide you with a better understanding of the strengths and limitations of the data, and of how they can be effectively used and analysed. The information may be of particular importance to you when making comparisons with data from other surveys or sources of information, and in drawing conclusions regarding change over time, differences between geographic areas and differences among sub-groups of the target population.
Data sources and methodology
The Monthly survey of employment insurance recipients and transitions (SEIRT) measures, on a monthly basis, labour market related characteristics, skills, preferences and plans of persons receiving Employment Insurance (EI) benefits. These data - as one component of information on the characteristics and needs of the experienced labour market supply - are used as input for business and industry planning and development, and for development of government policies, training strategies and programs.
The target population is all persons who were recipients of Employment Insurance (or approved for payments) during the survey reference month.
Reference period
The reference period for data collection purposes is the calendar month. Data are collected 30 to 40 days after the end of the reference month. Results are released approximately 90 days after the end of the reference month. Annual data for the preceding calendar year are released with the data for the March survey month.
General methodology
The data are collected through a telephone survey of a sample of applicants approved by Human Resources Development Canada to receive EI benefits. (Respondents were informed at the time of application for benefits that they might be selected for the sample.) A random sample is selected by Statistics Canada based on the date (month) of approval for issue of the first EI payment. The sample is stratified to provide estimates for Census Metropolitan Areas, for urban and rural areas, and by province or region. The approved maximum period of eligibility is also considered in order to maintain a more or less uniform sample size and workload.
A selected person is in the sample for 12 consecutive survey months or until the end of the survey month for which benefits end, whichever comes first. Each month the sample is supplemented by a sample of persons approved, during that month, for issue of payment. The total sample size for a month is approximately 30,000 persons. Data collection is computer assisted with basic edits performed to ensure validity, consistency and completeness. Data are corrected, if possible, with the assistance of the respondent. Residual errors, missing data or inconsistencies are resolved by statistical adjustments using historical or administrative data, or by imputing (substituting) consistent data from respondents with similar characteristics. Coding of occupation to numeric classifications is completed using a combination of automated and manual procedures.
Final results are weighted to represent the total target population. For further information on the weighting and estimation methods, as well as other aspects of the methodology of the survey see ... (footnote, reference or link to reference).
Revisions and seasonal adjustment
Data for the most recent month ended are considered final. Annual data may differ from a simple compilation of the data for the associated 12 months, as a result of statistical adjustments and use of administrative data to improve the data quality of the annual results.
Regular (annual) seasonal events and cycles (e.g., seasonal changes and statutory holidays) cause predictable fluctuations in the data. The data series disseminated includes both seasonally adjusted data (i.e., excludes predictable annual influences) and the unadjusted data.
Concepts and variables measured
Characteristics collected
The survey collects the following characteristics or data for the selected EI recipient:
- demographic characteristics (age, sex and marital status)
- the number of dependent children (in the household)
- most recent occupation
- type of EI benefits
- level of formal education and certification, and year(s) received
- approved and actual period of eligibility (administrative source)
- if now ineligible for EI, with reason
- number of times an EI recipient in the last 5 years (administrative source)
- work and re-location preferences and expectations
- training plans or interests
- year and nature of any job related training
- access to the Internet
- availability of a personal computer
- technology and other key skills
The survey also collects data on the basic demographic characteristics (age and sex) of each of the other household members, total household income and dwelling tenure (whether owned, leased or rented by the selected person or spouse).
Characteristics derived
Certain characteristics are derived from the data collected: a) month-to-month change in selected person's expectations, preferences (work or re-location), training plans, or skills through training, and b) outcome of EI period or transition (e.g., now on full-time training, or now employed full-time).
Definitions or variable breakdowns for these characteristics, and additional explanatory documentation are provided in Appendix 1 (or link to reference).
Data accuracy
The methodology of this survey has been designed to control errors and to reduce the potential effects of these. However, the results of the survey remain subject to error due to sampling, as well as to non-sampling error - e.g., coverage, response and processing error, and errors as a result of non-response.
Sampling error
As the data are based on a sample of persons they are subject to sampling error. That is, estimates based on a sample will vary from sample to sample, and typically they will be different from what the results would have been based on a complete census. The potential range of this difference has been estimated for key data. Results are provided and discussed in Appendix 1 (or link to reference).
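The idea of a potential range around a sample estimate can be illustrated with a minimal sketch. It assumes simple random sampling and a normal approximation. The survey itself uses a stratified sample, so the measures reported in Appendix 1 would come from a method that reflects that design. The function name and figures are hypothetical.

```python
import math


def proportion_confidence_interval(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for an estimated proportion.

    Assumptions: simple random sampling, a large sample, and a normal
    approximation (z = 1.96). Stratification and weighting are ignored.
    """
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("n must be positive and p_hat between 0 and 1")
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


# Hypothetical estimate: 50% of 10,000 sampled recipients report a given plan.
low, high = proportion_confidence_interval(0.5, 10_000)
print(f"{low:.4f} to {high:.4f}")
```

Under these assumptions the range narrows as the sample grows, which is why estimates for small geographic areas or sub-groups are generally less precise than national estimates.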
Non-sampling error
Coverage of the targeted population is close to 100% complete; i.e., the list from which the sample is selected is almost always complete. Comparisons with annual information from Human Resources Development Canada confirm this.
The only significant response error identified is for household income which is under-reported (see Comparability below). As for nonresponse, approximately 10% of those initially selected choose not to participate. Over the period of inclusion in the sample there is further attrition. Of those who initially chose to participate more than 80% will respond through their full cycle. Over the past three years the monthly average response rate, among those who initially chose to participate has been 90%. The response rate does vary over time, from province to province, and according to age and other characteristics of the selected person. Adjustments (imputation and weighting) are made to compensate for nonresponse and non-participation. These ensure that the survey population totals agree with known totals for the provinces, territories and larger CMAs. However, to the degree that nonrespondents (or non-participants) and respondents differ with respect to the characteristic of interest, there may be residual effects on the accuracy of estimates. (More details are provided in Appendix 1 (or link to reference).)
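The rates quoted above are conditional on initial participation. A minimal sketch shows how they combine into rates relative to all persons initially selected. It assumes the rates can simply be multiplied. The function name is hypothetical and the inputs are the rounded figures from the text.

```python
def combined_response_rate(participation_rate, conditional_rate):
    """Response rate relative to all persons initially selected.

    participation_rate: share of selected persons who agree to participate.
    conditional_rate: response rate among those who agreed to participate.
    """
    for rate in (participation_rate, conditional_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be between 0 and 1")
    return participation_rate * conditional_rate


# About 90% participate; more than 80% of participants respond through
# their full cycle; the monthly average rate among participants is 90%.
full_cycle = combined_response_rate(0.90, 0.80)
monthly = combined_response_rate(0.90, 0.90)
print(f"full cycle: about {full_cycle:.0%} or more; typical month: about {monthly:.0%}")
```

Read this way, roughly 72% or more of the persons initially selected respond through their full cycle, and about 81% respond in a typical month.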
Comparability of data and related sources
The variables and characteristics: Wherever possible the concepts used in SEIRT are standard definitions (footnote, reference or link to reference) or in common with definitions used for other statistical programs of Statistics Canada - e.g., for demographic data, geographic variables, and occupation. Where direct comparisons with other surveys are possible for this target population, no data accuracy issues arise except for income. Comparisons indicate that income is under-reported in the SEIRT, particularly for households with three or more persons of age greater than 18. A closer examination of comparability is considered in Appendix 1 (or link to reference).
The target population: The target population of SEIRT is not generally comparable to those of other data sources:
- The SEIRT represents all persons who are recipients of EI benefits for at least part of the month. It does not represent a specific reference week within the month, as in the case of the Employment Insurance Statistics program (footnote, reference or link to reference).
- It does not represent all persons in the labour force who are unemployed and does include persons receiving special benefits (e.g., parental benefits) not counted as unemployed (footnote, reference or link to reference).
- Unlike the Employment Insurance Coverage Survey (footnote, reference or link to reference), SEIRT does not include persons eligible for EI who do not make a claim, and it does not include unemployed persons who are not eligible for EI (footnote, reference or link to reference).
- For reasons similar to those for the Employment Insurance Coverage Survey, SEIRT estimates do not reflect the same target population as that of the Survey of Changes in Employment (footnote, reference or link to reference).
- Unlike the monthly SEIRT estimates, the annual estimates over-represent the number of persons who received EI benefits. The annual data count each period of benefit; e.g., if a person is approved for benefits on two distinct occasions he or she contributes twice to the annual estimates.
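The difference between counting benefit periods and counting persons, noted in the last point above, can be illustrated with a minimal sketch. The person identifiers, approval months and function names are hypothetical.

```python
def count_benefit_periods(approvals):
    """Number of distinct benefit periods (the basis of the annual count)."""
    return len(approvals)


def count_distinct_persons(approvals):
    """Number of distinct persons who received benefits at least once."""
    return len({person_id for person_id, _ in approvals})


# Hypothetical year: person "A" is approved for benefits on two distinct
# occasions, person "B" once. Each tuple is (person identifier, approval month).
approvals_2000 = [("A", "2000-01"), ("B", "2000-03"), ("A", "2000-09")]

print(count_benefit_periods(approvals_2000), count_distinct_persons(approvals_2000))
```

In this hypothetical year the annual count is three benefit periods for only two persons, which is the sense in which the annual estimates over-represent the number of persons.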
A useful comparison of the data for the populations of these various sources can be found in the analytical report ... (name and reference or link to reference).
Comparisons over time: The SEIRT estimates are not comparable to estimates for past reference periods with different EI parameters, rules and regulations. On this basis current estimates are not comparable to those with reference periods prior to January 1, 1997 (footnote, reference or link to reference).
Some related sources of statistical data and products:
For additional information on data quality and methodology please refer to Appendix 1: Supplementary information on survey methodology and data quality (or link to reference).