This document will use two existing models to define the activities included in the scope of this generic PIA.
First, the Generic Statistical Business Process Model (GSBPM version 5.0, December 2013), developed by the United Nations Economic Commission for Europe (UNECE), is used to identify privacy risks by activity within a statistical program. Second, the Statistics Canada Information Framework is used to identify the portion of Statistics Canada's overall information holdings that is covered by this generic PIA. The Information Framework is included in Section 10 of Statistics Canada's Strategy for Information Management (October 26, 2010).
The GSBPM is used to identify those activities that involve the collection, use, maintenance and dissemination of personal information (as defined by the Privacy Act). The current model comprises eight phases and several sub-processes within each phase. As indicated below, personal information may be used in five of these eight phases. For more details on the activities included in each phase and sub-process of the GSBPM, refer to the UNECE document, available on its web site. A graphical representation of the model is included in Appendix 5 of this document. Also included in this appendix is a graphical representation of the link between the GSBPM and Statistics Canada's data-related policies and directives.
Note that a given statistical program does not necessarily include every sub-process.
1.6.1 GSBPM 5.0
In this section, each of the phases and sub-processes of the framework is presented. The text describing each phase is taken directly from the UNECE document. Following the description, the type of personal information, if any, involved in the sub-process is described. Reference to the Threat and Risk Assessment (TRA) Grids provides a link to the associated level of risk.
Phase 1. Specify Needs Phase
This phase is triggered when a need for new statistics is identified, or feedback about current statistics initiates a review. It includes all activities associated with engaging customers [Footnote 6] to identify their detailed statistical needs, proposing high level solution options and preparing business cases to meet these needs.
Personal information is collected only on persons who participate in consultations, and only to the extent needed to conduct the consultation.
Applicable TRA Grid: AA.
Phase 2. Design Phase
This phase describes the development and design activities, and any associated practical research work needed to define the statistical outputs, concepts, methodologies, collection instruments and operational processes. It includes all the design elements needed to define or refine the statistical products or services identified in the business case. This phase specifies all relevant metadata [Footnote 7], ready for use later in the statistical business process, as well as quality assurance procedures. For statistical outputs produced on a regular basis, this phase usually occurs for the first iteration, and whenever improvement actions are identified in the Evaluate phase of a previous iteration.
Personal information is not used in Phase 2.
Phase 3. Build Phase
This phase builds and tests the production solution to the point where it is ready for use in the "live" environment. The outputs of the "Design" phase direct the selection of reusable processes, instruments, information, and services that are assembled and configured in this phase to create the complete operational environment to run the process. New services are built by exception, created in response to gaps in the existing catalogue of services sourced from within the organisation and externally. These new services are constructed to be broadly reusable within the statistical production architecture.
Phase 3.1. Build collection instrument
This sub-process describes the activities to build the collection instruments to be used during the "Collect" phase. The collection instrument is generated or built based on the design specifications created during the "Design" phase.
This sub-process does not itself involve personal information. However, since the collection instrument may collect and store personal information, its development or re-engineering might require a PIA. The privacy risks will be covered below in phase 4.
Phase 3.2. Build or enhance process components
This sub-process describes the activities to build new and enhance existing components and services needed for the "Process" and "Analyse" phases, as designed in the "Design" phase.
This sub-process does not itself involve personal information. However, since the processing systems may collect and store personal information, their development or re-engineering might require a PIA. The privacy risks will be covered in phase 5.
Phase 3.3. Build or enhance dissemination components
This sub-process describes the activities to build new and enhance existing components and services needed for the dissemination of statistical products, as designed in the "Design" phase (Phase 2).
This sub-process does not itself involve personal information. However, since the dissemination systems may collect and store personal information, their development or re-engineering might require a PIA. The privacy risks will be covered in phase 7.
Phase 3.4. Configure workflows
This sub-process configures the workflow, systems and transformations used within the statistical business processes, from data collection through to dissemination. It ensures that the workflow specified in the "Design" phase (Phase 2) works in practice.
This sub-process does not involve personal information.
Phase 3.5. Test production system
This sub-process is concerned with the testing of assembled and configured services and related workflows. It includes technical testing and sign-off of new programmes and routines, as well as confirmation that existing routines from other statistical business processes are suitable for use in this case.
This sub-process, which may involve testing of systems using synthetic or manipulated data, contains the same privacy risks as the complete statistical program. To avoid redundancy, these privacy risks are not included in this section.
Phase 3.6. Test statistical business process
This sub-process describes the activities to manage a field test or pilot of the statistical business process.
As a test, this sub-process contains the same privacy risks as the complete statistical program. To avoid redundancy, these privacy risks are not included in this section.
Phase 3.7. Finalise production systems
This sub-process includes the activities to put the assembled and configured processes and services, including modified and newly-created services, into production, ready for use by business areas.
This sub-process does not involve personal information.
Phase 4. Collect Phase
This phase collects or gathers all necessary information (data and metadata), using different collection modes (including extractions from statistical, administrative and other non-statistical registers and databases) [Footnote 8], and loads them into the appropriate environment for further processing.
Phase 4.1. Create frame and select sample
This sub-process establishes the frame and selects the sample for this iteration of the collection, as specified in sub-process 2.4 Design frame and sample. The frame may include personal information, when direct sampling of individuals is involved. Household frames may also involve personal information when sampling is based on personal information of household members, such as sampling households with persons within certain age groups.
For most statistical programs, this sub-process is conducted entirely within the head offices of Statistics Canada, usually by staff of the Methodology Branch. In some cases, the sample is selected by the field staff.
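As an illustration only, the household-based sampling described above can be sketched as follows. The frame layout, the field names, the function names and the choice of simple random sampling are all assumptions made for this example; they do not describe Statistics Canada's actual frames or sampling methods.

```python
import random

# Hypothetical frame: one record per person, with a household id and an age.
frame = [
    {"person_id": 1, "household_id": "H1", "age": 34},
    {"person_id": 2, "household_id": "H1", "age": 8},
    {"person_id": 3, "household_id": "H2", "age": 71},
    {"person_id": 4, "household_id": "H3", "age": 15},
    {"person_id": 5, "household_id": "H4", "age": 42},
]

def eligible_households(frame, min_age, max_age):
    """Return the sorted ids of households with at least one member in the age range."""
    return sorted({r["household_id"] for r in frame if min_age <= r["age"] <= max_age})

def select_sample(units, n, seed):
    """Simple random sample without replacement; the seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(units, n)

# Households containing at least one person aged 0 to 17.
households = eligible_households(frame, 0, 17)
sample = select_sample(households, 1, seed=2013)
```

The sketch shows why such a frame involves personal information: the eligibility test reads the age of each household member, even though the sampled unit is the household.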
Applicable TRA Grid: A.
Phase 4.2. Set up collection
This sub-process ensures that the people, processes and technology are ready to collect data and metadata, in all modes as designed. It takes place over a period of time, as it includes the strategy, planning and training activities in preparation for the specific instance of the statistical business process. Where the process is repeated regularly, some (or all) of these activities may not be explicitly required for each iteration. For one-off and new processes, these activities can be lengthy. This sub-process includes:
- preparing a collection strategy;
- training collection staff;
- ensuring collection resources are available (e.g. laptops);
- agreeing to terms with any intermediate collection bodies, e.g. sub-contractors [Footnote 9] for computer assisted telephone interviewing;
- configuring collection systems to request and receive the data;
- ensuring the security of data to be collected;
- preparing collection instruments (e.g. setting up electronic questionnaires, pre-filling them with existing data, loading questionnaires and data onto interviewers' computers, printing questionnaires, etc.).
For non-survey sources, this sub-process will include ensuring that the appropriate processes, systems and confidentiality procedures are in place, to receive or extract the necessary information from the source.
This sub-process may include personal information of collection staff and may also include personal information of respondents when questionnaires contain respondent contact information (name, address, etc.) or are pre-filled with data already collected for specific respondents.
Applicable TRA Grids: B, E, F, G, H, I.
Phase 4.3. Run collection
This sub-process is where the collection is implemented, with the different instruments being used to collect or gather the information, which may include raw micro-data or aggregates produced at the source, as well as any associated metadata. It includes the initial contact with data providers and any subsequent follow-up or reminder actions. It may include manual data entry at the point of contact, or fieldwork management, depending on the source and collection mode. It records when and how data providers were contacted, and whether they have responded. This sub-process also includes the management of the data providers involved in the current collection, ensuring that the relationship between the statistical organisation and data providers remains positive, and recording and responding to comments, queries and complaints.
When the collection meets its targets [Footnote 10], it is closed and a report on the collection is produced. Some basic validation of the structure and integrity of the information received may take place within this sub-process, e.g. checking that files are in the right format and contain the expected fields. All validation of the content takes place in the Process phase.
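The basic structural validation mentioned above (right format, expected fields) might look like the following minimal sketch. It assumes a comma-separated file; the field names and the function name are hypothetical.

```python
import csv
import io

# Assumed file layout, for illustration only.
EXPECTED_FIELDS = ["respondent_id", "province", "response_date"]

def check_structure(text, expected_fields):
    """Return a list of structural problems; an empty list means the file passed."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    missing = [f for f in expected_fields if f not in header]
    if missing:
        problems.append("missing fields: " + ", ".join(missing))
    # Every data row should have as many values as the header has fields.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} values, found {len(row)}"
            )
    return problems

good = "respondent_id,province,response_date\n1,ON,2014-05-01\n"
bad = "respondent_id,province\n1,ON,extra\n"
```

The check looks only at the shape of the file. It does not judge whether any value is plausible, which matches the division of work stated above: content validation belongs to the Process phase.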
Statistics Canada has several different approaches to data collection. For any particular data collection, one or more approaches are selected to achieve the best balance of low cost, low response burden, minimal privacy invasion and high data quality.
The following represents Statistics Canada's various approaches to data collection directly from individuals:
- Mail-out / Mail-back (self-enumeration);
- e-Questionnaire Service;
- Computer-Assisted Personal Interviewing (CAPI);
- Computer-Assisted Telephone Interviewing (CATI) - Decentralized;
- Computer-Assisted Telephone Interviewing (CATI) - Centralized;
- Paper and Pencil Interviewing (PAPI);
- Collection of Human Biometrics and Biological Specimens;
- Collection of Information through the use of Monitoring Devices;
- Use of the E-file transfer service by a business to transmit its information in addition to or in place of information provided on a questionnaire;
- Obtain records for a specific business (e.g., financial statements) from that business in addition to or in place of information provided on a questionnaire. [These would be documents already prepared for other purposes and Statistics Canada would extract the information it requires. The documents may be obtained on paper, by e-mail or from a web site.]
For more details, see Appendix 3.
An alternative to collecting directly from individuals is to use administrative records produced by another organization for its own uses (i.e., information on other individuals). This approach usually costs less than direct collection, imposes no additional response burden on the individuals whose information is involved, and is used whenever possible, provided the quality of the administrative records is sufficiently high for Statistics Canada's statistical programs. In most cases, the other organization must transmit the information to Statistics Canada; in certain cases, however, the information is already available on the organization's web site and Statistics Canada may obtain it directly.
For administrative sources, this process is brief: the provider is either contacted to send the information, or sends it as scheduled.
Applicable TRA Grids: B, C, D, E, F, G, H, I, J, K, L, M, P, Q, R (see Section 6 below).
For the use of administrative records, Statistics Canada uses its E-File Transfer Service, unless the providing organization decides to use another approach.
Applicable TRA Grids: N, O, S (see Section 6 below).
Phase 4.4. Finalize collection
This sub-process includes loading the collected data and metadata into a suitable electronic environment for further processing. It may include manual or automatic data input, for example using clerical staff or optical character recognition tools to extract information from paper questionnaires, or converting the formats of files received from other organisations. It may also include analysis of the process metadata (paradata) associated with collection to ensure the collection activities have met requirements. In cases where there is a physical collection instrument, such as a paper questionnaire, which is not needed for further processing, this sub-process manages the archiving of that material.
In this sub-process, personal information is stored, accessed and maintained in Statistics Canada Head Office.
Applicable TRA Grid: A.
Phase 5. Process Phase
This phase describes the cleaning of data and their preparation for analysis. It is made up of sub-processes that check, clean, and transform input data, so that they can be analysed and disseminated as statistical outputs. It may be repeated several times if necessary. For statistical outputs produced regularly, this phase occurs in each iteration. The sub-processes in this phase can apply to data from both statistical and non-statistical sources (with the possible exception of sub-process 5.6. Calculate weights, which is usually specific to survey data).
The "Process" and "Analyse" phases can be iterative and parallel. Analysis can reveal a broader understanding of the data, which might make it apparent that additional processing is needed. Activities within the "Process" and "Analyse" phases may commence before the "Collect" phase is completed. This enables the compilation of provisional results where timeliness is an important concern for users, and increases the time available for analysis.
The Process phase is broken down into eight sub-processes, which may be sequential, but can also occur in parallel, and can be iterative.
In this phase, personal information is stored, accessed and maintained in Statistics Canada Head Office.
Applicable TRA Grid: A.
Note that an additional TRA grid is referenced in sub-process 5.1.
Phase 5.1. Integrate data
This sub-process integrates data from one or more sources. It is where the results of sub-processes in the "Collect" phase are combined. The input data can be from a mixture of external or internal data sources, and a variety of collection modes, including extracts of administrative data. The result is a set of linked data. Data integration can include:
- combining data from multiple sources, as part of the creation of integrated statistics such as national accounts;
- matching / record linkage routines, with the aim of linking micro or macro data from different sources [Footnote 11];
- prioritising, when two or more sources contain data for the same variable, with potentially different values.
Data integration may take place at any point in this phase, before or after any of the other sub-processes. There may also be several instances of data integration in any statistical business process. Following integration, depending on data protection requirements, data may be anonymized, that is, stripped of identifiers such as name and address, to help protect confidentiality.
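A minimal sketch of the two steps described above, linkage followed by removal of identifiers, is given below. It assumes exact matching on a shared key; the key name `link_id`, the field names and the function names are hypothetical, and real record linkage routines are typically more elaborate.

```python
# Direct identifiers named in the text above.
DIRECT_IDENTIFIERS = {"name", "address"}

def link_records(survey, admin, key):
    """Exact-match linkage: merge each survey record with the admin record sharing its key."""
    admin_by_key = {rec[key]: rec for rec in admin}
    linked = []
    for rec in survey:
        match = admin_by_key.get(rec[key])
        if match is not None:
            # Survey values take precedence if both sources carry the same field.
            linked.append({**match, **rec})
    return linked

def anonymize(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with the direct identifiers removed."""
    return [{k: v for k, v in rec.items() if k not in identifiers} for rec in records]

survey = [{"link_id": 10, "name": "A. Person", "hours_worked": 35}]
admin = [
    {"link_id": 10, "address": "1 Example St", "income": 52000},
    {"link_id": 11, "address": "2 Example St", "income": 47000},
]

linked = link_records(survey, admin, "link_id")
analysis_file = anonymize(linked)
```

In this sketch the identifiers are needed only while the records are being matched; the file passed on to later sub-processes carries the linkage key and the analytical variables, but not the name or address.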
Applicable TRA Grids: A, T.
Phase 5.2. Classify and code
This sub-process classifies and codes the input data. For example, automatic (or clerical) coding routines may assign numeric codes to text responses according to a pre-determined classification scheme.
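The automatic coding of text responses can be illustrated with a simple lookup. The classification scheme, the code values and the function names below are invented for the example and do not correspond to any real classification.

```python
# Hypothetical classification scheme: normalized text -> numeric code.
CODE_LIST = {"nurse": 101, "registered nurse": 101, "electrician": 202}
UNCODED = None  # responses with no match are left for clerical coding

def normalize(text):
    """Lower-case and collapse whitespace so trivially different responses match."""
    return " ".join(text.lower().split())

def code_response(text, code_list=CODE_LIST):
    """Return the numeric code for a text response, or UNCODED if there is no match."""
    return code_list.get(normalize(text), UNCODED)

responses = ["Nurse", "  registered   NURSE ", "beekeeper"]
codes = [code_response(r) for r in responses]
```

Responses that the automatic routine cannot code are returned as `UNCODED`, which corresponds to the clerical coding path mentioned above.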
Applicable TRA Grid: A.
Phase 5.3. Review and validate
This sub-process examines data to try to identify potential problems, errors and discrepancies such as outliers, item non-response and miscoding. It can also be referred to as input data validation. It may be run iteratively, validating data against predefined edit rules, usually in a set order. It may flag data for automatic or manual inspection or editing. Reviewing and validating can apply to data from any type of source, before and after integration. Whilst validation is treated as part of the "Process" phase, in practice, some elements of validation may occur alongside collection activities, particularly for modes such as web collection. Whilst this sub-process is concerned with detection of actual or potential errors, any correction activities that actually change the data are done in sub-process 5.4.
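The following sketch shows validation against predefined edit rules applied in a set order. The rules, thresholds and field names are assumptions chosen for illustration. Consistent with the description above, the routine only flags records; it does not change any value.

```python
# Hypothetical edit rules, applied in a set order. Each entry is (name, test),
# where the test returns True when the record FAILS the rule.
EDIT_RULES = [
    ("age_missing", lambda r: r.get("age") is None),
    ("age_out_of_range",
     lambda r: r.get("age") is not None and not 0 <= r["age"] <= 120),
    ("child_with_income",
     lambda r: r.get("age") is not None and r["age"] < 15
     and (r.get("income") or 0) > 0),
]

def validate(records, rules=EDIT_RULES):
    """Return {record index: [names of failed rules]} without changing the data."""
    flags = {}
    for i, rec in enumerate(records):
        failed = [name for name, fails in rules if fails(rec)]
        if failed:
            flags[i] = failed
    return flags

records = [
    {"age": 34, "income": 41000},
    {"age": None, "income": 12000},
    {"age": 150, "income": 0},
    {"age": 9, "income": 5000},
]
flags = validate(records)
```

The flagged records would then be passed to manual inspection or to the correction step in sub-process 5.4.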
Applicable TRA Grid: A.
Phase 5.4. Edit and impute
Where data are considered incorrect, missing or unreliable, new values may be inserted in this sub-process. The terms editing and imputation cover a variety of methods to do this, often using a rule-based approach. Specific steps typically include:
- the determination of whether to add or change data;
- the selection of the method to be used;
- adding / changing data values;
- writing the new data values back to the data set, and flagging them as changed;
- the production of metadata on the editing and imputation process.
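The steps listed above can be sketched with mean imputation, used here only because it is the simplest rule-based method to show; the source does not state which methods are applied in practice. The field names and the function name are hypothetical.

```python
from statistics import mean

def impute_mean(records, variable):
    """Fill missing values of `variable` with the mean of the reported values.

    Returns new records in which every record carries a flag saying whether
    its value was imputed, together with metadata describing the run.
    """
    reported = [r[variable] for r in records if r[variable] is not None]
    fill = mean(reported)
    flag = variable + "_imputed"
    out = []
    for r in records:
        new = dict(r)                      # leave the input records untouched
        new[flag] = r[variable] is None    # flag changed values
        if r[variable] is None:
            new[variable] = fill           # write the new value
        out.append(new)
    metadata = {
        "variable": variable,
        "method": "mean",
        "n_imputed": sum(r[flag] for r in out),
    }
    return out, metadata

records = [{"id": 1, "hours": 40}, {"id": 2, "hours": None}, {"id": 3, "hours": 30}]
edited, metadata = impute_mean(records, "hours")
```

Keeping the flag and the metadata alongside the data allows later sub-processes to distinguish reported values from imputed ones.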
Applicable TRA Grid: A.
Phase 5.5. Derive new variables and units
This sub-process derives data for variables and units that are not explicitly provided in the collection, but are needed to deliver the required outputs. It derives new variables by applying arithmetic formulae to one or more of the variables that are already present in the dataset, or applying different model assumptions. This activity may need to be iterative, as some derived variables may themselves be based on other derived variables. It is therefore important to ensure that variables are derived in the correct order. New units may be derived by aggregating or splitting data for collection units, or by various other estimation methods. Examples include deriving households where the collection units are persons, or enterprises where the collection units are legal units.
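As a sketch of the example given above, the code below derives household units from person records and then derives a variable that depends on two other derived variables. All field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical person-level records (the collection units).
persons = [
    {"household_id": "H1", "age": 40, "income": 50000},
    {"household_id": "H1", "age": 12, "income": 0},
    {"household_id": "H2", "age": 67, "income": 30000},
]

def derive_households(persons):
    """Aggregate person records into household units with derived variables."""
    grouped = defaultdict(list)
    for p in persons:
        grouped[p["household_id"]].append(p)
    households = []
    for hid, members in sorted(grouped.items()):
        size = len(members)                           # derived first
        total = sum(m["income"] for m in members)     # derived first
        households.append({
            "household_id": hid,
            "size": size,
            "household_income": total,
            "income_per_person": total / size,        # depends on the two above
        })
    return households

households = derive_households(persons)
```

The ordering inside the loop reflects the point made above: `income_per_person` can only be computed once `size` and `household_income` exist.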
Applicable TRA Grid: A.
Phase 5.6. Calculate weights
This sub-process creates weights for unit data records according to the methodology created in sub-process 2.5 (Design processing and analysis). In the case of sample surveys, weights can be used to "gross-up" results to make them representative of the target population, or to adjust for non-response in total enumerations. In other situations, variables may need weighting for normalisation purposes.
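A minimal numeric sketch of "grossing up" and non-response adjustment follows. It assumes simple random sampling and a single adjustment class; actual weighting methodologies are usually more refined, and the function names are illustrative.

```python
def design_weight(population_size, sample_size):
    """Under simple random sampling, each sampled unit represents N / n units."""
    return population_size / sample_size

def nonresponse_adjusted_weight(base_weight, n_sampled, n_responded):
    """Inflate respondents' weights to account for sampled units that did not respond."""
    return base_weight * n_sampled / n_responded

# 500 units sampled from a population of 10,000; 400 of them respond.
base = design_weight(population_size=10_000, sample_size=500)
final = nonresponse_adjusted_weight(base, n_sampled=500, n_responded=400)
```

Here `base` is 20.0 and `final` is 25.0, so the 400 respondents together still represent the full population of 10,000.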
Applicable TRA Grid: A.
Phase 5.7. Calculate aggregates
This sub-process creates aggregate data and population totals from micro-data or lower-level aggregates. It includes summing data for records sharing certain characteristics, determining measures of average and dispersion, and applying weights from sub-process 5.6 to derive appropriate totals. In the case of sample surveys, sampling errors may also be calculated in this sub-process, and associated to the relevant aggregates.
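The aggregation described above can be sketched as weighted totals by group and a weighted mean. The records, weights and field names are invented for the example.

```python
from collections import defaultdict

# Hypothetical weighted micro-data; field names are illustrative.
records = [
    {"region": "East", "weight": 25.0, "income": 40000},
    {"region": "East", "weight": 25.0, "income": 60000},
    {"region": "West", "weight": 50.0, "income": 30000},
]

def weighted_totals_by(records, group, variable):
    """Sum weight * value within each group to estimate population totals."""
    totals = defaultdict(float)
    for r in records:
        totals[r[group]] += r["weight"] * r[variable]
    return dict(totals)

def weighted_mean(records, variable):
    """Weighted average of a variable across all records."""
    total_weight = sum(r["weight"] for r in records)
    return sum(r["weight"] * r[variable] for r in records) / total_weight

income_totals = weighted_totals_by(records, "region", "income")
mean_income = weighted_mean(records, "income")
```

The output of this step is aggregate data; the individual records that went into each total are no longer visible in `income_totals`.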
Applicable TRA Grid: A.
Phase 5.8. Finalize data files
This sub-process brings together the results of the other sub-processes in this phase and results in a data file (usually of macro-data), which is used as the input to the "Analyse" phase. Sometimes this may be an intermediate rather than a final file, particularly for business processes where there are strong time pressures, and a requirement to produce both preliminary and final estimates.
Applicable TRA Grid: A.
Phase 6. Analyze Phase
In this phase, statistical outputs are produced, examined in detail and made ready for dissemination. It includes preparing statistical content (including commentary, technical notes, etc.), and ensuring outputs are "fit for purpose" prior to dissemination to customers. This phase also includes the sub-processes and activities that enable statistical analysts to understand the statistics produced. For statistical outputs produced regularly, this phase occurs in every iteration. The "Analyse" phase and sub-processes are generic for all statistical outputs, regardless of how the data were sourced.
The "Analyze" phase is broken down into five sub-processes, which are generally sequential, but can also occur in parallel, and can be iterative. The sub-processes are:
Phase 6.1. Prepare draft outputs
This sub-process is where the data are transformed into statistical outputs. It includes the production of additional measurements such as indices, trends or seasonally adjusted series, as well as the recording of quality characteristics.
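One of the additional measurements mentioned above, an index, can be sketched as follows. The series values, the base period and the function name are assumptions made for illustration.

```python
def to_index(series, base_period, base_value=100.0):
    """Express each value relative to the base period, which is set to base_value."""
    base = series[base_period]
    return {period: base_value * value / base for period, value in series.items()}

# Hypothetical aggregate series by year.
series = {"2011": 200.0, "2012": 210.0, "2013": 231.0}
index = to_index(series, "2011")
```

The calculation operates on aggregates only, so the index for 2012 (105.0) and 2013 (115.5) is obtained without reference to any individual record.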
Applicable TRA Grid: Z.
Phase 6.2. Validate outputs
This sub-process is where statisticians validate the quality of the outputs produced, in accordance with a general quality framework and with expectations. This sub-process also includes activities involved with the gathering of intelligence, with the cumulative effect of building up a body of knowledge about a specific statistical domain. This knowledge is then applied to the current collection, in the current environment, to identify any divergence from expectations and to allow informed analyses. Validation activities can include:
- checking that the population coverage and response rates are as required;
- comparing the statistics with previous cycles (if applicable);
- checking that the associated metadata and paradata (process metadata) are present and in line with expectations;
- confronting the statistics against other relevant data (both internal and external);
- investigating inconsistencies in the statistics;
- performing macro editing [Footnote 12];
- validating the statistics against expectations and domain intelligence.
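As an illustration of one activity in this list, comparing the statistics with previous cycles, the sketch below flags series whose relative change exceeds a tolerance. The 10% tolerance, the series names and the function name are hypothetical.

```python
def flag_large_changes(current, previous, tolerance=0.10):
    """Return the series whose relative change from the previous cycle exceeds the tolerance."""
    flagged = {}
    for key, value in current.items():
        prior = previous.get(key)
        if prior:  # skip series that are new or were zero in the previous cycle
            change = (value - prior) / prior
            if abs(change) > tolerance:
                flagged[key] = change
    return flagged

previous = {"employment": 1000.0, "unemployment": 80.0}
current = {"employment": 1020.0, "unemployment": 100.0}
flagged = flag_large_changes(current, previous)
```

A flagged series is not necessarily wrong; the flag only directs an analyst to investigate the divergence from expectations.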
Applicable TRA Grids: Y, Z.
Phase 6.3. Interpret and explain outputs
This sub-process is where the in-depth understanding of the outputs is gained by statisticians. They use that understanding to interpret and explain the statistics produced for this cycle by assessing how well the statistics reflect their initial expectations, viewing the statistics from all perspectives using different tools and media, and carrying out in-depth statistical analyses.
There is no access to personal information in this sub-process.
Phase 6.4. Apply disclosure control
This sub-process ensures that the data (and metadata) to be disseminated do not breach the appropriate rules on confidentiality. This may include checks for primary and secondary disclosure, as well as the application of data suppression or perturbation techniques. The degree and method of disclosure control may vary for different types of outputs, for example the approach used for micro-data sets for research purposes will be different to that for published tables or maps.
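A minimal sketch of primary suppression, one of the techniques mentioned above, is shown below. The threshold of 5, the table and the function name are assumptions for the example and are not Statistics Canada's actual confidentiality rules.

```python
MIN_CELL_COUNT = 5  # assumed threshold, for illustration only

def suppress_small_cells(table, threshold=MIN_CELL_COUNT):
    """Replace counts below the threshold with None (primary suppression)."""
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }

# Hypothetical table of counts by region.
table = {"East": 120, "West": 3, "North": 45}
published = suppress_small_cells(table)
```

If the overall total (168) were published alongside this table, the suppressed count could be recovered by subtraction. This is the kind of situation that checks for secondary disclosure are meant to catch.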
Applicable TRA Grids: X, Y, Z.
Phase 6.5. Finalize outputs
This sub-process ensures the statistics and associated information are fit for purpose and reach the required quality level, and are thus ready for use. It includes:
- completing consistency checks;
- determining the level of release, and applying caveats;
- collating supporting information, including interpretation, commentary, technical notes, briefings, measures of uncertainty and any other necessary metadata;
- producing the supporting internal documents;
- pre-release discussion with appropriate internal subject matter experts;
- approving the statistical content for release.
Applicable TRA Grids: X, Y, Z.
Phase 7. Disseminate Phase
This phase manages the release of the statistical products to customers. It includes all activities associated with assembling and releasing a range of static and dynamic products via a range of channels. These activities support customers to access and use the outputs released by the statistical organisation.
For statistical outputs produced regularly, this phase occurs in each iteration. It is made up of five sub-processes, which are generally sequential, but can also occur in parallel, and can be iterative. These sub-processes are:
Phase 7.1. Update output systems
This sub-process manages the update of systems where data and metadata are stored ready for dissemination purposes, including:
- formatting data and metadata ready to be put into output databases;
- loading data and metadata into output databases;
- ensuring data are linked to the relevant metadata.
Formatting, loading and linking of metadata should preferably mostly take place in earlier phases, but this sub-process includes a final check that all of the necessary metadata are in place ready for dissemination.
Applicable TRA Grid: Z.
Phase 7.2. Produce dissemination products
This sub-process produces the products, as previously designed (in sub-process 2.1), to meet user needs [Footnote 13]. They could include printed publications, press releases and web sites. The products can take many forms including interactive graphics, tables, public-use micro-data sets and downloadable files. Typical steps include:
- preparing the product components (explanatory text, tables, charts, quality statements etc.);
- assembling the components into products;
- editing the products and checking that they meet publication standards.
Applicable TRA Grids: X, Y, Z.
Phase 7.3. Manage release of dissemination products
This sub-process ensures that all elements for the release are in place including managing the timing of the release. It includes briefings for specific groups such as the press or ministers, as well as the arrangements for any pre-release embargoes. It also includes the provision of products to subscribers, and managing access to confidential data by authorised user groups, such as researchers [Footnote 14]. Sometimes an organisation may need to retract a product, for example if an error is discovered. This is also included in this sub-process.
Applicable TRA Grids: U, V, X, Y, Z.
Phase 7.4. Promote dissemination products
While marketing in general can be considered to be an over-arching process, this sub-process concerns the active promotion of the statistical products produced in a specific statistical business process, to help them reach the widest possible audience. It includes the use of customer relationship management tools, to better target potential users of the products, as well as the use of tools including web sites, wikis [Footnote 15] and blogs to facilitate the process of communicating statistical information to users.
Applicable TRA Grid: AA.
Phase 7.5. Manage user support
This sub-process ensures that customer queries and requests for services such as micro-data access are recorded, and that responses are provided within agreed deadlines. These queries and requests should be regularly reviewed to provide an input to the over-arching quality management process, as they can indicate new or changing user needs.
Personal information on clients is used in this sub-process.
Applicable TRA Grid: AA.
Phase 8. Evaluate Phase
This phase manages the evaluation of a specific instance of a statistical business process (as opposed to the more general over-arching process of statistical quality management). It logically takes place at the end of the instance of the process, but relies on inputs gathered throughout the different phases. It includes evaluating the success of a specific instance of the statistical business process, drawing on a range of quantitative and qualitative inputs, and identifying and prioritising potential improvements.
Personal information is generally not used in Phase 8. If tools such as consultations and focus groups are used, some information on participants would be collected and used.
Applicable TRA Grid: AA.