Introduction
Under the Statistics Act, Statistics Canada is mandated to collect, compile, analyze, abstract and publish statistical information relating to the commercial, industrial, financial, social, economic and general activities and condition of the people, and to promote the avoidance of duplication in the information collected by departments of government. The use of administrative data allows Statistics Canada to improve data quality and meet new and ongoing statistical needs, while reducing data collection costs and the response burden on Canadians. The purpose of the Administrative Data Pre-processing Project (ADP) is to support the mandate and modernization of the Census, Regional Services, and Operations Field by centralizing and automating reception, pre-processing, and deidentification activities related to administrative data at Statistics Canada.
Objective
A privacy impact assessment for the Administrative Data Pre-processing Project (ADP) was conducted to determine if there were any privacy, confidentiality or security issues with this initiative and, if so, to make recommendations for their resolution or mitigation.
Description
In accordance with data acquisition agreements between Statistics Canada and external Data Providers, Demeter, a by-product of ADP, will pre-process administrative data assets through a set of automated activities. As part of these activities, prior to making the data available for analysis, Demeter will deidentify the data by isolating all direct personal information elements from the other microdata contents. Direct personal information elements removed during pre-processing will be stored in an access-restricted secure storage environment so that downstream linkage systems can apply further deidentification activities, such as the assignment of longitudinal identifiers and/or code-sets.
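The isolation step described above can be illustrated with a minimal Python sketch. It is not Demeter's implementation: the function name `isolate_identifiers`, the field names, and the `record_key` used to keep the two halves relatable for downstream linkage are all assumptions made for illustration.

```python
from typing import Any


def isolate_identifiers(
    record: dict[str, Any],
    identifier_fields: set[str],
    record_key: str,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one microdata record into two parts.

    Returns (identifiers, analysis_record). Both parts carry the same
    record_key (a hypothetical surrogate key) so that a downstream
    linkage system could relate them without the analysis record
    holding any direct personal information element.
    """
    identifiers: dict[str, Any] = {"record_key": record_key}
    analysis_record: dict[str, Any] = {"record_key": record_key}

    for name, value in record.items():
        if name in identifier_fields:
            # Destined for the access-restricted storage environment.
            identifiers[name] = value
        else:
            # Remaining microdata contents, made available for analysis.
            analysis_record[name] = value

    return identifiers, analysis_record


# Example with clearly fictitious values.
record = {"sin": "000-000-000", "name": "A. Example", "province": "ON", "income": 52000}
identifiers, analysis_record = isolate_identifiers(record, {"sin", "name"}, "r-001")
```

In this sketch the analysis record keeps only `province`, `income` and the surrogate key, while `sin` and `name` travel separately.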
Introducing efficiencies to mitigate existing gaps allows internal operations to effectively support socioeconomic indicators and empirically based decision-making. What remains paramount, however, is that the processes put into effect remain secure and transparent, so that they meet the expectations of Canadians in an ever-growing, digitally driven society and economy. Modernizing our infrastructure allows us to revisit the ways in which we approach the handling and storage of direct personal information elements, ensuring that the privacy of Canadians is protected and uncompromised. The operationalization of modern infrastructure, such as ADP, allows Statistics Canada to remain a trustworthy source of national statistical information for the benefit of all Canadians.
Given the nature of administrative data contents, direct personal information elements are often present and require deidentification. In the event that an administrative data reception includes personal information, deemed employees must consult with the respective Data Steward to identify variables within the schema for deidentification. Internal data availability and usage cannot proceed until the respective schema has been approved by the Data Steward. In support of a modernized, metadata-driven process, all subsequent receptions that adhere to the approved schema are automatically deidentified using the metadata captured within that schema.
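The approval gate and the metadata-driven step described above can be sketched as follows. This is an illustration under stated assumptions, not the actual system: the `Schema` class, its fields, the `SchemaNotApproved` exception and the `deidentify_reception` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any


class SchemaNotApproved(Exception):
    """Raised when a reception arrives before Data Steward approval."""


@dataclass(frozen=True)
class Schema:
    # Hypothetical metadata captured when the schema is reviewed.
    name: str
    columns: frozenset[str]
    identifier_columns: frozenset[str]
    approved: bool = False


def deidentify_reception(
    rows: list[dict[str, Any]], schema: Schema
) -> list[dict[str, Any]]:
    """Return the rows with identifier columns dropped.

    Processing is refused until the schema has been approved, and any
    row that does not adhere to the approved schema is rejected.
    """
    if not schema.approved:
        raise SchemaNotApproved(f"schema {schema.name!r} is awaiting approval")

    deidentified = []
    for row in rows:
        if set(row) != schema.columns:
            raise ValueError("reception does not adhere to the approved schema")
        # The schema metadata alone decides which variables are removed.
        deidentified.append(
            {k: v for k, v in row.items() if k not in schema.identifier_columns}
        )
    return deidentified


# Example with clearly fictitious values.
approved = Schema(
    name="example_reception",
    columns=frozenset({"name", "province", "income"}),
    identifier_columns=frozenset({"name"}),
    approved=True,
)
rows = [{"name": "A. Example", "province": "ON", "income": 52000}]
clean_rows = deidentify_reception(rows, approved)
```

Keeping the identifier list in the schema rather than in the processing code means that, once a schema is approved, later receptions need no per-file decisions.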
Implementing ADP does not introduce new methods of gathering data from Canadians. Instead, ADP leverages existing secure infrastructure to allow for the reception of administrative data from external administrative data providers into a modernized cloud-based system. The use of administrative data allows Statistics Canada to improve data quality and meet new and ongoing statistical needs, while reducing data collection costs and the response burden on Canadians. In supporting the reception of administrative data, ADP further reduces operational costs by automating ingestion and validation activities. All data ingested by Demeter is transitory, in that microdata is not stored within the system following the successful completion of pre-processing activities.
Risk Area Identification and Categorization
The PIA identifies the level of potential risk (level 1 is the lowest level of potential risk and level 4 is the highest) associated with the following risk areas:
a) Type of program or activity
Program or activity that does not involve a decision about an identifiable individual.
Risk scale: 1
b) Type of personal information involved and context
Social Insurance Number, medical, financial or other sensitive personal information or the context surrounding the personal information is sensitive; personal information of minors or of legally incompetent individuals or involving a representative acting on behalf of the individual.
Risk scale: 3
c) Program or activity partners and private sector involvement
Private sector organizations, international organizations or foreign governments.
Risk scale: 4
d) Duration of the program or activity
Long-term program or activity.
Risk scale: 3
e) Program population
The program’s use of personal information is not for administrative purposes. Information is collected for statistical purposes, under the authority of the Statistics Act.
Risk scale: N/A
f) Personal information transmission
The personal information is transmitted using wireless technologies.
Risk scale: 4
g) Technology and privacy
In the event that an administrative data reception includes direct personal information elements, delegated Data Stewards must identify variables within the schema for deidentification. Applying a metadata-driven approach allows each subsequent reception, by default and without exception, to be deidentified using the metadata captured within the approved schema. Introducing this new process to support the collection and handling of direct personal information ensures that the privacy of Canadians remains at the forefront by limiting the presence and circulation of personal information to the reception and deidentification stages.
h) Potential risk that in the event of a privacy breach, there will be an impact on the individual or employee.
There is a potential risk that, in the event of a privacy breach, there would be an impact on the individual or employee. While microdata is not stored in Demeter in perpetuity, transitory microdata containing direct personal information elements is stored until deidentification has occurred. Should the process fail, such microdata is accessible to a limited number of deemed employees so that the issue can be resolved. The potential risk of a privacy breach is significantly reduced in comparison to the traditional methods of administrative data pre-processing.
i) Potential risk that in the event of a privacy breach, there will be an impact on the institution.
Yes, there is a potential risk that, in the event of a privacy breach, there would be a reduction in the public’s trust of the institution, as well as an impact on the participation of critical Data Providers and on the Agency’s ability to provide Canadians with essential socioeconomic and statistical measures.
Conclusion
This assessment of the Administrative Data Pre-processing Project (ADP) did not identify any privacy risks that cannot be managed using existing safeguards.