Supplement to Statistics Canada's Generic Privacy Impact Assessment related to the Longitudinal Immigration Database

Date: August 2021

Program manager: Director of Diversity and Sociocultural Statistics
Director General of Health, Justice, Diversity and Populations Branch

Reference to Personal Information Bank (PIB):

Statistics Canada's institutional personal information bank "Longitudinal Immigration Database" (StatCan PPU 135 Longitudinal Immigration Database - Privacy impact assessment) has been updated and is being submitted for review and re-registration. See Appendix 1.

Description of statistical activity:

The Longitudinal Immigration Database (IMDB) was implemented in 1997 and integrates immigration and citizenship data provided by Immigration, Refugees and Citizenship Canada (IRCC) with tax information provided by the Canada Revenue Agency (CRA).Footnote 1 It is used for statistical research on the socioeconomic performance of non-permanent residents and immigrants in Canada, and supports public policy development on population migration, cultural diversity and the challenges of immigrant integration. The IMDB is supported by an IRCC-led consortium that includes representatives from all provincial governments. Statistics Canada created and updates the database under the authority of the Statistics Act.Footnote 2 Personal information in the IMDB includes sex, dates of birth and death, country of birth, citizenship date, date of admission to Canada, immigration category, education, work history and income. Dates in the IMDB microdata files include only the month and year. The one exception is non-permanent resident permits, for which full start and end dates are included so that permit durations can be calculated.
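As a minimal sketch of the date rule described above, assuming a simple dictionary record layout, the following Python code keeps only the year and month of an ordinary date and computes a permit duration from full start and end dates. The field names and the functions `coarsen_to_month` and `permit_duration_days` are hypothetical illustrations, not Statistics Canada's actual processing.

```python
from datetime import date


def coarsen_to_month(d: date) -> str:
    """Reduce a full date to year and month only, e.g. 1990-03-17 -> '1990-03'."""
    return f"{d.year:04d}-{d.month:02d}"


def permit_duration_days(start: date, end: date) -> int:
    """Permit length in days; full permit dates are kept so this can be computed."""
    if end < start:
        raise ValueError("permit end date precedes start date")
    return (end - start).days


# Hypothetical record: the date of birth is reduced to month and year,
# while the permit's full start and end dates are used to derive a duration.
record = {
    "date_of_birth": date(1990, 3, 17),
    "permit_start": date(2019, 9, 1),
    "permit_end": date(2021, 8, 31),
}
released = {
    "birth_year_month": coarsen_to_month(record["date_of_birth"]),
    "permit_duration_days": permit_duration_days(
        record["permit_start"], record["permit_end"]
    ),
}
print(released)  # {'birth_year_month': '1990-03', 'permit_duration_days': 730}
```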

A Privacy Impact Assessment for the IMDB was developed and approved in May 2012. It is now covered by Statistics Canada's Generic Privacy Impact Assessment approved in January 2016.

The IMDB originally included only permanent resident data for immigrants admitted since 1980, and did not include information on non-permanent residents. With this update, coverage has been expanded to include immigrants admitted since 1952, as well as non-permanent residents. The original IRCC source files of the IMDB include the Permanent Resident Permit File, Express Entry data, the Study Permit File, the Employment Permit File, the Temporary Resident Permits File, the Refugee Claimant File, the Settlement Services Files, the Newcomer Outcomes Survey, Sponsorship files and a Citizenship File. The source files also include selection stream data from the provinces on immigrants selected under the Provincial Nominee Program (PNP), describing the detailed immigration categories of individuals admitted through the PNP. This information is linked to T1 Family Files (T1FF) and T4 supplemental files from the CRA. New source files include settlement services data (e.g., language training, employment-related services) from IRCC and the provinces, mortality records (date of death), and information on salaries and wages from CRA T4 supplemental files. The PNP selection stream file will also be added to the IMDB. Record of Employment (ROE) information from Employment and Social Development Canada (ESDC) will be used to add the reason for termination of employment and the date of employment to the IMDB.

Statistics Canada only releases anonymized, aggregated statistical information on immigrants and non-permanent residents. Individuals will not be identifiable in any product disseminated to the public.

For the consortium members, non-confidential aggregate statistical tables on income distribution, interprovincial mobility, industry of employment, and provincial indicators are produced and distributed via the secure E-file Transfer System.

Statistics Canada worked with IRCC and the provinces to determine policy-driven research priorities, and these discussions have informed which variables are required, and which are not, for the development and expansion of the IMDB. Statistics Canada and IRCC continue to work on transparency in the use of administrative data for the IMDB, and will continue to ensure the effectiveness of communications materials, including the annual IMDB technical report, the summary of IMDB data sources, and approved IMDB-based research projects.Footnote 3

Reason for supplement:

While the Generic Privacy Impact Assessment (PIA) addresses most of the privacy and security risks related to statistical activities conducted by Statistics Canada, this supplement addresses the privacy risks associated with expanding the IMDB to include immigrants admitted since 1952 and non-permanent residents. As with all PIAs, Statistics Canada's privacy framework ensures that elements of privacy protection and privacy controls are documented and applied.

Necessity and Proportionality

The collection and use of personal information for the IMDB can be justified against Statistics Canada's Necessity and Proportionality Framework:

  1. Necessity:

    In 1995, the Social Sciences and Humanities Research Council of Canada and Citizenship and Immigration Canada (now Immigration, Refugees and Citizenship Canada (IRCC)) entered into a partnership to support research and public policy development on population migration, cultural diversity and the challenges of immigrant integration.

    The Social Data Linkage Environment (SDLE)Footnote 4 is used to produce the Longitudinal Immigration Database (IMDB). While names and dates of birth are used for record linkage and are accessible only to a limited number of employees who require access for data processing purposes, no personal identifiers are retained in the analytical files; these variables, as well as IRCC client numbers, are removed from the files before internal dissemination. In addition, through data integration, several statistical programs (e.g., the Canadian Community Health Survey, the General Social Survey and the Canadian Housing Statistics Program) use IMDB data for data replacement, statistical analyses, indicators and tables that help address a range of policy questions related to the well-being of immigrants and non-permanent residents.

    Immigration is a driver of demographic and economic growth, and as such, the IMDB informs Canadians on key societal issues and provides indicators of success and areas for improvement for policy planning. Because of its longitudinal nature, the IMDB is essential for analyzing the socioeconomic outcomes of immigrants and non-permanent residents in Canada over time. Policy makers use the IMDB to shape immigration policies and allocate budgets to programs, while immigrant service providers and researchers access the IMDB or its statistical products to evaluate needs for programs and services and help respond to them.Footnote 5 These data enable thorough analyses that support the implementation of better policies to improve immigrant outcomes in Canada.

    Data on non-permanent residents (e.g., international students and foreign workers) were added to the IMDB because this population is often excluded from survey samples, and because in recent years more individuals have been granted non-permanent residency and an increasing proportion of them have transitioned to permanent residency. Several programs encouraging this transition have been put in place, and there is a need to examine rates of transition to permanent residency, the profile of those transitioning, and how the outcomes of immigrants with pre-admission experience in Canada differ from those of immigrants without such experience.

    Wages and salaries from T4 supplemental files were added to improve the coverage and timeliness of the IMDB, as some foreign workers do not file a T1 return and, as a result, their economic outcomes were not previously included in the IMDB. T4 records are issued for all workers, while T1 records are available for most, but not all, individuals with T4 records; some foreign workers with shorter stays do not file a T1. Additionally, the CRA can release wage information via the T4 supplemental files one year earlier than T1FF records.

    Record of Employment (ROE) data provide the reason for termination of employment, which was not previously included in the IMDB. As these data are released one year earlier than the T1FF records, policy makers will be able to respond more quickly regarding employment services for immigrants. These data will also facilitate evaluation of the impact of the COVID-19 pandemic on immigrants' employment.

    Express Entry data contain information on economic immigrants relating to the management of their invitations to apply for Canadian permanent residency since 2015. These data will enable analysis of the short- and long-term socioeconomic outcomes of immigrants selected through the Express Entry system, and allow policy makers to assess the efficiency of the system and respond accordingly.

    Provincial settlement services data will complement the federal settlement services data recently added to the IMDB, and will allow for a better assessment of services provided to newcomers. Evaluating newcomers' outcomes in Canada in relation to the services they receive will allow service providers and policy makers to improve the services offered to newcomers upon admission to Canada.

    Sponsorship data for refugees and family-sponsored immigrants will provide information about their sponsors. Adding data from the Sponsorship program will allow policies on family reunification to be evaluated and improved, enabling assessments of the program's successes and shortcomings in order to provide policy solutions.

    IRCC's Newcomer Outcomes Survey and other settlement-related IRCC survey data will provide information not available through administrative data. For example, these data will enable an assessment of newcomers' perceptions and social outcomes in relation to their economic outcomes.

  2. Effectiveness - Working assumptions:

    The purpose of the Longitudinal Immigration Database is to offer more accurate information than could be obtained from surveys, owing to the size and geographical distribution of the population of interest and the technical nature of some of the key variables. For example, immigration categories and detailed sources of income are both potential sources of error in survey data because individuals might not accurately recall the information. Further, the use of administrative data reduces the response burden on immigrants in Canada. For instance, the Longitudinal Administrative Databank (LAD)Footnote 6 uses IMDB data to include immigration characteristics in its program, which allows for comparative analysis of immigrant and non-immigrant socioeconomic outcomes. Similarly, through data integration with the IMDB, the Census is able to add variables such as immigration category, applicant type, type of pre-admission experience and intended province of destination to its final datasets without adding specific questions to the Census, which would further burden respondents by asking them again for the same information.

    The IMDB was designed to provide detailed and reliable data on the performance and impact of immigration programs. As a longitudinal database of immigrants and non-permanent residents (taxfilers and non-taxfilers alike), the IMDB can be used for both broad and specific research. Its major strength is that it allows socioeconomic outcomes to be analyzed over a period long enough to assess the impacts of immigrant characteristics at admission, such as admission category, education, and knowledge of French or English. Further, in recent years, several immigration programs aimed at attracting immigrants to smaller regions and the Atlantic Provinces have been put in place, creating a need to assess the outcomes of immigrants settling in these regions. Moreover, annual information on place of residence allows for the investigation of secondary migration (immigrants' subsequent relocation within Canada). Using the IMDB, these detailed longitudinal analyses can be effectively performed at sub-provincial levels.

    The analytical dataset available to approved researchers (as 'deemed employees'Footnote 7) through the Statistics Canada Research Data CentersFootnote 8 expands access to this rich information and the research opportunities it offers, and further enables new research projects among stakeholders and beyond. To further demonstrate effectiveness, the IMDB has a technical report, accessible onlineFootnote 9, which includes details about coverage, linkage rates and data quality, along with similar reports for each module (e.g., Wages and salaries, Settlement services). The IMDB program also demonstrates effectiveness through the public dissemination of data tables, analytical articles, and interactive applications on several topics related to immigrants and their socioeconomic outcomes.

  3. Proportionality:

    Proportionality has also been considered based on data sensitivity and ethics.

    Any use of personal information implies some level of perceived intrusion and requires careful management. As illustrated below under Mitigation Factors, the methods and practices behind the IMDB have been designed to ensure protection of privacy and personal information, while filling data gaps.

    In addition to filling data gaps, the development of the IMDB allows for additional research opportunities, using the core dataset to enrich and expand analytical opportunities to better inform public policy and research.

    In most cases, the most privacy-intrusive data are the baseline information on refugee claimants (e.g., date of claim, country of birth, postal code), as there are risks associated with being identified, especially for those living in a sparsely populated postal code. For example, refugee claimants are often fleeing persecution or seeking protection, and disclosure of their identity and location (in addition to financial details) could jeopardize their safety. However, this baseline information is also the most important information for the IMDB, as it is necessary to perform robust longitudinal data integration. After the application of Statistics Canada's dissemination rules, the privacy intrusion and risk of reidentification are deemed minimal, and the use of these data will lead to the provision of better services to all immigrants (including refugees) and the creation of policies that improve the settlement process, ultimately benefitting all of Canadian society. Policy makers access the analytical and statistical products from the IMDB to shape new policies and programs. Likewise, immigrant service providers access the data to determine the services required to help immigrants settle successfully in Canada.

    Additionally, tax microdata provide detailed information on industry of employment, and immigration data identify source countries and year of immigration. In some cases, the postal code from tax microdata (the lowest level of geography in the database) can be a fine enough level of geography to identify immigrants with unique characteristics. However, sub-provincial data are often required to respond to the needs of immigrants at their place of residence (to more effectively guide local policy and support services), and any dissemination is subject to Statistics Canada suppression rules, which minimize the chance of such reidentification. Moreover, because immigrant outcomes vary depending on place of residence, time since admission and pre-admission characteristics, the use of this personal information is considered proportional to the public good resulting from its use.

  4. Alternatives:

    No other data source allows for the same level of detailed analysis. The Longitudinal Immigration Database offers more accurate information than could be obtained from surveys, owing to the size and geographical distribution of the population of interest and the technical nature of some of the key variables (e.g., immigration categories, detailed sources of income), which can be potential sources of error in survey data. For example, individuals might not recall many of the details of their admission to Canada that are critical to some analyses: detailed immigration categories might not be known, and year of admission can be confused with year of arrival. These are only some examples of key analytical components that are better served by administrative data. Statistics Canada also has longstanding evidence that response rates to longitudinal surveys decline considerably over time, introducing bias, substantially reducing quality and accuracy, and increasing costs. For these reasons, most longitudinal surveys have been discontinued.

    This use of administrative data provides a level of detail and accuracy that cannot be achieved through surveys. Moreover, response burden is minimal, as the data come from administrative sources.

Mitigation factors:

The overall risk of harm to the individuals whose information resides in the IMDB has been deemed manageable with existing Statistics Canada safeguards that are described in Statistics Canada's Generic Privacy Impact Assessment. These include, but are not limited to:

Collection

The information is transmitted electronically to Statistics Canada using a secure electronic file transfer protocol.

Storage & Processing

Information is safeguarded, with access restricted to employees who demonstrate a valid requirement to access the data. Identity- and role-based access management controls are in place throughout, in support of the least-privilege and need-to-know principles. Furthermore, all access permissions apply only for a set period of time and must be regularly renewed with justification and re-approval. Cloud-based data are fully encrypted at rest and in transit as required by Government of Canada policy, and encryption keys are managed by the Government of Canada to ensure that only authorized users can decrypt the data. Statistics Canada's cloud implementation aligns with Treasury Board of Canada Secretariat (TBS) direction on cloud services, including risk management for cloud-based services and recommended controls for cloud-based services.

After initial processing, a statistical identifier is generated by Statistics Canada to facilitate data integration. In keeping with standard practice, once records have been linked with other sources of information, the data are stripped of direct identifiers such as name and address to help protect privacy and confidentiality.
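The sequence described above (assign a statistical identifier, link records, then strip direct identifiers) can be sketched as follows. This is a minimal illustration under stated assumptions: records are assumed to be already linked and to carry a hypothetical `link_key` field grouping each person's records, the field names in `DIRECT_IDENTIFIERS` are hypothetical, and a random token from Python's `secrets` module stands in for however Statistics Canada actually generates its statistical identifier.

```python
import secrets

# Hypothetical names of direct identifier fields; real file layouts are not described here.
DIRECT_IDENTIFIERS = {"name", "address", "ircc_client_number"}


def deidentify(linked_records):
    """Replace each person's linkage key with a random statistical identifier
    and drop direct identifiers from every record.

    Each record is a dict with a hypothetical 'link_key' field that groups
    records belonging to the same person after linkage.
    """
    stat_ids = {}  # link_key -> statistical identifier (kept in the restricted area only)
    used = set()   # identifiers already issued, to keep them unique
    analytical = []
    for rec in linked_records:
        key = rec["link_key"]
        if key not in stat_ids:
            # Random token, so the identifier reveals nothing about the person.
            new_id = secrets.token_hex(8)
            while new_id in used:  # guard against a (very unlikely) collision
                new_id = secrets.token_hex(8)
            used.add(new_id)
            stat_ids[key] = new_id
        cleaned = {
            field: value
            for field, value in rec.items()
            if field not in DIRECT_IDENTIFIERS and field != "link_key"
        }
        cleaned["stat_id"] = stat_ids[key]
        analytical.append(cleaned)
    return analytical


# Usage: both records for person "p1" receive the same statistical identifier.
linked = [
    {"link_key": "p1", "name": "A. Person", "landing_year": 2015},
    {"link_key": "p1", "address": "123 Example St", "income": 41000},
    {"link_key": "p2", "name": "B. Person", "landing_year": 2018},
]
for row in deidentify(linked):
    print(row)
```

Using a random token rather than, say, a hash of the name means the identifier cannot be recomputed from personal information outside the processing environment.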

Access

Access to any confidential data held by Statistics Canada is closely monitored. For information with personal identifiers, only a limited number of employees with a work-related need-to-know are allowed access.

Analytical IMDB files used within the agency do not include direct identifiers such as names or immigrants' confidential IRCC client numbers; these variables are removed from the files before internal dissemination.

The analytical dataset is available to approved researchers (as 'deemed employees') through the Statistics Canada Research Data Centers. This access is only granted after a successful security screening and on a need-to-know basis.

Dissemination

The Statistics ActFootnote 10 provides the legal basis for maintaining the confidentiality of personal information that Statistics Canada collects. Statistics Canada will not disclose confidential information to any third party, other than with the permission of the original data provider and authorization from the Chief Statistician, as required by the Statistics Act.

Statistics Canada will publish only aggregated statistical information or anonymized public use microdata files as part of its general dissemination strategy.
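This supplement does not specify Statistics Canada's suppression rules. As a minimal sketch of the general idea behind such rules (primary small-cell suppression), the following code withholds any aggregate count below a threshold before a table is released. The threshold value and the function name `suppressed_counts` are hypothetical; real disclosure control typically also applies secondary (complementary) suppression so that withheld cells cannot be recovered from published totals, which this sketch does not do.

```python
from collections import Counter

# Hypothetical threshold for illustration only; not Statistics Canada's actual rule.
MIN_CELL_COUNT = 10


def suppressed_counts(records, by):
    """Count records per category of `by`, replacing small counts with None (suppressed)."""
    counts = Counter(rec[by] for rec in records)
    return {
        category: (n if n >= MIN_CELL_COUNT else None)
        for category, n in sorted(counts.items())
    }


# Usage: 12 records in region "A" are published; the 3 records in region "B" are withheld.
records = [{"region": "A"}] * 12 + [{"region": "B"}] * 3
print(suppressed_counts(records, "region"))  # {'A': 12, 'B': None}
```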

Openness:

This supplement to Statistics Canada's Generic PIA will be publicly available on the Statistics Canada website.

Conclusion:

This assessment concludes that, with the existing Statistics Canada safeguards, any remaining risks are such that Statistics Canada is prepared to accept and manage them.

Formal approval:

This Supplementary Privacy Impact Assessment has been reviewed and recommended for approval by Statistics Canada's Chief Privacy Officer, Director General for Modern Statistical Methods and Data Science, and Assistant Chief Statistician for Social, Health and Labour Statistics.

Pierre Desrochers
Chief Privacy Officer

Eric Rancourt
Director General
Modern Statistical Methods and Data Science

Lynn Barr-Telford
Assistant Chief Statistician,
Social, Health and Labour Statistics

The Chief Statistician of Canada has the authority for section 10 of the Privacy Act for Statistics Canada, and is responsible for the Agency's operations, including the program area mentioned in this Supplementary Privacy Impact Assessment.

This Privacy Impact Assessment has been approved by the Chief Statistician of Canada.

Anil Arora
Chief Statistician of Canada

Appendix 1 – Updated Personal Information Bank

Longitudinal Immigration Database

Description: This bank describes information acquired from administrative immigration and tax files that are used for statistical research on the economic performance of immigrants and non-permanent residents in Canada. Source files include the landing file, temporary resident file, refugee claimant file and settlement services files from Immigration, Refugees and Citizenship Canada (IRCC), and information from the Government of Quebec on immigrants selected by that province. This information is linked to T1 Family Files and T4 Files based on taxation information from the Canada Revenue Agency. Personal information may include name, demographic information such as sex, dates of birth and death, country of birth, citizenship, date of arrival in Canada, immigrant class, education, work history and income.

Class of Individuals: Non-permanent residents and immigrants and their families.

Purpose: The Longitudinal Immigration Database (IMDB) is a comprehensive source of data on the economic behaviour of immigrants and non-permanent residents in Canada, and provides insight into their economic performance and interprovincial mobility since 1982 and into the impact of immigration policy. The IMDB is managed by Statistics Canada on behalf of a federal-provincial consortium led by IRCC. The personal information is used to produce aggregate, non-identifiable data on the immigrant population over time. Personal information is collected pursuant to the Statistics Act (Sections 3, 13, 24).

Consistent Uses: The information from the IMDB files may be combined with other administrative data records and with survey responses, including, but not limited to, the records of immigrant taxfilers on the Longitudinal Administrative Databank (StatCan PPU 112), the General Social Survey (StatCan PPU 155) and the Registered Apprenticeship Information System (StatCan PPU 083), in order to reduce response burden and enhance data.

IMDB tax records are derived from the T1 Family File (StatCan PPU 111) of immigrants and non-permanent residents.

Retention and Disposal Standards: Information is retained until it is no longer required for statistical purposes and then it is destroyed.

RDA Number: 2007/001
Related Record Number: StatCan HFS 723
TBS Registration: 003726
Bank Number: StatCan PPU 135

The version of this Personal Information Bank in effect prior to 2021 is available on Statistics Canada's Information about Programs and Information Holdings website.