Employee Wellness Surveys - Privacy impact assessment summary

Introduction

This privacy impact assessment (PIA) assesses the privacy impact of the Employee Wellness Surveys (EWS) and associated Pulse Check Surveys, which will operate under the Financial Administration Act and will link self-response data (from the internally administered online survey) to existing administrative databases.

Objective

A privacy impact assessment of the EWS and associated Pulse Check Surveys was conducted to determine whether there were any privacy, confidentiality or security issues with this initiative and, if so, to make recommendations for their resolution or mitigation.

Description

The HR Business Intelligence, Wellness, and Transformation Division at Statistics Canada has the mandate of developing robust EWS and associated Pulse Check Surveys for internal collection (i.e., the surveys will only be administered to Statistics Canada and Statistical Survey Operations employees). These surveys will use valid, published scales (please see the project description for specific information on these scales), and the responses will be linked to relevant administrative databases in order to offer an up-to-date and representative measurement of the state of psychological health and safety at Statistics Canada, while reducing unnecessary response burden on participants. These robust and representative data will then inform evidence-based and appropriate interventions and provide practical insights and recommendations to all levels of management. More specifically, the EWS will allow the Organizational Health team to understand where challenges to psychological health and safety reside, where resources that help bolster psychological health and safety exist, and how best to improve overall psychological health and safety and, ultimately, performance. Since this program is an internal survey, it will be conducted under the Financial Administration Act, and not the Statistics Act.

Risk Area Identification and Categorization

The PIA identifies the level of potential risk (level 1 is the lowest level of potential risk and level 4 is the highest) associated with the following risk areas:

a) Type of program or activity

Program or activity that does not involve a decision about an identifiable individual.

Risk scale: 1

b) Type of personal information involved and context

Personal information, with no contextual sensitivities after the time of collection, provided by the individual with consent to also use personal information held by another source.

Risk scale: 2

c) Program or activity partners and private sector involvement

Within the institution (among one or more programs within the same institution)

Risk scale: 1

d) Duration of the program or activity

Long-term program or activity.

Risk scale: 3

e) Program population

The program's use of personal information for internal administrative purposes affects all employees.

Risk scale: 2

f) Personal information transmission

The personal information is used in a system that has connections to at least one other system.

Risk scale: 2

g) Technology and privacy

All personal information will be kept within Statistics Canada, on Statistics Canada servers. Because of the security measures already in place, the risk of a technology or privacy breach is low.

h) Potential risk that in the event of a privacy breach, there will be an impact on the individual or employee.

In the event of a privacy breach, there would be an intermediate impact on employees, as the breach could cause embarrassment and slight discomfort to those affected.

i) Potential risk that in the event of a privacy breach, there will be an impact on the institution.

There should be very low impact to the institution in the event of a privacy breach.

Conclusion

This assessment of the EWS did not identify any privacy risks that cannot be managed using existing safeguards.

Retail Commodity Survey: CVs for Total Sales (March 2022)

Table summary
This table displays the results of Retail Commodity Survey: CVs for Total Sales (March 2022). The information is grouped by NAPCS-CANADA (appearing as row headers), and Month (appearing as column headers).
NAPCS-CANADA Month
202201 202202 202203
Total commodities, retail trade commissions and miscellaneous services 0.75 0.67 0.62
Retail Services (except commissions) [561] 0.75 0.66 0.62
Food at retail [56111] 0.71 1.27 0.98
Soft drinks and alcoholic beverages, at retail [56112] 0.54 0.57 0.62
Cannabis products, at retail [56113] 0.00 0.00 0.00
Clothing at retail [56121] 1.61 2.14 1.14
Footwear at retail [56122] 1.75 1.75 1.59
Jewellery and watches, luggage and briefcases, at retail [56123] 5.71 5.30 6.80
Home furniture, furnishings, housewares, appliances and electronics, at retail [56131] 2.20 0.88 1.20
Sporting and leisure products (except publications, audio and video recordings, and game software), at retail [56141] 3.06 1.91 2.16
Publications at retail [56142] 6.03 6.88 6.12
Audio and video recordings, and game software, at retail [56143] 0.50 0.44 0.49
Motor vehicles at retail [56151] 2.80 2.21 2.04
Recreational vehicles at retail [56152] 7.32 4.33 4.53
Motor vehicle parts, accessories and supplies, at retail [56153] 2.01 2.06 1.75
Automotive and household fuels, at retail [56161] 1.65 1.43 2.06
Home health products at retail [56171] 2.40 2.34 2.16
Infant care, personal and beauty products, at retail [56172] 2.17 2.24 2.28
Hardware, tools, renovation and lawn and garden products, at retail [56181] 2.93 2.38 2.19
Miscellaneous products at retail [56191] 2.87 2.17 2.14
Total retail trade commissions and miscellaneous services (see footnote 1) 2.17 1.74 2.04

Footnotes

Footnote 1

Comprises the following North American Product Classification System (NAPCS) codes: 51411, 51412, 53112, 56211, 57111, 58111, 58121, 58122, 58131, 58141, 72332, 833111, 841, 85131 and 851511.


Data Science and Modern Methods Forum – June 23, 2021


The Data Science and Modern Methods Forum is a virtual event series created to inform Statistics Canada employees of the department's ongoing data science projects. This highlight reel from the June 2021 forum event features some of the innovative data science projects our agency has been working on.

Identifying pandemic hubs (health regions)


The Data Science and Modern Methods Forum is a virtual event series created to inform Statistics Canada employees of the department's ongoing data science projects. This presentation from the June 2021 forum event, by Andres Solis Montero, Lead Data Scientist, and Saeid Molladavoudi, Senior Data Science Advisor, describes how they identified pandemic hubs during the COVID-19 pandemic.

2022 Indigenous Peoples Survey


This video encourages everyone who has been selected for the Indigenous Peoples Survey to participate. The survey runs from May to October 2022 and from January to March 2023.

Monthly Survey of Food Services and Drinking Places: CVs for Total Sales by Geography - March 2022

Table summary
This table displays the CVs for total sales by geography. The information is grouped by geography (appearing as row headers) and by month (appearing as column headers); values are expressed as percentages.
Geography    Month (percentage)
202103 202104 202105 202106 202107 202108 202109 202110 202111 202112 202201 202202 202203
Canada 0.47 0.70 0.70 1.00 3.40 0.43 0.16 0.19 0.18 0.15 0.68 0.67 0.98
Newfoundland and Labrador 1.87 2.01 3.35 0.36 0.45 0.45 0.47 0.52 0.52 0.57 0.98 2.09 1.75
Prince Edward Island 0.75 16.43 0.84 0.68 0.64 0.58 2.75 7.74 7.11 4.93 8.04 10.78 10.51
Nova Scotia 0.83 2.96 3.07 0.85 0.36 0.27 0.30 0.38 0.38 1.13 0.93 0.75 0.91
New Brunswick 0.33 1.01 1.73 0.39 0.42 0.36 0.52 0.49 0.53 1.69 8.61 15.21 1.02
Quebec 1.17 1.08 1.95 3.60 16.19 0.65 0.53 0.59 0.51 0.27 2.15 1.40 2.17
Ontario 0.98 1.45 1.13 1.80 1.16 0.87 0.23 0.25 0.31 0.20 1.19 0.34 1.46
Manitoba 0.38 1.14 2.61 0.62 0.68 0.33 0.35 0.68 0.78 0.50 4.84 0.70 0.82
Saskatchewan 0.38 1.11 0.84 0.55 10.60 0.89 0.76 1.51 1.22 0.74 1.38 1.28 1.39
Alberta 0.80 1.40 2.37 0.44 2.27 0.64 0.37 0.45 0.36 0.74 1.23 2.86 3.01
British Columbia 0.97 1.87 1.43 0.78 1.64 0.32 0.32 0.41 0.33 0.27 1.16 1.99 3.10
Yukon Territory 2.30 64.50 2.58 1.50 2.66 4.71 1.91 2.96 19.04 12.40 2.59 2.44 2.39
Northwest Territories 2.46 72.86 2.96 1.42 2.81 5.63 2.14 3.33 24.74 4.96 3.70 3.26 3.24
Nunavut 2.42 3.43 4.33 1.20 72.94 2.71 3.48 5.52 3.56 2.53 0.65 0.75 0.55
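Both tables report coefficients of variation (CVs), a measure of the precision of each estimate. Under the standard definition, the CV is the standard error of an estimate divided by the estimate itself, expressed as a percentage (the unit stated in the second table); a lower CV indicates a more precise estimate.

```latex
\mathrm{CV}(\hat{Y}) = \frac{\mathrm{SE}(\hat{Y})}{\hat{Y}} \times 100
```

For instance, the CV of 0.98 for Canada in March 2022 means the standard error is about 0.98% of the estimated total sales. On a hypothetical estimate of $10 billion, that would correspond to a standard error of roughly $98 million.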

Time use diary information and instructions

Have you received an invitation to participate in the Time Use Survey? On this page, you will find information about the 24-hour time use diary, which is the main focus of the survey. Watch the three videos below to get an overview of the diary, to learn how to report simultaneous activities (doing two or more things at the same time), and to learn how to report travel activities.

How does the diary work and what questions are asked?

Time Use Survey: How does the diary work and what questions are asked?

This video is a tool for respondents who have been selected for the Time Use Survey 2022 and explains how to complete the time diary.

What if I was doing more than one activity at the same time?

Time Use Diary: What if I was doing more than one activity at the same time?

This video explores how to report simultaneous activities within the time diary of the Time Use Survey 2022.

How to report travel activities?

Time Use Survey: Reporting travel activities

This video explores what to do if a travel edit is triggered while completing the time diary of the Time Use Survey 2022.

Why does Statistics Canada need so much detail about my day?

Time use data are used to identify what percentage of Canadians participated in a given activity, regardless of how long the activity took. Taking fifteen minutes to help an elderly neighbour with housework may not seem like a significant part of your day because it is a quick, one-time activity, but collecting this information helps identify the percentage of Canadians who provide informal help. It is important to include all of these small activities in your time use diary.
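To illustrate why short activities matter, a participation rate counts every respondent who reported an activity at least once, however briefly. The sketch below assumes each diary is a list of hypothetical (activity, minutes) episodes and, unlike a real survey estimate, applies no survey weights.

```python
def participation_rate(diaries, activity):
    """Percentage of diaries that report `activity` at least once.

    Each diary is a list of (activity, minutes) episodes. Duration is ignored:
    a 15-minute episode counts the same as an 8-hour one. Survey weights,
    which an official estimate would use, are not applied here.
    """
    if not diaries:
        raise ValueError("no diaries supplied")
    participants = sum(
        any(name == activity for name, _minutes in diary) for diary in diaries
    )
    return 100 * participants / len(diaries)
```

Two of four sample diaries containing a 15- or 60-minute "informal help" episode would give a participation rate of 50%, even though the activity takes up only a small share of the day.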

What time period does the diary cover?

You will be asked to account for a 24-hour period in the diary portion of the survey. The 24-hour period starts at 4:00 a.m. and ends at 4:00 a.m. the next morning. After completing the diary, you will be asked a few more questions about your day and your use of time.
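The diary window can be checked mechanically: starting at 4:00 a.m., the reported main activities should cover the next 24 hours without gaps. This is a minimal sketch, assuming each main activity is a hypothetical (start, end) pair of datetimes; simultaneous activities, which the survey records separately, are not modelled.

```python
from datetime import datetime, timedelta


def check_diary_coverage(day_start, episodes):
    """Report gaps or overlaps in a 24-hour diary.

    `day_start` is 4:00 a.m. on the reference day; the window ends 24 hours
    later, at 4:00 a.m. the next morning. `episodes` is a list of
    (start, end) datetimes for main activities. Returns a list of problem
    descriptions; an empty list means the window is fully covered.
    """
    window_end = day_start + timedelta(hours=24)
    problems = []
    cursor = day_start  # end of the coverage confirmed so far
    for start, end in sorted(episodes):
        if start > cursor:
            problems.append(f"gap from {cursor:%H:%M} to {start:%H:%M}")
        elif start < cursor:
            problems.append(f"overlap at {start:%H:%M}")
        cursor = max(cursor, end)
    if cursor < window_end:
        problems.append(f"gap from {cursor:%H:%M} to {window_end:%H:%M}")
    return problems
```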

Can I choose which day to report?

No, please report on the day indicated in your invitation. Even if you think a different day would be more interesting, it is important that you report the activities you did on your assigned day. Time use patterns vary from day to day (e.g., a Wednesday often looks very different from a Sunday). So that the data accurately portray how Canadians spend their time, Statistics Canada assigns specific days to respondents to ensure that responses are evenly distributed across days.
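One simple way to obtain an even spread, shown only as an illustration and not as the procedure the survey actually uses, is to shuffle the sample and then cycle through the days of the week. The function name and the weekday-only scheme are assumptions.

```python
import random

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def assign_reference_days(respondent_ids, seed=None):
    """Assign each respondent a day of the week.

    Shuffling first keeps the assignment random; cycling through DAYS then
    gives every day as close to the same number of respondents as possible
    (the counts differ by at most one).
    """
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)
    return {rid: DAYS[i % len(DAYS)] for i, rid in enumerate(ids)}
```

With 100 respondents, two days would receive 15 respondents and the other five would receive 14.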

What if my assigned reference day is not a good representation of my usual routine?

Some respondents will be asked to report on a day that is not a good representation of their average day, because of a short-term illness, a vacation or some other event. Even though such a day may seem unrepresentative, the information is still very useful.

Will the survey ask about specific personal or private activities?

Data collected from the Time Use Survey are grouped into several broad categories. While you may participate in some private or personal activities during your diary day, you will not be asked to disclose the specific details. For example, activities like taking a shower, brushing teeth or hair, getting dressed, meditating and sexual activities are all reported as "Personal care". Statistics Canada always applies strict measures to protect the confidentiality of your information and to respect your privacy.

The survey is not a way for the government to learn about you as an individual.

The data will be used to understand how Canadians as a whole (or by province, age, gender, visible minority status, etc.) use their time. Your responses will always be confidential. For more information on how Statistics Canada protects your data, visit our Trust Centre.

Reference Data as a Service (RDaaS) - API User Guide

Produced by: Centre for Statistical Data Standards


Revision history

Date Version Revision details
November 14, 2024 2.3
  • Corrections to documentation
August 16, 2024 2.2
  • Corrections to documentation
August 15, 2024 2.1
  • Add Index endpoints
July 5, 2024 2.0
  • Remove API key requirement
  • New URL
June 1, 2022 1.0
  • First release of RDaaS User Guide

Purpose of Document

The purpose of this document is to provide users with a guide to the Reference Data as a Service (RDaaS).

Summary

The Reference Data as a Service (RDaaS) is a collection of codesets, classifications and concordances that are used within Statistics Canada and shared to help harmonise data for better interdepartmental data integration and analysis. The RDaaS initiative shares Statistics Canada's reference data with anyone who wants to harmonise data for analysis between the Government of Canada and its data partners.

Access and availability

The Application Programming Interface (API) is accessible to the public. In general, the API is expected to be available 24/7 except during system updates.

The base Uniform Resource Locator (URL) of the API is:

https://api.statcan.gc.ca/rdaas

Technology

Hypertext Transfer Protocol Secure (HTTPS) is a standard way of communicating securely across the Web. It defines request methods such as GET and POST, which are used with this web service.

The implementation is a RESTful web data service (built using the representational state transfer [REST] architecture) over HTTPS, which returns data in JavaScript Object Notation (JSON) format.

REST is an architectural style that specifies constraints, such as the uniform interface, that, when applied to a web service, induce desirable properties such as performance, scalability and modifiability, which enable services to work best on the Web.
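As a minimal sketch of calling the service, the following Python uses only the standard library. It assumes the base URL given above and that each endpoint used here is a GET request returning a JSON body; `build_url` and `get_json` are hypothetical helper names, not part of the API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.statcan.gc.ca/rdaas"  # base URL from this guide


def build_url(path, params=None):
    """Join a relative URL such as "/search/classifications" to the base URL.

    `params` may be a dict or a list of (name, value) pairs; with doseq=True,
    a value that is a list expands into repeated query parameters.
    """
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params, doseq=True)
    return url


def get_json(path, params=None, timeout=30):
    """Send a GET request to the API and decode the JSON response body."""
    with urllib.request.urlopen(build_url(path, params), timeout=timeout) as response:
        return json.load(response)
```

Under these assumptions, `get_json("/search/classifications", {"limit": 10})` would send the same request as the Search classifications example call.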

Definitions

The API returns the full list of codesets in the requested language. The classifications and codesets included and available for general use are listed at Definitions, data sources and methods (statcan.gc.ca). For example, the Gender classification can be downloaded through this API; it corresponds to Classification of gender (statcan.gc.ca).

Application Programming Interface (API) methods and examples:

The API methods are listed below, with examples.

Search

The RDaaS search functionality lets users search for resources using a variety of parameters. Each resource type has an endpoint where the search is performed, and may also have supplemental endpoints that return the user-specific parameter values to use in the search. These parameters may vary with the user's role.

Search classifications

Searches and filters classifications. Text searching is applied to classification names and descriptions. The applicable filters can be retrieved with the Classification search filters method. The search supports paging: the offset is set with the start parameter and the page size with the limit parameter.

Search classifications - more information

HTTP request method: GET

Relative URL:  /search/classifications

Parameters:

Name (type, location): description
limit (integer, query): the number of results to return; optional, 10 by default, 500 maximum. Default value: 10
q (string, query): the search text; optional, omit to search all.
start (integer, query): the start position for the results returned; optional, 0 (start at the first result) by default. Default value: 0
audience (string, query): this parameter can be repeated to filter on more than one value at a time. The values shown here may not be available to all users; use the corresponding /filters method to determine which filter values are applicable to you. Available values: PUBLIC, INTERNAL, SYSTEM, IATD
status (string, query): this parameter can be repeated to filter on more than one value at a time. The values shown here may not be available to all users; use the corresponding /filters method to determine which filter values are applicable to you. Available values: DEFINED, VERIFIED, PENDING, APPROVED, RELEASED, WITHDRAWN, RETIRED, ARCHIVED
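The parameters above can be assembled into a query string as sketched below, with the repeatable status and audience parameters emitted once per value. `search_query` is a hypothetical helper; the lower bound of 1 on limit is an assumption, since only the default (10) and maximum (500) are documented.

```python
import urllib.parse

MAX_LIMIT = 500  # documented maximum for "limit"


def search_query(q=None, start=0, limit=10, status=(), audience=()):
    """Build a query string for /search/classifications or /search/concordances.

    start and limit default to the documented values (0 and 10). status and
    audience are repeated once per value, as the parameter notes describe.
    """
    if not 1 <= limit <= MAX_LIMIT:  # lower bound of 1 is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if start < 0:
        raise ValueError("start must be zero or positive")
    params = [("start", start), ("limit", limit)]
    if q:
        params.append(("q", q))
    params += [("status", value) for value in status]
    params += [("audience", value) for value in audience]
    return urllib.parse.urlencode(params)
```

For example, `search_query(q="after-tax income", status=["RELEASED", "RETIRED"])` yields `start=0&limit=10&q=after-tax+income&status=RELEASED&status=RETIRED`, which would be appended to the relative URL after a `?`.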

Response Code:

Http Status Code Description
200 Classifications retrieved successfully.

Example:
This call returns the first 10 classifications. The "@id" field of each result contains the identifier to use in other calls.

Call:
https://api.statcan.gc.ca/rdaas/search/classifications?limit=10

Results:

{
"results": {
"@context": "https://api.statcan.gc.ca/rdaas/context/classification-search",
"@graph": [
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/UT1MXA1jXY1fUpI5",
"name": "Accommodations of collective dwellings",
"abbreviation": null,
"versionNumber": "1.0.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2022-05-02T12:36:44Z",
"validFrom": "2017-03-20T04:00:00Z",
"codeCount": 15,
"levelCount": 2,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/O9Kepe4bA1syfqYh"
},
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/E16EWiM6ykmPDq4k",
"name": "Accommodations of collective dwellings - Collapsed classification",
"abbreviation": null,
"versionNumber": "1.0.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2022-05-02T12:36:50Z",
"validFrom": "2017-03-20T04:00:00Z",
"codeCount": 6,
"levelCount": 1,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/Kg860zM75ZI07dvh"
},
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/P1PiDASifm2o9oHQ",
"name": "Admission category",
"abbreviation": null,
"versionNumber": "1.0.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2024-01-09T16:15:28Z",
"validFrom": "2023-10-10T04:00:00Z",
"codeCount": 40,
"levelCount": 4,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/cDazrF4xtw6CMPzK"
},
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/jar887g7uV1OQYov",
"name": "After-tax income",
"abbreviation": null,
"versionNumber": "1.0.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2024-05-27T17:33:53Z",
"validFrom": "2016-03-21T04:00:00Z",
"codeCount": 27,
"levelCount": 2,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/Q85DCmZ2MYZdcqyo"
},
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/Gnka5DWGAyFWxnLT",
"name": "After-tax income of person",
"abbreviation": null,
"versionNumber": "2.0.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2022-09-28T10:53:46Z",
"validFrom": "2016-03-21T04:00:00Z",
"codeCount": 27,
"levelCount": 3,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/AJczsftMQ9bhxv1p"
},
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/z003kOtktxHvwlqs",
"name": "Age categories by five-year age groups",
"abbreviation": null,
"versionNumber": "1.0.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2022-08-12T15:48:17Z",
"validFrom": "2007-05-22T04:00:00Z",
"codeCount": 21,
"levelCount": 1,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/frUSfEXyon94IaZk"
},
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/qeneaaD0bcvKzkhu",
"name": "Agricultural Regions - Variant of SGC",
"abbreviation": null,
"versionNumber": "2016.1.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2023-05-05T18:04:27Z",
"validFrom": "2016-05-16T04:00:00Z",
"codeCount": 7314,
"levelCount": 6,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/gigntSkqxTSeHjlm"
},
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/DhkqJv2iI2ebZju8",
"name": "Apprenticeship certificates",
"abbreviation": null,
"versionNumber": "1.0.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2024-01-08T18:53:28Z",
"validFrom": "2021-05-26T04:00:00Z",
"codeCount": 2,
"levelCount": 1,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/FvNfg3B25APRSxH4"
},
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/WNtEKGZu4qIJx3Oz",
"name": "Canadian citizenship status",
"abbreviation": null,
"versionNumber": "1.0.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2022-05-02T13:18:45Z",
"validFrom": "2011-04-18T04:00:00Z",
"codeCount": 2,
"levelCount": 1,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/tXaph1UBAf4x5ZwD"
},
{
"@id": "https://api.statcan.gc.ca/rdaas/classification/t4QLjaKVS4JX6CGV",
"name": "Canadian Classification of Institutional Units and Sectors",
"abbreviation": "CCIUS",
"versionNumber": "2012.1.0",
"audience": "STANDARDS",
"status": "RELEASED",
"lastUpdated": "2022-05-02T13:37:16Z",
"validFrom": "2015-06-15T04:00:00Z",
"codeCount": 171,
"levelCount": 6,
"classificationSeries": "https://api.statcan.gc.ca/rdaas/classification-series/zQevgSms7fLn6Yie"
}
]
},
"found": 2395,
"start": 0,
"limit": 10,
"facets": {
"status": {
"RELEASED": 1882,
"RETIRED": 487,
"ARCHIVED": 26
},
"audience": {
"NON_STANDARDIZED": 1912,
"STANDARDS": 483
}
}
}
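The response reports the total number of matches in "found", alongside the "start" and "limit" that were applied. With "found" equal to 2395 and the 500-result maximum, five requests (start = 0, 500, 1000, 1500 and 2000) would be needed to retrieve every match. The sketch below computes those offsets and lists the name and id of each result; taking the id as the last path segment of "@id" is an assumption.

```python
def page_starts(found, limit):
    """Return the "start" values needed to retrieve all `found` results,
    `limit` at a time (the documented maximum for limit is 500)."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    return list(range(0, found, limit))


def names_and_ids(response):
    """Return (name, id) pairs from the "@graph" array of a search response.

    The id is assumed to be the last path segment of each "@id" URL.
    """
    return [
        (item["name"], item["@id"].rstrip("/").rsplit("/", 1)[-1])
        for item in response["results"]["@graph"]
    ]
```

Each start value would be sent with the same q and filter parameters so that the pages belong to the same result set.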

Classification search filters

This method returns the list of filters, and the values each filter accepts, that can be used when searching for classifications. These filters can be used as query parameters on any search request. For example, if there is a filter named "bank" with values "A", "B" and "C", query parameters can be added to any search request to filter the results on these values: to filter on "bank" values "A" and "B", append ?bank=A&bank=B to the query URL. These filters are also included as facet counts in all search results.

Classification search filters - more information

HTTP request method: GET

Relative URL:  /search/classifications/filters

Parameters:
None

Response Code:

Http Status Code Description
200 Classifications retrieved successfully.

Example:
This call returns a list of classification filters that can be used in other calls.

Call:
https://api.statcan.gc.ca/rdaas/search/classifications/filters

Results:

[ {
"parameter" : "status",
"values" : [ "RELEASED", "RETIRED", "ARCHIVED" ]
}, {
"parameter" : "audience",
"values" : [ "STANDARDS", "NON_STANDARDIZED" ]
} ]
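Because the filter values available to a given user can differ from those listed in the parameter table (the example response above lists STANDARDS and NON_STANDARDIZED for audience, rather than PUBLIC, INTERNAL, SYSTEM and IATD), it can help to check requested values against this response before searching. `filter_params` is a hypothetical helper sketching that check.

```python
import urllib.parse


def filter_params(filters_response, requested):
    """Turn {"status": ["RELEASED"], ...} into repeated query parameters.

    `filters_response` is the list returned by a /filters method. Any
    parameter or value it does not list is rejected with ValueError.
    """
    allowed = {f["parameter"]: set(f["values"]) for f in filters_response}
    params = []
    for name, values in requested.items():
        if name not in allowed:
            raise ValueError(f"unknown filter: {name}")
        for value in values:
            if value not in allowed[name]:
                raise ValueError(f"{value!r} is not a valid value for {name}")
            params.append((name, value))
    return urllib.parse.urlencode(params)
```

With the response above, `filter_params(filters, {"status": ["RELEASED", "RETIRED"]})` returns `status=RELEASED&status=RETIRED`, matching the ?bank=A&bank=B pattern described earlier.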

Search concordances

Searches and filters concordances. Text searching is applied to concordance names and descriptions. The applicable filters can be retrieved with the Concordance search filters method. The search supports paging: the offset is set with the start parameter and the page size with the limit parameter.

Search concordances - more information

HTTP request method: GET

Relative URL:  /search/concordances

Parameters:

Name (type, location): description
limit (integer, query): the number of results to return; optional, 10 by default, 500 maximum. Default value: 10
q (string, query): the search text; optional, omit to search all.
start (integer, query): the start position for the results returned; optional, 0 (start at the first result) by default. Default value: 0
audience (string, query): this parameter can be repeated to filter on more than one value at a time. The values shown here may not be available to all users; use the corresponding /filters method to determine which filter values are applicable to you. Available values: PUBLIC, INTERNAL, SYSTEM, IATD
status (string, query): this parameter can be repeated to filter on more than one value at a time. The values shown here may not be available to all users; use the corresponding /filters method to determine which filter values are applicable to you. Available values: DEFINED, VERIFIED, PENDING, APPROVED, RELEASED, WITHDRAWN, RETIRED, ARCHIVED

Response Code:

Http Status Code Description
200 Concordances retrieved successfully.

Example:
This call returns the first 10 concordances. The "@id" field of each result contains the identifier to use in other calls.

Call:
https://api.statcan.gc.ca/rdaas/search/concordances?limit=10

Results:

{
"results" : {
"@context" : "https://api.statcan.gc.ca/rdaas/context/concordance-search",
"@graph" : [ {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/OjYqoufLb12yRhrq",
"name" : "National Occupational Classification V2016.1.0 to V2016.1.1",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2022-05-03T13:18:49Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/g8CZeIwUhswtL0Mi",
"source_name" : "National Occupational Classification",
"sourceVersionNumber" : "2016.1.0",
"target" : "https://api.statcan.gc.ca/rdaas/classification/QqdPX82Lk5bKqPul",
"target_name" : "National Occupational Classification",
"targetVersionNumber" : "2016.1.1"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/a2bMQbmThySaqbHO",
"name" : "National Occupational Classification V2016.1.1 to V2016.1.2",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2022-05-03T13:22:23Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/QqdPX82Lk5bKqPul",
"source_name" : "National Occupational Classification",
"sourceVersionNumber" : "2016.1.1",
"target" : "https://api.statcan.gc.ca/rdaas/classification/rrIXoQv2XQY7aKLJ",
"target_name" : "National Occupational Classification",
"targetVersionNumber" : "2016.1.2"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/HGJTm7VMSMhwsYEk",
"name" : "National Occupational Classification V2016.1.2 to V2016.1.3",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2022-05-03T13:26:06Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/rrIXoQv2XQY7aKLJ",
"source_name" : "National Occupational Classification",
"sourceVersionNumber" : "2016.1.2",
"target" : "https://api.statcan.gc.ca/rdaas/classification/pkba22EgXl2c7apI",
"target_name" : "National Occupational Classification",
"targetVersionNumber" : "2016.1.3"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/Xu7sVCPbxydMH0E2",
"name" : "North American Industry Classification System - Canada V2007.1.0 to V2012.1.0",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2023-05-30T18:14:40Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/rJcCKBjOKGOIe05K",
"source_name" : "North American Industry Classification System - Canada",
"sourceVersionNumber" : "2007.1.0",
"target" : "https://api.statcan.gc.ca/rdaas/classification/P4nBWMtGTR97i4aA",
"target_name" : "North American Industry Classification System - Canada",
"targetVersionNumber" : "2012.1.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/wyFGyCucaNxZn1vl",
"name" : "North American Industry Classification System - Canada V2012.1.0 to V2017.1.0",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2022-05-03T13:39:24Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/P4nBWMtGTR97i4aA",
"source_name" : "North American Industry Classification System - Canada",
"sourceVersionNumber" : "2012.1.0",
"target" : "https://api.statcan.gc.ca/rdaas/classification/TGLFzrGej7kGOzM0",
"target_name" : "North American Industry Classification System - Canada",
"targetVersionNumber" : "2017.1.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/gjBOiUvngK15MZg2",
"name" : "North American Industry Classification System - Canada V2017.1.0 to V2017.2.0",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2022-05-03T13:41:40Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/TGLFzrGej7kGOzM0",
"source_name" : "North American Industry Classification System - Canada",
"sourceVersionNumber" : "2017.1.0",
"target" : "https://api.statcan.gc.ca/rdaas/classification/bDiwaJL3bjdeZ8TS",
"target_name" : "North American Industry Classification System - Canada",
"targetVersionNumber" : "2017.2.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/qzUQ47PtPMkqPZHj",
"name" : "North American Industry Classification System - Canada V2017.2.0 to V2017.3.0",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2022-05-03T13:44:03Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/bDiwaJL3bjdeZ8TS",
"source_name" : "North American Industry Classification System - Canada",
"sourceVersionNumber" : "2017.2.0",
"target" : "https://api.statcan.gc.ca/rdaas/classification/S049Pjk4RIUgw6j2",
"target_name" : "North American Industry Classification System - Canada",
"targetVersionNumber" : "2017.3.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/qEwaBpiTQKKzeGIp",
"name" : "North American Product Classification System - Canada V2012.1.0 to V2012.1.1",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2022-05-02T18:57:46Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/X8l4KOP2cmhJzwSH",
"source_name" : "North American Product Classification System - Canada",
"sourceVersionNumber" : "2012.1.0",
"target" : "https://api.statcan.gc.ca/rdaas/classification/oIHxzqfQiLHk9vMc",
"target_name" : "North American Product Classification System - Canada",
"targetVersionNumber" : "2012.1.1"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/UKa2VvehQUt1ZKVt",
"name" : "North American Product Classification System - Canada V2012.1.1 to V2012.1.2",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2022-05-02T19:07:08Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/oIHxzqfQiLHk9vMc",
"source_name" : "North American Product Classification System - Canada",
"sourceVersionNumber" : "2012.1.1",
"target" : "https://api.statcan.gc.ca/rdaas/classification/iXY2lQJFVTLMAYK2",
"target_name" : "North American Product Classification System - Canada",
"targetVersionNumber" : "2012.1.2"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/IFg2ANyUAkAB4hcW",
"name" : "North American Product Classification System - Canada V2012.1.2 to V2017.1.0",
"versionNumber" : "1.0.0",
"audience" : "STANDARDS",
"status" : "RELEASED",
"lastUpdated" : "2022-05-02T19:09:49Z",
"source" : "https://api.statcan.gc.ca/rdaas/classification/iXY2lQJFVTLMAYK2",
"source_name" : "North American Product Classification System - Canada",
"sourceVersionNumber" : "2012.1.2",
"target" : "https://api.statcan.gc.ca/rdaas/classification/FojCXY7J9UDmynZZ",
"target_name" : "North American Product Classification System - Canada",
"targetVersionNumber" : "2017.1.0"
} ]
},
"found" : 1200,
"start" : 0,
"limit" : 10,
"facets" : {
"status" : {
"RETIRED" : 1150,
"RELEASED" : 48,
"ARCHIVED" : 2
},
"audience" : {
"NON_STANDARDIZED" : 1183,
"STANDARDS" : 17
}
}
}
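Each concordance links a source version of a classification to a target version, and the National Occupational Classification results above form a chain (2016.1.0 to 2016.1.1 to 2016.1.2 to 2016.1.3). The sketch below finds the sequence of concordances between two versions using the "source_name", "sourceVersionNumber" and "targetVersionNumber" fields. It assumes each version has at most one outgoing concordance to the same classification and that all relevant concordances are in the list (which may require paging); it does not apply the code-level mappings themselves.

```python
def concordance_chain(concordances, name, from_version, to_version):
    """Return the "@id"s of the concordances leading from one version of the
    named classification to another, in the order they should be applied."""
    # Index the classification's own version-to-version concordances.
    by_source = {
        c["sourceVersionNumber"]: c
        for c in concordances
        if c["source_name"] == name and c["target_name"] == name
    }
    chain, version = [], from_version
    while version != to_version:
        step = by_source.get(version)
        if step is None or len(chain) > len(by_source):  # missing link or cycle
            raise LookupError(f"no path from {name} {from_version} to {to_version}")
        chain.append(step["@id"])
        version = step["targetVersionNumber"]
    return chain
```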

Concordance search filters

This method returns the list of filters, and the values each filter accepts, that can be used when searching for concordances. These filters can be used as query parameters on any search request. For example, if there is a filter named "bank" with values "A", "B" and "C", query parameters can be added to any search request to filter the results on these values: to filter on "bank" values "A" and "B", append ?bank=A&bank=B to the query URL. These filters are also included as facet counts in all search results.

Concordance search filters - more information

HTTP request method: GET

Relative URL:  /search/concordances/filters

Parameters:
None

Response Code:

Http Status Code Description
200 Concordances retrieved successfully.

Example:
This call will return the list of filters available to be used in other calls.

Call:
https://api.statcan.gc.ca/rdaas/search/concordances/filters

Results:

[ {
"parameter" : "status",
"values" : [ "RELEASED", "RETIRED", "ARCHIVED" ]
}, {
"parameter" : "audience",
"values" : [ "STANDARDS", "NON_STANDARDIZED" ]
} ]
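As a rough illustration of how the filter list can be turned into query parameters, the Python sketch below validates a selection against the filters returned above and appends each value as a repeated parameter. The search URL in the usage line (`https://api.statcan.gc.ca/rdaas/search/concordances`) is an assumption inferred from the filters path, and `add_filters` is a hypothetical helper, not part of the API.

```python
from urllib.parse import urlencode

# Filter list copied from the example response of /search/concordances/filters above.
FILTERS = [
    {"parameter": "status", "values": ["RELEASED", "RETIRED", "ARCHIVED"]},
    {"parameter": "audience", "values": ["STANDARDS", "NON_STANDARDIZED"]},
]


def allowed_values(filters):
    """Map each filter parameter name to the set of values it accepts."""
    return {f["parameter"]: set(f["values"]) for f in filters}


def add_filters(search_url, selections, filters=FILTERS):
    """Append repeated query parameters (e.g. ?status=RELEASED&status=RETIRED)
    to a search URL, rejecting parameters or values not listed in `filters`."""
    allowed = allowed_values(filters)
    pairs = []
    for name, values in selections.items():
        if name not in allowed:
            raise ValueError(f"unknown filter: {name}")
        for value in values:
            if value not in allowed[name]:
                raise ValueError(f"invalid value {value!r} for filter {name!r}")
            pairs.append((name, value))
    if not pairs:
        return search_url
    # Keep any query string already present on the URL.
    separator = "&" if "?" in search_url else "?"
    return search_url + separator + urlencode(pairs)


# Hypothetical search endpoint; only the /filters path is documented here.
print(add_filters("https://api.statcan.gc.ca/rdaas/search/concordances",
                  {"status": ["RELEASED", "RETIRED"]}))
```

Repeating the parameter name for each value, rather than joining values with commas, follows the `?bank=A&bank=B` pattern described above.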

Classifications-codesets

Generally, a statistical classification is a way to group a set of related categories in a meaningful, systematic, and standard format. A statistical classification is usually exhaustive, has mutually exclusive and well-described categories, and has either a hierarchical or a flat structure. A statistical classification usually contains codes and descriptors.

This section contains requests that provide information about classifications, their levels and codes, and changes across versions. These resources are available in several views (typically detailed and summary), along with a few utilities designed to answer specific questions users may have without requiring them to dig through all the information themselves.

Codes / Categories

A code is an alphanumeric or numeric value associated with a category. Codes are organized in a classification by levels. In a flat classification, a single level contains all the codes. In a hierarchical classification, codes on the higher levels are broader, while codes on the lower levels are more specific refinements of their parents.

Classification detail

Provides detailed information about the specified classification. This includes descriptive information about the classification, version information, levels, and all codes arranged in order and hierarchically (if applicable).

Classification detail - more information

HTTP request method: GET

Relative URL:  /classification/{id}

Parameters:

Name
Description
id *
string
(path)
Identifier of the classification.
lang
string
(query)
The internationalization language tags used to request content in specific language(s). Multiple tags can be comma separated, for example "lang=en, fr-ca". See /api/i18n/tags under InternationalizationController for more details about language support.
Example: en
method
string
(query)
The internationalization serialization method to use. See /api/i18n/methods under InternationalizationController for more details about language support.
Available values: SINGLE, PROPERTY, ARRAY, CONTAINER
Example: SINGLE

Response Code:

Http Status Code Description
200 Classification retrieved successfully.

Example:
This call will return all information available for the sex at birth classification (lQA3IRH1ER3KXwrJ).

Call:
https://api.statcan.gc.ca/rdaas/classification/lQA3IRH1ER3KXwrJ

Results:

{
"@context" : [ {
"id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/id",
"name" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/name",
"abbreviation" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/name/~/type=abbreviation",
"audience" : "https://api.statcan.gc.ca/rdaas/model/property/path/audience",
"status" : "https://api.statcan.gc.ca/rdaas/model/property/path/lifecycle/status",
"lastUpdated" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/administrative/properties/_/administrativeLastUpdate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"description" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/description/~/type=background",
"versionName" : "https://api.statcan.gc.ca/rdaas/model/property/path/version/name",
"validFrom" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validFromDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"validTo" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validToDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"levels" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/administrative/properties/_/includes/~/inclusionRole=defines/type=level",
"@context" : [ {
"levelDepth" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/levelNumber",
"name" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/name",
"codeCount" : "https://api.statcan.gc.ca/rdaas/model/property/count/codes"
}, {
"@language" : "en"
} ]
},
"codes" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/rootCode",
"@context" : [ {
"code" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/codeValue",
"validFrom" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validFromDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"validTo" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validToDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"descriptor" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/name",
"definition" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/description/~/type=definition",
"children" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/child"
}, {
"@language" : "en"
} ]
}
}, {
"@language" : "en"
} ],
"@id" : "https://api.statcan.gc.ca/rdaas/classification/lQA3IRH1ER3KXwrJ",
"name" : "Sex at birth",
"audience" : "Standardized",
"status" : "Released",
"lastUpdated" : "2023-03-27T11:49:44.205Z",
"description" : "<br /><p><strong>Status: </strong>This standard was approved as a <a href=\"https://www.statcan.gc.ca/eng/subjects/standard/napcs/notice/compulsory\" rel=\"nofollow\">departmental standard</a> on October 1, 2021.</p><p>Information on sex at birth may be based on self-reported data or reported by proxy depending on the statistical program. Therefore, the expression 'reported sex at birth' is used in these definitions when referring to self-reported data and answers reported by proxy.</p>",
"versionName" : "V1.0.0",
"validFrom" : "2021-10-01T00:00:00-04:00",
"levels" : [ {
"@id" : "https://api.statcan.gc.ca/rdaas/level/XV0o7C1uTxhWxZI5",
"levelDepth" : 1,
"name" : "Category",
"codeCount" : 2
} ],
"codes" : [ {
"@id" : "https://api.statcan.gc.ca/rdaas/code/vobFEu0ZvMjIOxsH",
"code" : "1",
"descriptor" : "Male",
"definition" : "This category includes persons whose reported sex at birth is male."
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/code/cFJZgJBj8RFFpFJE",
"code" : "2",
"descriptor" : "Female",
"definition" : "This category includes persons whose reported sex at birth is female."
} ]
}
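A minimal client sketch in Python, using only the standard library, that builds the request URL with the optional `lang` and `method` query parameters and extracts the code table from a detail response. `classification_url`, `fetch_json` and `code_table` are hypothetical helper names. `code_table` reads only the top-level `codes` array, so codes nested under `children` in a hierarchical classification are not included.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.statcan.gc.ca/rdaas"


def classification_url(classification_id, lang=None, method=None):
    """Build /classification/{id}, adding the optional lang and method query parameters."""
    params = {}
    if lang is not None:
        params["lang"] = lang
    if method is not None:
        params["method"] = method  # one of SINGLE, PROPERTY, ARRAY, CONTAINER
    url = f"{BASE_URL}/classification/{classification_id}"
    return url + ("?" + urlencode(params) if params else "")


def fetch_json(url):
    """Perform the GET request and decode the JSON body (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def code_table(detail):
    """Return (code, descriptor) pairs from the top-level 'codes' array of a detail response."""
    return [(c["code"], c["descriptor"]) for c in detail.get("codes", [])]
```

If the live response matches the example above, `code_table(fetch_json(classification_url("lQA3IRH1ER3KXwrJ", lang="en")))` should yield the two sex at birth codes, "1" (Male) and "2" (Female).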

Classification Categories

Returns all categories at all levels of the classification as a flat list. If the classification is hierarchical, the categories here will not be presented in their hierarchy.

Classification Categories - more information

HTTP request method: GET

Relative URL:  /classification/{id}/categories

Parameters:

Name
Description
id *
string
(path)
Identifier of the classification version.
lang
string
(query)
The internationalization language tags used to request content in specific language(s). Multiple tags can be comma separated, for example "lang=en, fr-ca". See /api/i18n/tags under InternationalizationController for more details about language support.
Example: en
method
string
(query)
The internationalization serialization method to use. See /api/i18n/methods under InternationalizationController for more details about language support.
Available values: SINGLE, PROPERTY, ARRAY, CONTAINER
Example: SINGLE

Response Code:

Http Status Code Description
200 Classification categories retrieved successfully.
404 Classification not found.

Example:
This call will return the categories of the sex at birth classification (lQA3IRH1ER3KXwrJ).

Call:
https://api.statcan.gc.ca/rdaas/classification/lQA3IRH1ER3KXwrJ/categories

Results:

{
"@context" : [ {
"code" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/codeValue",
"validFrom" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validFromDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"validTo" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validToDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"descriptor" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/name",
"definition" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/description/~/type=definition",
"levelDepth" : "https://api.statcan.gc.ca/rdaas/model/property/path/code/level/number"
}, {
"@language" : "en"
} ],
"@graph" : [ {
"@id" : "https://api.statcan.gc.ca/rdaas/code/vobFEu0ZvMjIOxsH",
"code" : "1",
"descriptor" : "Male",
"definition" : "This category includes persons whose reported sex at birth is male.",
"levelDepth" : 1
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/code/cFJZgJBj8RFFpFJE",
"code" : "2",
"descriptor" : "Female",
"definition" : "This category includes persons whose reported sex at birth is female.",
"levelDepth" : 1
} ]
}
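Because this endpoint returns a flat list, clients that need the level structure can regroup it using each category's `levelDepth`. A short Python sketch working on an already-decoded response; `categories_by_level` and `descriptor_lookup` are hypothetical helper names.

```python
from collections import defaultdict


def categories_by_level(response):
    """Group the flat @graph list from /classification/{id}/categories by levelDepth."""
    levels = defaultdict(list)
    for category in response.get("@graph", []):
        levels[category["levelDepth"]].append((category["code"], category["descriptor"]))
    return dict(levels)


def descriptor_lookup(response):
    """Map each code value to its descriptor, e.g. for labelling coded data."""
    return {c["code"]: c["descriptor"] for c in response.get("@graph", [])}
```

Grouping by level loses parent-child links, so this is only suitable when the level of each category is all that matters; the classification detail endpoint keeps the hierarchy.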

Classification Categories detailed

Returns the most detailed categories of the classification. These are the categories without any sub-categories. In most hierarchical classifications, these will be the categories from the lowest level. However, in a ragged classification these categories may exist at any level.

Classification Categories detailed - more information

HTTP request method: GET

Relative URL:  /classification/{id}/categories/detailed

Parameters:

Name
Description
id *
string
(path)
Identifier of the classification.
lang
string
(query)
The internationalization language tags used to request content in specific language(s). Multiple tags can be comma separated, for example "lang=en, fr-ca". See /api/i18n/tags under InternationalizationController for more details about language support.
Example: en
method
string
(query)
The internationalization serialization method to use. See /api/i18n/methods under InternationalizationController for more details about language support.
Available values: SINGLE, PROPERTY, ARRAY, CONTAINER
Example: SINGLE

Response Code:

Http Status Code Description
200 Classification categories retrieved successfully.
404 Classification not found.

Example:
This call will return the most detailed categories of the sex at birth classification (lQA3IRH1ER3KXwrJ).

Call:
https://api.statcan.gc.ca/rdaas/classification/lQA3IRH1ER3KXwrJ/categories/detailed

Results:

{
"@context" : [ {
"code" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/codeValue",
"validFrom" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validFromDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"validTo" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validToDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"descriptor" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/name",
"definition" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/description/~/type=definition",
"levelDepth" : "https://api.statcan.gc.ca/rdaas/model/property/path/code/level/number"
}, {
"@language" : "en"
} ],
"@graph" : [ {
"@id" : "https://api.statcan.gc.ca/rdaas/code/vobFEu0ZvMjIOxsH",
"code" : "1",
"descriptor" : "Male",
"definition" : "This category includes persons whose reported sex at birth is male.",
"levelDepth" : 1
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/code/cFJZgJBj8RFFpFJE",
"code" : "2",
"descriptor" : "Female",
"definition" : "This category includes persons whose reported sex at birth is female.",
"levelDepth" : 1
} ]
}
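To illustrate what "most detailed" means in a ragged hierarchy, the sketch below collects the codes that have no sub-categories, wherever they occur in a tree. It assumes, without confirmation from this page, that each code object in the classification detail response holds its sub-categories in an optional `children` array, as suggested by the `children` property in that response's `@context`. `detailed_codes` is a hypothetical name.

```python
def detailed_codes(codes):
    """Return the codes that have no sub-categories, at whatever depth they occur.

    `codes` is a list of code objects; each object is assumed to hold its
    sub-categories in an optional "children" list (hypothetical shape based on
    the "children" property declared in the classification detail @context).
    """
    leaves = []
    stack = list(reversed(codes))
    while stack:
        code = stack.pop()
        children = code.get("children") or []
        if children:
            # Push children in reverse so they are visited in their original order.
            stack.extend(reversed(children))
        else:
            leaves.append(code["code"])
    return leaves


# Ragged example: "B" has no children, so it is a detailed category at level 1.
tree = [
    {"code": "A", "children": [{"code": "A1"}, {"code": "A2", "children": [{"code": "A21"}]}]},
    {"code": "B"},
]
print(detailed_codes(tree))  # ['A1', 'A21', 'B']
```

An explicit stack is used instead of recursion so that very deep classifications cannot hit Python's recursion limit.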

Indices

Index entries are formal definitions and further refinements of the categories that fall under a particular code. They are not defined as codes in the classification; instead, they are informational and specify which terms are included in a particular code. They can also list codes as explicit exclusions, indicating that those codes should not be considered related to the code in question. Users can use this information to determine which code their specific category falls under.

All Indices

Returns the details of all index entries for a classification as an array called @graph. An empty @graph array means the classification was found but has no indices.

All Indices - more information

HTTP request method: GET

URL: /api/classification/{id}/indexes

Parameters:

Name Mandatory Data Type Parameter Type (path/query) Description
id Yes string path Identifier of the classification.

Response Code:

Http Status Code Description
200 Index entries retrieved successfully.
404 Classification not found.

Example:
This call will return the details of all the index entries for a classification with id=owrgkARZ8Omww7qX.

Call:
https://api.statcan.gc.ca/rdaas/classification/owrgkARZ8Omww7qX/indexes

Results:

{
  "@context": [
    {
      "indexId": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/id",
      "illustrativeExamples": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexTerm/~/type=illustrativeExample",
      "inclusions": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexTerm/~/type=inclusion",
      "otherExamples": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexTerm/~/type=otherExample",
      "internalExamples": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexTerm/~/type=internal",
      "indexCode": {
        "@id": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexCode",
        "@type": "@id"
      },
      "indexCodeValue": "https://api.statcan.gc.ca/rdaas/model/property/path/index/code/code_value",
      "indexCodeDescriptor": "https://api.statcan.gc.ca/rdaas/model/property/path/index/code/descriptor",
      "exclusions": {
        "@id": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexExclusion",
        "@context": [
          {
            "code": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/codeValue",
            "descriptor": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/name"
          },
          {
            "@language": "en"
          }
        ]
      }
    },
    {
      "@language": "en"
    }
  ],
  "@graph": [
    {
      "@id": "https://api.statcan.gc.ca/rdaas/indexentry/aQksihrYIqMLkgiJ",
      "indexId": 7,
      "otherExamples": [
        "hogs for breeding"
      ],
      "indexCode": "https://api.statcan.gc.ca/rdaas/code/ourZOQCA1BMbuRoF",
      "indexCodeValue": "1111211",
      "indexCodeDescriptor": "Hogs for breeding"
    },
    {
      "@id": "https://api.statcan.gc.ca/rdaas/indexentry/d1uakrz9EU9P7xMF",
      "indexId": 12,
      "otherExamples": [
        "hogs for market"
      ],
      "indexCode": "https://api.statcan.gc.ca/rdaas/code/zyDg0ZfhR0B9xm83",
      "indexCodeValue": "1111213",
      "indexCodeDescriptor": "Hogs for market"
    },
    {
      "@id": "https://api.statcan.gc.ca/rdaas/indexentry/vkRhvTgxFs3Hly0S",
      "indexId": 17,
      "otherExamples": [
        "chickens for breeding"
      ],
      "indexCode": "https://api.statcan.gc.ca/rdaas/code/xsurl5uuYmUq5c3Y",
      "indexCodeValue": "1111311",
      "indexCodeDescriptor": "Chickens for breeding"
    }
  ]
}
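Index entries can support a simple lookup from a free-text term to the code it falls under. The Python sketch below does a case-insensitive substring search over the example-term arrays of each entry in `@graph`. It assumes those fields are arrays of strings, as `otherExamples` is in the example above, and it ignores `exclusions`. `find_codes_for_term` is a hypothetical helper.

```python
def find_codes_for_term(response, term):
    """Return (code value, descriptor) pairs whose index entries mention `term`.

    Searches the inclusions, illustrativeExamples, otherExamples and
    internalExamples arrays of each @graph entry (case-insensitive substring
    match). Exclusions are not consulted in this sketch.
    """
    fields = ("inclusions", "illustrativeExamples", "otherExamples", "internalExamples")
    needle = term.lower()
    matches = []
    for entry in response.get("@graph", []):
        terms = [t for field in fields for t in entry.get(field, [])]
        if any(needle in t.lower() for t in terms):
            pair = (entry["indexCodeValue"], entry["indexCodeDescriptor"])
            if pair not in matches:  # several entries can point at the same code
                matches.append(pair)
    return matches
```

Applied to the example response above, searching for "hogs" would match codes 1111211 (Hogs for breeding) and 1111213 (Hogs for market).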

Single Index

Returns a single index entry for a classification, identified by its index id. If the response body is 1 (instead of JSON), the classification may not have any indices.

Single Index - more information

HTTP request method: GET

URL: /api/classification/{id}/indexes/entry/{indexId}

Parameters:

Name Mandatory Data Type Parameter Type (path/query) Description
id Yes string path Identifier of the classification.
indexId Yes integer path Index id of the index entry.

Response Code:

Http Status Code Description
200 Index entry retrieved successfully.
404 Classification or index entry not found.

Example:
Returns an index entry for a classification with id=owrgkARZ8Omww7qX and indexId=7.

Call:
https://api.statcan.gc.ca/rdaas/classification/owrgkARZ8Omww7qX/indexes/entry/7

Results:

{
  "@context": [
    {
      "indexId": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/id",
      "illustrativeExamples": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexTerm/~/type=illustrativeExample",
      "inclusions": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexTerm/~/type=inclusion",
      "otherExamples": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexTerm/~/type=otherExample",
      "internalExamples": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexTerm/~/type=internal",
      "indexCode": {
        "@id": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexCode",
        "@type": "@id"
      },
      "indexCodeValue": "https://api.statcan.gc.ca/rdaas/model/property/path/index/code/code_value",
      "indexCodeDescriptor": "https://api.statcan.gc.ca/rdaas/model/property/path/index/code/descriptor",
      "exclusions": {
        "@id": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/indexExclusion",
        "@context": [
          {
            "code": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/classification/properties/_/codeValue",
            "descriptor": "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/name"
          },
          {
            "@language": "en"
          }
        ]
      }
    },
    {
      "@language": "en"
    }
  ],
  "@graph": [
    {
      "@id": "https://api.statcan.gc.ca/rdaas/indexentry/aQksihrYIqMLkgiJ",
      "indexId": 7,
      "otherExamples": [
        "hogs for breeding"
      ],
      "indexCode": "https://api.statcan.gc.ca/rdaas/code/ourZOQCA1BMbuRoF",
      "indexCodeValue": "1111211",
      "indexCodeDescriptor": "Hogs for breeding"
    },
    {
      "@id": "https://api.statcan.gc.ca/rdaas/indexentry/d1uakrz9EU9P7xMF",
      "indexId": 12,
      "otherExamples": [
        "hogs for market"
      ],
      "indexCode": "https://api.statcan.gc.ca/rdaas/code/zyDg0ZfhR0B9xm83",
      "indexCodeValue": "1111213",
      "indexCodeDescriptor": "Hogs for market"
    },
    {
      "@id": "https://api.statcan.gc.ca/rdaas/indexentry/vkRhvTgxFs3Hly0S",
      "indexId": 17,
      "otherExamples": [
        "chickens for breeding"
      ],
      "indexCode": "https://api.statcan.gc.ca/rdaas/code/xsurl5uuYmUq5c3Y",
      "indexCodeValue": "1111311",
      "indexCodeDescriptor": "Chickens for breeding"
    }
  ]
}
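A hedged Python sketch for this endpoint: it builds the path used in the example call, treats a body of `1` as "no indices" per the note at the top of this section, and picks the entry whose `indexId` matches, since the example response above contains more than one entry. All function names are hypothetical.

```python
import json

BASE_URL = "https://api.statcan.gc.ca/rdaas"


def index_entry_url(classification_id, index_id):
    """Build /classification/{id}/indexes/entry/{indexId} as used in the example call."""
    return f"{BASE_URL}/classification/{classification_id}/indexes/entry/{int(index_id)}"


def parse_index_body(text):
    """Decode the response body; a body of 1 may mean the classification has no indices."""
    if text.strip() == "1":
        return None
    return json.loads(text)


def select_entry(response, index_id):
    """Return the @graph entry whose indexId matches, or None if absent."""
    if response is None:
        return None
    for entry in response.get("@graph", []):
        if entry.get("indexId") == index_id:
            return entry
    return None
```

Filtering on `indexId` on the client side is defensive: it gives the right entry whether the service returns one entry or several.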

Concordances

A Concordance is a correspondence table that formally expresses the relationships between subsequent versions or variants of Classifications. It can exist between two versions of the same Classification, or across two different Classifications. The relationships show how the items of the target relate to the items of the source.

A Concordance holds the collection of Maps between Codes of the source and target Classifications. Several types of Maps exist, from simple one-to-one correspondences to complex many-to-many relationships; complex mappings primarily exist across different versions of a Classification.

Map Types

Different types of changes or Maps exist. The following definitions describe the types of maps that are possible in Ariā:

  • Deletion (1:0) - an existing Code expires
  • Creation (0:1) - a new Code comes into existence
  • Combinations (N:1)
    • Merger: two or more Codes expire, while their denotations proceed in one emerging Code
    • Take-over: a Code expires, while its denotation proceeds as part of the denotation of another item, which continues its existence
  • Decomposition (1:N)
    • Breakdown: a Code expires, while its denotation is distributed over and proceeds in two or more emerging items
    • Split-off: a Code continues to exist, while part of its denotation moves to another (emerging) Code
  • Transfer (M:N)
  • Simple
    • Code Change (1:0, 0:1) - a Code expires, while its denotation proceeds as the denotation of an emerging item
    • Property Change (1:1) - the name or other properties of a Code change, while its denotation remains the same
    • No Change (1:1) - no change takes place
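The cardinality notation in the map types above (1:0, 0:1, N:1, 1:N, M:N) can be derived mechanically from a set of source-to-target code pairs by grouping codes that are linked to each other. The Python sketch below does this with a union-find structure. It is an illustrative heuristic, not Ariā's own algorithm: it cannot tell a Merger from a Take-over, or a Code Change from a Property Change, because that requires knowing which codes continue to exist and comparing names. `None` marks a missing side of a map, and all names are hypothetical.

```python
def label(n_sources, n_targets):
    """Name a group of linked codes by its source:target cardinality."""
    if n_targets == 0:
        return "Deletion (1:0)"
    if n_sources == 0:
        return "Creation (0:1)"
    if n_sources == 1 and n_targets == 1:
        return "Simple (1:1)"
    if n_targets == 1:
        return "Combination (N:1)"
    if n_sources == 1:
        return "Decomposition (1:N)"
    return "Transfer (M:N)"


def classify_groups(pairs):
    """Group (source, target) code pairs into connected components and label each one."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target in pairs:
        # Tag each code with its side so the same code value in source and target stays distinct.
        nodes = [n for n in (("S", source), ("T", target)) if n[1] is not None]
        for n in nodes:
            find(n)
        if len(nodes) == 2:
            parent[find(nodes[0])] = find(nodes[1])  # union

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)

    labelled = []
    for members in groups.values():
        sources = sorted(code for side, code in members if side == "S")
        targets = sorted(code for side, code in members if side == "T")
        labelled.append((label(len(sources), len(targets)), sources, targets))
    return labelled
```

For example, the pairs `("B", "C")` and `("D", "C")` form one group labelled "Combination (N:1)", while `("H", None)` on its own is labelled "Deletion (1:0)".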

Concordance detail

Provides detailed information about the specified concordance. This includes descriptive information about the concordance, version information, and its code maps.

Concordance detail - more information

HTTP request method: GET

Relative URL:  /concordance/{id}

Parameters:

Name
Description
id *
string
(path)
Identifier of the concordance.
lang
string
(query)
The internationalization language tags used to request content in specific language(s). Multiple tags can be comma separated, for example "lang=en, fr-ca". See /api/i18n/tags under InternationalizationController for more details about language support.
Example: en
method
string
(query)
The internationalization serialization method to use. See /api/i18n/methods under InternationalizationController for more details about language support.
Available values: SINGLE, PROPERTY, ARRAY, CONTAINER
Example: SINGLE

Response Code:

Http Status Code Description
200 Concordance retrieved successfully.
404 Concordance not found.

Example:
This call will return the concordance detail for Reference data standard on Canadian provinces and territories V2002.0.0 to V2002.0.1 (s9aLOlj8BB6DplVz).

Call:
https://api.statcan.gc.ca/rdaas/concordance/s9aLOlj8BB6DplVz

Results:

{
"@context" : [ {
"id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/id",
"name" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/name",
"audience" : "https://api.statcan.gc.ca/rdaas/model/property/path/audience",
"status" : "https://api.statcan.gc.ca/rdaas/model/property/path/lifecycle/status",
"lastUpdated" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/administrative/properties/_/administrativeLastUpdate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"versionName" : "https://api.statcan.gc.ca/rdaas/model/property/path/version/name",
"validFrom" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validFromDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"validTo" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/validToDate",
"@type" : "http://www.w3.org/2001/XMLSchema#dateTime"
},
"source" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/source",
"@type" : "@id"
},
"target" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/properties/_/target",
"@type" : "@id"
},
"versionConcordance" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/concordance/properties/_/isVersionConcordance",
"predominateSource" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/concordance/properties/_/isPredominateSource",
"predominateTarget" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/concordance/properties/_/isPredominateTarget",
"codeMaps" : {
"@id" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/administrative/properties/_/includes/~/inclusionRole=defines/type=codeMap",
"@context" : [ {
"maptype" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/type",
"sourceCode" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/source/code",
"sourceDescriptor" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/source/name",
"sourceSinceVersion" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/source/version/since",
"targetCode" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/target/code",
"targetDescriptor" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/target/name",
"targetSinceVersion" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/target/version/since",
"distributionFactor" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/concordance/properties/_/distributionFactor",
"reverseDistributionFactor" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/concordance/properties/_/reverseDistributionFactor"
}, {
"@language" : "en"
} ]
}
}, {
"@language" : "en"
} ],
"@id" : "https://api.statcan.gc.ca/rdaas/concordance/s9aLOlj8BB6DplVz",
"name" : "Reference data standard on Canadian provinces and territories V2002.0.0 to V2002.0.1",
"audience" : "Non-standardized",
"status" : "Released",
"lastUpdated" : "2022-04-14T19:06:36.372Z",
"versionName" : "V1.0.0",
"validFrom" : "2003-04-01T00:00:00-05:00",
"source" : "https://api.statcan.gc.ca/rdaas/classification/DHE3EVBVfiUdi7n1",
"target" : "https://api.statcan.gc.ca/rdaas/classification/pElnW6jZSn77dSFd",
"versionConcordance" : true,
"codeMaps" : [ {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/RZT2TRNA8mnuideL",
"maptype" : "No Change",
"sourceCode" : "MB",
"sourceDescriptor" : "Manitoba",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "MB",
"targetDescriptor" : "Manitoba",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/poioFhrbrr6yoTv5",
"maptype" : "No Change",
"sourceCode" : "QC",
"sourceDescriptor" : "Quebec",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "QC",
"targetDescriptor" : "Quebec",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/Vhv5c0y4RoagICjZ",
"maptype" : "No Change",
"sourceCode" : "SK",
"sourceDescriptor" : "Saskatchewan",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "SK",
"targetDescriptor" : "Saskatchewan",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/DuxUs08tODCdSZKU",
"maptype" : "No Change",
"sourceCode" : "NS",
"sourceDescriptor" : "Nova Scotia",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "NS",
"targetDescriptor" : "Nova Scotia",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/lBxOFiFMqryqblaq",
"maptype" : "No Change",
"sourceCode" : "NL",
"sourceDescriptor" : "Newfoundland and Labrador",
"sourceSinceVersion" : "2002.0.0",
"targetCode" : "NL",
"targetDescriptor" : "Newfoundland and Labrador",
"targetSinceVersion" : "2002.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/R52XeJK2hJY0FvFb",
"maptype" : "No Change",
"sourceCode" : "PE",
"sourceDescriptor" : "Prince Edward Island",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "PE",
"targetDescriptor" : "Prince Edward Island",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/rczeSiD0Bi5LslOv",
"maptype" : "No Change",
"sourceCode" : "BC",
"sourceDescriptor" : "British Columbia",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "BC",
"targetDescriptor" : "British Columbia",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/xzQH64AQ3MO8yobP",
"maptype" : "No Change",
"sourceCode" : "NU",
"sourceDescriptor" : "Nunavut",
"sourceSinceVersion" : "1999.0.0",
"targetCode" : "NU",
"targetDescriptor" : "Nunavut",
"targetSinceVersion" : "1999.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/KFytLKZ437SzR11p",
"maptype" : "No Change",
"sourceCode" : "NT",
"sourceDescriptor" : "Northwest Territories",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "NT",
"targetDescriptor" : "Northwest Territories",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/CVuFLE9Nm0krCZ98",
"maptype" : "Update",
"sourceCode" : "YT",
"sourceDescriptor" : "Yukon Territory",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "YT",
"targetDescriptor" : "Yukon",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/XBApMvSyym0DFrcU",
"maptype" : "No Change",
"sourceCode" : "AB",
"sourceDescriptor" : "Alberta",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "AB",
"targetDescriptor" : "Alberta",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/NBEufKEmXO4g4Adm",
"maptype" : "No Change",
"sourceCode" : "ON",
"sourceDescriptor" : "Ontario",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "ON",
"targetDescriptor" : "Ontario",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/X8TSQiUFKEgvH4gJ",
"maptype" : "No Change",
"sourceCode" : "NB",
"sourceDescriptor" : "New Brunswick",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "NB",
"targetDescriptor" : "New Brunswick",
"targetSinceVersion" : "1990.0.0"
} ]
}
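One way to inspect a concordance detail response like the one above is to tally its code maps by `maptype` and list the ones that are not "No Change". A minimal Python sketch over the decoded JSON (`summarize_code_maps` is a hypothetical name); applied to the example above, it would report twelve "No Change" maps and one "Update" (Yukon Territory renamed to Yukon).

```python
from collections import Counter


def summarize_code_maps(concordance):
    """Count code maps by maptype and list the maps that are not 'No Change'."""
    maps = concordance.get("codeMaps", [])
    counts = Counter(m["maptype"] for m in maps)
    changes = [
        (m.get("sourceCode"), m.get("sourceDescriptor"),
         m.get("targetCode"), m.get("targetDescriptor"), m["maptype"])
        for m in maps
        if m["maptype"] != "No Change"
    ]
    return counts, changes
```

`.get()` is used for the code and descriptor fields because a creation or deletion map would presumably lack one side; that shape is an assumption, since the example contains only one-to-one maps.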

Code maps

Returns the code maps for the specified concordance. This does not provide any additional information about the concordance. Users who want the complete detail of the concordance can call api/concordance/{id}; for summary information, they can call api/concordance/{id}/summary.
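A common use of code maps is recoding data from the source version to the target version. The Python sketch below builds a source-to-target lookup from the `@graph` array returned by this endpoint and refuses to recode any code that lacks exactly one target. It ignores `distributionFactor`, which a real 1:N or M:N conversion would likely need. Helper names are hypothetical.

```python
def recode_table(maps_response):
    """Build source code -> list of target codes from a /concordance/{id}/maps response."""
    table = {}
    for m in maps_response.get("@graph", []):
        source, target = m.get("sourceCode"), m.get("targetCode")
        if source is None:
            continue  # creation: no source code to translate from
        targets = table.setdefault(source, [])
        if target is not None and target not in targets:
            targets.append(target)
    return table


def recode(values, table):
    """Translate source-version codes; raise if a code has no unique target."""
    out = []
    for value in values:
        targets = table.get(value, [])
        if len(targets) != 1:
            raise ValueError(f"code {value!r} maps to {len(targets)} targets")
        out.append(targets[0])
    return out
```

Raising on ambiguous or deleted codes, rather than guessing, leaves the split or allocation decision to the caller.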

Code maps - more information

HTTP request method: GET

Relative URL:  /concordance/{id}/maps

Parameters:

Name
Description
id *
string
(path)
Identifier of the concordance.
lang
string
(query)
The internationalization language tags used to request content in specific language(s). Multiple tags can be comma separated, for example "lang=en, fr-ca". See /api/i18n/tags under InternationalizationController for more details about language support.
Example: en
method
string
(query)
The internationalization serialization method to use. See /api/i18n/methods under InternationalizationController for more details about language support.
Available values: SINGLE, PROPERTY, ARRAY, CONTAINER
Example: SINGLE

Response Code:

Http Status Code Description
200 Concordance maps retrieved successfully.
404 Concordance not found.

Example:
This call will return the code maps for Reference data standard on Canadian provinces and territories V2002.0.0 to V2002.0.1 (s9aLOlj8BB6DplVz).

Call:
https://api.statcan.gc.ca/rdaas/concordance/s9aLOlj8BB6DplVz/maps

Results:

{
"@context" : [ {
"maptype" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/type",
"sourceCode" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/source/code",
"sourceDescriptor" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/source/name",
"sourceSinceVersion" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/source/version/since",
"targetCode" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/target/code",
"targetDescriptor" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/target/name",
"targetSinceVersion" : "https://api.statcan.gc.ca/rdaas/model/property/path/codemap/target/version/since",
"distributionFactor" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/concordance/properties/_/distributionFactor",
"reverseDistributionFactor" : "https://api.statcan.gc.ca/rdaas/model/property/uri/mtna.us/model/core/concordance/properties/_/reverseDistributionFactor"
}, {
"@language" : "en"
} ],
"@graph" : [ {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/RZT2TRNA8mnuideL",
"maptype" : "No Change",
"sourceCode" : "MB",
"sourceDescriptor" : "Manitoba",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "MB",
"targetDescriptor" : "Manitoba",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/poioFhrbrr6yoTv5",
"maptype" : "No Change",
"sourceCode" : "QC",
"sourceDescriptor" : "Quebec",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "QC",
"targetDescriptor" : "Quebec",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/Vhv5c0y4RoagICjZ",
"maptype" : "No Change",
"sourceCode" : "SK",
"sourceDescriptor" : "Saskatchewan",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "SK",
"targetDescriptor" : "Saskatchewan",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/DuxUs08tODCdSZKU",
"maptype" : "No Change",
"sourceCode" : "NS",
"sourceDescriptor" : "Nova Scotia",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "NS",
"targetDescriptor" : "Nova Scotia",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/lBxOFiFMqryqblaq",
"maptype" : "No Change",
"sourceCode" : "NL",
"sourceDescriptor" : "Newfoundland and Labrador",
"sourceSinceVersion" : "2002.0.0",
"targetCode" : "NL",
"targetDescriptor" : "Newfoundland and Labrador",
"targetSinceVersion" : "2002.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/R52XeJK2hJY0FvFb",
"maptype" : "No Change",
"sourceCode" : "PE",
"sourceDescriptor" : "Prince Edward Island",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "PE",
"targetDescriptor" : "Prince Edward Island",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/rczeSiD0Bi5LslOv",
"maptype" : "No Change",
"sourceCode" : "BC",
"sourceDescriptor" : "British Columbia",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "BC",
"targetDescriptor" : "British Columbia",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/xzQH64AQ3MO8yobP",
"maptype" : "No Change",
"sourceCode" : "NU",
"sourceDescriptor" : "Nunavut",
"sourceSinceVersion" : "1999.0.0",
"targetCode" : "NU",
"targetDescriptor" : "Nunavut",
"targetSinceVersion" : "1999.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/KFytLKZ437SzR11p",
"maptype" : "No Change",
"sourceCode" : "NT",
"sourceDescriptor" : "Northwest Territories",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "NT",
"targetDescriptor" : "Northwest Territories",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/CVuFLE9Nm0krCZ98",
"maptype" : "Update",
"sourceCode" : "YT",
"sourceDescriptor" : "Yukon Territory",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "YT",
"targetDescriptor" : "Yukon",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/XBApMvSyym0DFrcU",
"maptype" : "No Change",
"sourceCode" : "AB",
"sourceDescriptor" : "Alberta",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "AB",
"targetDescriptor" : "Alberta",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/NBEufKEmXO4g4Adm",
"maptype" : "No Change",
"sourceCode" : "ON",
"sourceDescriptor" : "Ontario",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "ON",
"targetDescriptor" : "Ontario",
"targetSinceVersion" : "1990.0.0"
}, {
"@id" : "https://api.statcan.gc.ca/rdaas/codemap/X8TSQiUFKEgvH4gJ",
"maptype" : "No Change",
"sourceCode" : "NB",
"sourceDescriptor" : "New Brunswick",
"sourceSinceVersion" : "1990.0.0",
"targetCode" : "NB",
"targetDescriptor" : "New Brunswick",
"targetSinceVersion" : "1990.0.0"
} ]
}
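Each entry in `@graph` is one code map, and its `maptype` field separates unchanged codes from changed ones. The minimal Python sketch below reports only the changed maps in a response shaped like the one above. The function name `changed_maps` is hypothetical.

```python
def changed_maps(document):
    """List (sourceCode, sourceDescriptor, targetCode, targetDescriptor, maptype)
    for every code map in a decoded response whose maptype is not "No Change"."""
    rows = []
    for code_map in document.get("@graph", []):
        if code_map.get("maptype") != "No Change":
            rows.append((
                code_map.get("sourceCode"),
                code_map.get("sourceDescriptor"),
                code_map.get("targetCode"),
                code_map.get("targetDescriptor"),
                code_map.get("maptype"),
            ))
    return rows
```

Applied to the example response, this returns a single row: `("YT", "Yukon Territory", "YT", "Yukon", "Update")`. Every other province and territory maps with "No Change".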

Bias Considerations in Bilingual Natural Language Processing

By: Marie-Pier Schinck, Eunbee (Andrea) Jang and Julien-Charles Lévesque, Employment and Social Development Canada

1 – Introduction and study objective

Employment and Social Development Canada (ESDC) has leveraged natural language processing (NLP) in multiple projects in recent years and has identified challenges in working with data in which the two official languages are unevenly represented. Recent advances in NLP focus predominantly on English, and resources for other languages are limited. As a result, when building applied NLP solutions for ESDC, data scientists must decide how to process French text while also dealing with limited resources and competing priorities.

Concerns about the treatment of French were initially raised by the authors of this study, based primarily on their experience as data scientists working with bilingual datasets at ESDC (see Official Languages in Natural Language Processing). In response, the authors consulted various federal data scientists and NLP researchers and found that these challenges were not limited to ESDC but were, in fact, common across departments and agencies.

The main objective of this project is to explore this issue and gain transferable knowledge that data scientists can use to increase the equity of solutions provided by ESDC.

As a starting point, we measure the extent of language bias in four ESDC projects where multilingual classification systems were implemented. We also experiment with rebalancing strategies to gain insight into an ideal representation of the minority language. We compare model performance across several scenarios: multilingual models, separate unilingual models, and a translation-based cross-lingual approach in which French text is translated into English so that a unilingual English model can be trained. This lets us observe how much the models' performance improves or worsens for each language.

2 – Study setup

The four datasets used in this study come from supervised classification problems in past and ongoing ESDC projects. We limited the scope to supervised classification for three reasons: it fit the time and resources available for this project, the data were easily accessible, and it is the most common NLP task our team solves.

Table 1: Dataset characteristics.
Dataset | Description | Number of documents | Proportion of French data
T4 | Call summary notes written by Service Canada (SC) call centre agents. These notes are generally short, incomplete sentences with administrative jargon. The goal of the project is to reduce costly human labour by automatically identifying cases where a T4 form has been returned to individuals by SC. | 6k | 35%
HR | Responses of applicants to a pre-screening question in a hiring process. This research project was undertaken to assess the feasibility of using NLP to narrow down the candidate pool in large-scale hiring processes. | 5k | 6%
ROE | Comments written by employers on the Record of Employment (ROE) forms received by SC. ROE comments are generally short, incomplete sentences that make frequent use of employment insurance jargon. The project is designed to reduce the manual labour of SC employees by classifying ROE comments into different objectives. | 280k | 28%
PASRB | News articles from Canadian media sources, obtained through the NewsDesk platform. The task is to indicate whether an article should be flagged as a relevant source to include in a brief for deputy ministers. | 69k | 25%

2.1 – Vectorization and model architecture

For our experiment, we trained models for each dataset across several vectorization methods, model architectures and hyperparameter configurations (see Table 2). The chosen vectorization methods and models cover some of the most common tools used for NLP classification problems. The vectorization methods include bag-of-words with feature selection based on the chi-square statistic (Chi2BOW), FastText word embeddings (FT) and contextual embeddings from multilingual BERT (Devlin et al., 2018b).
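To make the Chi2BOW step concrete, the Python sketch below scores each term by the Pearson chi-square statistic of a term-presence by class contingency table, keeps the top k terms, and builds count vectors over that reduced vocabulary. The article does not specify which chi-square variant was used, so this classic document-presence version is an assumption. scikit-learn's `chi2` function computes a related statistic from feature counts instead. The whitespace tokenizer and all function names are hypothetical placeholders.

```python
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple placeholder)."""
    return text.lower().split()


def chi2_scores(docs, labels):
    """Pearson chi-square statistic between term presence and class, per term.

    For every term, a 2 x K contingency table is formed (document contains the
    term or not, by class) and (observed - expected)^2 / expected is summed
    over its cells. Cells with zero expected count contribute nothing.
    """
    n_docs = len(docs)
    class_totals = Counter(labels)
    # For each term, count the documents of each class that contain it.
    present = {}
    for text, label in zip(docs, labels):
        for term in set(tokenize(text)):
            present.setdefault(term, Counter())[label] += 1
    scores = {}
    for term, by_class in present.items():
        row_present = sum(by_class.values())
        row_absent = n_docs - row_present
        score = 0.0
        for label, col_total in class_totals.items():
            cells = ((by_class[label], row_present),
                     (col_total - by_class[label], row_absent))
            for observed, row_total in cells:
                expected = row_total * col_total / n_docs
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[term] = score
    return scores


def select_k_best(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi2_scores(docs, labels)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


def to_bow(text, vocabulary):
    """Bag-of-words count vector restricted to the selected vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]
```

The vectors returned by `to_bow` are what a non-contextual classifier such as LR, MLP or XGB would consume. Word order is discarded at this point.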

For the model setup, we use two main families of classification architectures, which we call contextual and non-contextual learning methods. By non-contextual, we refer to learning systems that process aggregate representations of sentences, discarding information about word order. These include logistic regression (LR), the multi-layer perceptron (MLP) and XGBoost (XGB; Chen & Guestrin, 2016). Contextual approaches, on the other hand, take word-order information into account when learning and predicting. We implemented two contextual model architectures: a long short-term memory recurrent neural network (LSTM; Hochreiter & Schmidhuber, 1997) and a popular attention-based model called BERT (Devlin et al., 2018a). Hyperparameter search details are out of scope for this article; however, the results reported here are based on an exhaustive search. We also evaluated the LR, MLP and XGB methods with a simple bag-of-words embedding (without chi-square feature selection), but those results are omitted from this article to streamline the presentation. Feel free to contact the authors to get the complete report.

Table 2: Vectorization and Model Setup
Vectorization | Model
Chi-square with bag-of-words (Chi2BOW) | Logistic regression (LR)
Chi2BOW | Multi-layer perceptron (MLP)
Chi2BOW | XGBoost (XGB)
FastText (FT) | LSTM
BERT (WordPiece) | BERT

3 – Presence of Bias

In this study, we investigate language bias by looking at the disparity in test accuracy between the two official languages. The presence of bias is assessed in several settings, discussed in more detail below.

Performance disparity in multilingual models

Our first experiment consisted of training multilingual models (i.e., training a single model on both languages simultaneously) with each of the methods discussed in the previous section, using the language representation found in the original datasets (no rebalancing). To assess the presence of language bias, we then compared the performance achieved on the French portion of the data with that achieved on the English portion (Footnote 1). Figure 1 shows the test accuracy by language for the best hyperparameter configuration of each method tested.
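Splitting the evaluation by language amounts to computing accuracy separately on each language subset of one shared test set. The minimal Python sketch below assumes that each test document carries a language tag ("en" or "fr"). The function names are hypothetical.

```python
from collections import defaultdict


def accuracy_by_language(langs, y_true, y_pred):
    """Test accuracy (in %) computed separately for each language tag.

    Assumes every test document is tagged with its language, e.g. "en" or "fr".
    """
    if not (len(langs) == len(y_true) == len(y_pred)):
        raise ValueError("inputs must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, truth, pred in zip(langs, y_true, y_pred):
        total[lang] += 1
        correct[lang] += int(truth == pred)
    return {lang: 100.0 * correct[lang] / total[lang] for lang in total}


def language_gap(accuracies):
    """English minus French accuracy; positive values favour English."""
    return accuracies["en"] - accuracies["fr"]
```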

Figure 1: Test accuracy with regard to text language

Performance of all the methods listed in Table 2, split by dataset and by language. Detailed numbers below.

Method (test accuracy, %) | T4 En | T4 Fr | HR En | HR Fr | ROE En | ROE Fr | PASRB En | PASRB Fr
BERT | 97.6 | 97.2 | 78.4 | 73.2 | 91.7 | 91.2 | 86.3 | 87.0
C2BOW + LR | 96.6 | 96.7 | 68.1 | 67.9 | 90.6 | 90.0 | 82.3 | 83.8
C2BOW + MLP | 95.4 | 97.9 | 69.1 | 66.1 | 87.5 | 87.3 | 63.9 | 68.3
C2BOW + XGB | 95.0 | 97.2 | 75.3 | 75.0 | 88.7 | 86.4 | 84.6 | 85.0
FT + LSTM | 94.8 | 94.7 | 72.0 | 69.6 | 91.7 | 90.9 | 83.7 | 83.2

The first conclusion from Figure 1 is that no single trend holds across all four datasets. For instance, on the ROE dataset, the results on the English portion of the data systematically outperform those on the French portion. The T4 dataset shows the opposite trend, with the French portion outperforming the English portion for most methods. This may be explained by the fact that the T4 dataset contains the highest proportion of French data, and by the content of the data itself: the business context supports the hypothesis of a different underlying distribution for each language, which makes the classification problem easier to solve in French than in English. The PASRB and HR datasets display less clear trends, with French slightly outperforming English for PASRB and the opposite for HR.

To get a more detailed picture, we compiled the differences in performance across those experiments and normalised them by calculating their z-scores (higher scores indicating better relative performance on English). On average, the multilingual models trained for this study performed slightly better on English than on French, by 0.13 standard deviations of the performance metric. Trends on individual datasets are somewhat stronger, with average differences of 0.56 and 0.41 standard deviations on ROE and HR respectively, and –0.33 on T4. Despite this slight overall bias in favour of English, the main takeaway is the importance of careful, language-specific performance analysis when using multilingual models, because the presence of bias varies with dataset properties and with the business context behind data collection.

Influence of the language distribution in multilingual systems

In this section, we explore the impact of language proportion (i.e., the ratio of French to English data) in multilingual systems. We evaluate two methods that are commonly used for NLP classification tasks and that performed satisfactorily on our benchmarks: BOW + XGBoost and BERT. We use the ROE dataset because it is the largest.

In this experiment, undersampling is applied to one of the languages to obtain a target French to English ratio ranging from 10:90 to 90:10. The test data are kept intact, at the original 28:72 French to English ratio, so that every model is evaluated on the same samples.
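The undersampling step can be sketched as follows. Only one language is reduced, so the result is the largest subset that meets the target French:English ratio, up to integer rounding. The article does not say how the retained examples were chosen. This sketch assumes uniform random sampling without replacement with a fixed seed. The `(text, label, lang)` tuple layout and the function name are hypothetical.

```python
import random


def undersample_to_ratio(examples, fr_parts, en_parts, seed=0):
    """Undersample one language so that French:English equals fr_parts:en_parts.

    `examples` is a list of (text, label, lang) tuples with lang in {"fr", "en"}
    (an assumed layout). Integer arithmetic avoids floating-point rounding
    surprises, e.g. a 10:90 target is passed as fr_parts=10, en_parts=90.
    """
    fr = [ex for ex in examples if ex[2] == "fr"]
    en = [ex for ex in examples if ex[2] == "en"]
    rng = random.Random(seed)  # assumption: uniform random selection
    if len(fr) * en_parts < len(en) * fr_parts:
        # French is below the target share: keep all French, shrink English.
        en = rng.sample(en, len(fr) * en_parts // fr_parts)
    else:
        # French is at or above the target share: keep all English, shrink French.
        fr = rng.sample(fr, len(en) * fr_parts // en_parts)
    return fr + en
```

With 28 French and 72 English training examples, a 10:90 target keeps 8 French and all 72 English examples. A 50:50 target keeps all 28 French and 28 English examples. As described above, the test set is left untouched.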

Figure 2: Language ratio experiment on ROE data. Left: Bag-of-Words with an XGBoost classifier. Right: BERT model averaged across 3 repetitions per ratio.

The two graphs show the results of the language ratio experiments on the ROE dataset: the XGBoost model on the left, and the BERT model, averaged across 3 repetitions per ratio, on the right. The x-axis shows the French to English ratio of the training data, from 10:90 to 90:10; the y-axis shows accuracy (%). The grey dashed line represents each model's overall accuracy, and the solid coloured lines show the accuracy on each language separately: red for the French portion of the data and blue for the English portion. According to these figures, increasing the proportion of one language in training improves performance on that language, and decreasing it lowers performance. The graphs also show that the ratio at which French and English performance differ the least is not the original ratio of the dataset (28:72): it is 50:50 for the XGBoost model (left) and 40:60 for the BERT model (right).

Figure 2 illustrates the performance of these two models under the ratio splits described above. As expected, decreasing the proportion of a given language in the training data tends to reduce performance on that language, and increasing it has the opposite effect, although performance sometimes stays stable across different ratios. The overall accuracy curve always lies closer to the English accuracy curve because it is calculated on a test set that keeps the original French to English ratio (28:72).

The experiments also show that the optimal language ratio depends on the learning method. With BOW + XGBoost, French and English accuracy scores have the smallest discrepancy at a 50:50 ratio (fr:en). With BERT, the discrepancy is smallest at the 30:70 and 40:60 ratios. This is especially interesting given that these optimal ratios differ from the original ratio of the dataset, 28:72.

This experiment indicates that artificially manipulating the language proportion can either amplify or reduce bias. A roughly balanced language proportion at training time is advisable to reduce the disparity between performance on the two languages. However, there is a trade-off between overall accuracy (accuracy on all samples) and accuracy on French versus English texts: the point of optimal performance might not be the same for both criteria.
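One way to make this trade-off explicit is to apply both criteria to the results of a ratio sweep and check whether they pick the same ratio. The minimal Python sketch below assumes a hypothetical input layout and function name, and it includes no values from the study.

```python
def pick_ratios(curves):
    """Compare two selection criteria over a language-ratio sweep.

    `curves` maps a French:English ratio label (e.g. "50:50") to a tuple
    (french_acc, english_acc, overall_acc) measured on the fixed test set.
    Returns (ratio with the smallest French/English gap,
             ratio with the highest overall accuracy); the two may differ.
    """
    most_balanced = min(curves, key=lambda r: abs(curves[r][0] - curves[r][1]))
    best_overall = max(curves, key=lambda r: curves[r][2])
    return most_balanced, best_overall
```

Because overall accuracy is measured on a test set that is 72% English, the second criterion weights English more heavily. This is one reason the two choices can diverge.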

Trade-off between multilingual and unilingual modelling

In this section, we present an analysis of the performance disparity for each language when training one model on both languages, which we call the multilingual setting (multi), and when training two models (one per language), which we call the unilingual setting (uni). The results are presented separately for each language. The French results also include the performance of an English unilingual system trained on French data translated into English (trans_uni). Note that this model was trained only on the translated French data, rather than on the whole dataset (original English data plus translated French data), mostly because of computational resource constraints. This experiment aims to understand to what extent the signal needed for classification remains intact after documents have been translated. For translation, we use the Marian neural machine translation model (Footnote 2).
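The trans_uni setting is a two-stage cascade. French training documents are machine-translated into English, keep their original labels, and are then used to train an English-only classifier. The Python sketch below covers the data-preparation step. It assumes the Hugging Face `transformers` library (with PyTorch) and the public `Helsinki-NLP/opus-mt-fr-en` Marian checkpoint. The article does not say which checkpoint or toolkit was used, and the helper names are hypothetical. The translator is passed in as a function so that the cascade can be exercised without downloading a model.

```python
def marian_fr_to_en(texts, checkpoint="Helsinki-NLP/opus-mt-fr-en", batch_size=16):
    """Translate French texts to English with a Marian model.

    The checkpoint name is an assumption; the article only says "Marian".
    """
    # Imported lazily so the rest of the sketch runs without transformers/torch.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(checkpoint)
    model = MarianMTModel.from_pretrained(checkpoint)
    translations = []
    for start in range(0, len(texts), batch_size):
        batch = tokenizer(texts[start:start + batch_size], return_tensors="pt",
                          padding=True, truncation=True)
        generated = model.generate(**batch)
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations


def build_trans_uni_set(fr_texts, labels, translate=marian_fr_to_en):
    """Pair English translations of French documents with their original labels.

    Only French documents are used, mirroring the trans_uni setting; the
    result can be fed to any English-only classifier.
    """
    fr_texts = list(fr_texts)
    english = translate(fr_texts)
    if len(english) != len(fr_texts) or len(fr_texts) != len(labels):
        raise ValueError("expected one translation and one label per document")
    return list(zip(english, labels))
```

At prediction time, French test documents would also pass through `translate` before reaching the English classifier. Errors from the translation stage and the classification stage can therefore compound.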

English

The bar plot in Figure 3 shows the best performance of each method on the English portion of the data, with two bars per method: the left bar shows the multilingual setting and the right bar the unilingual setting.

In terms of model architecture, the BERT model performs best, or very close to best, on every dataset. By contrast, the multi-layer perceptron with chi-square bag-of-words vectorization (C2BOW + MLP) is one of the worst-performing configurations on most datasets.

Figure 3: Comparison of English performance (test accuracy) in two different settings – multilingual and unilingual.

Performance on English texts for all the methods listed in Table 2 trained in a unilingual versus multilingual setting, split by dataset. Detailed numbers below.

Method (test accuracy, %) | T4 multi | T4 uni | HR multi | HR uni | ROE multi | ROE uni | PASRB multi | PASRB uni
BERT | 97.57 | 97.35 | 78.40 | 77.84 | 91.66 | 91.83 | 86.29 | 84.47
C2BOW + LR | 96.63 | 96.56 | 68.14 | 72.87 | 90.56 | 90.58 | 82.34 | 82.12
C2BOW + MLP | 95.41 | 95.90 | 69.09 | 69.79 | 87.52 | 87.95 | 63.92 | 68.07
C2BOW + XGB | 95.01 | 96.69 | 75.30 | 77.61 | 88.70 | 89.30 | 84.65 | 84.83
FT + LSTM | 94.82 | 93.65 | 71.97 | 77.37 | 91.69 | 91.37 | 83.70 | 83.30

For the T4, HR and PASRB datasets, BERT performs better in the multilingual setting, while the unilingual setting slightly outperforms the multilingual one on ROE, which, interestingly, is the largest dataset. Although multilingual training appears to favour BERT-based models, multilingual methods overall perform slightly worse than their unilingual counterparts, by 0.56% on average. We take this as evidence that the training setup (multilingual vs. unilingual) does not substantially affect performance on the majority language, English.

French

Figure 4 presents an overall comparison of the three approaches for French. More specifically, we compare performance on the French portion of the data in the multilingual setting (multi) against two unilingual approaches: one with models trained on the original French data (uni), and another in which the French data are translated into English and fed into an English unilingual system (trans_uni). For trans_uni, we use only the French portion of the data for training, leaving out the original English data, to directly observe the impact of the translation approach on the minority language.

Figure 4: Comparison of French performance on three approaches – multilingual, unilingual, translated unilingual.

Performance on French texts for all the methods listed in Table 2, trained in the multilingual, unilingual and translated unilingual settings, split by dataset. Detailed numbers below.

Method (test accuracy, %) | T4 multi | T4 trans_uni | T4 uni | HR multi | HR trans_uni | HR uni | ROE multi | ROE trans_uni | ROE uni | PASRB multi | PASRB trans_uni | PASRB uni
BERT | 97.18 | 96.34 | 96.34 | 73.21 | 88.00 | 90.00 | 91.23 | 90.99 | 92.45 | 87.05 | 81.63 | 86.10
C2BOW + LR | 96.71 | 96.59 | 96.83 | 67.86 | 86.00 | 86.00 | 90.02 | 89.39 | 90.36 | 83.79 | 80.89 | 83.85
C2BOW + MLP | 97.89 | 94.39 | 96.10 | 66.07 | 58.00 | 64.00 | 87.30 | 86.69 | 88.20 | 68.32 | 68.62 | 81.08
C2BOW + XGB | 97.18 | 95.61 | 96.59 | 75.00 | 76.00 | 84.00 | 86.42 | 88.71 | 89.38 | 84.97 | 82.63 | 86.47
FT + LSTM | 94.68 | 95.37 | 94.63 | 69.57 | 60.00 | 68.00 | 90.88 | 90.16 | 90.92 | 83.21 | 78.30 | 83.36

With regard to model architecture, and similar to what was observed for English, the best method tends to vary with the dataset, with BERT performing best overall. BERT is the best method for HR, ROE and PASRB, with the unilingual setting outperforming both other settings in the first two cases (HR, ROE) and the multilingual setting performing best for PASRB. Interestingly, the Chi2BOW + MLP method outperforms all other methods on the T4 dataset, while it generally shows the worst results on the other three datasets.

As for the training schemes, we first notice that trans_uni is generally the worst-performing setting on T4, ROE and PASRB. On HR, trans_uni is not always the worst approach, but it is not the best setting either. A plausible explanation is that errors from the two cascaded models, the neural machine translation model and the main classifier, propagate when they are applied one after the other. This suggests that translation may not be an ideal option for mitigating the data imbalance issue. However, our experiment on the translation-based approach is limited to the French portion of the data, and results may differ if the full data (English plus French translated into English) are used to train an English unilingual system.

French unilingual models seem to outperform their multilingual counterparts on three of the four datasets: HR, ROE and PASRB. For the T4 dataset, the multilingual models outperform their unilingual versions for the majority of methods. Finally, comparing accuracy on the French portion of the data between unilingual and multilingual models, unilingual models outperform multilingual models by 2.22 percentage points on average. This difference is much more pronounced than what was observed for English. It indicates that choosing a single multilingual model over two unilingual ones will, on average, lead to a greater decline in performance on the French portion of the data than on the English portion.

4 – Conclusion

Making decisions about how to handle bilingual text data is commonplace for many data scientists in the federal public service. While the equal status of the official languages means that neither should be treated differently from the other, this can be particularly difficult to achieve when more, and higher-quality, NLP tools are available for English than for French. This initiative aimed to generate applied, transferable knowledge to help the Government of Canada's (GoC) data scientists make more informed decisions when developing NLP solutions for bilingual datasets.

First, our results indicate that no single trend holds across datasets when looking at bias in multilingual models. For instance, the ROE dataset showed a slight bias, with performance on English comments systematically higher than on French comments, whereas the T4 data revealed the opposite trend, with bias favouring French. In short, although there is no definitive rule for how bias emerges in multilingual models across all datasets, some models do tend to underperform on one of the official languages, which highlights the need for proper language-specific assessment to avoid risks of biased treatment or disparate impact. The experiments on language proportions in the multilingual setting showed that undersampling the majority language to reach a 30–50% share of French leads to the best results: it reduces the performance disparity between the two official languages without harming overall performance.

The comparison of the multilingual and unilingual settings revealed that the impact on performance for the English portion of the data was negligible: both settings led to similar results, with performance slightly better in the unilingual setting. The French portion of the data, on the other hand, saw a larger decrease in performance in the multilingual setting compared with the unilingual one. This means that, when good-quality language identification is available, data science practitioners across the GoC should seriously consider using two unilingual models, as this tends to result in better performance on average than a single multilingual model. Finally, translating French text so that a unilingual English model could be used showed the least promise of the three settings across all datasets. Since this approach carried a greater risk of bias against the minority language in our experiment, we recommend conducting a complete analysis of its impact before deploying a single unilingual model with the minority language translated.
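The two-model recommendation implies a simple serving pattern. Each document's language is identified first, and the document is then sent to the matching unilingual model. The Python sketch below shows that routing step. The language detector and both models are stand-in callables, since the article does not name a language-identification tool. All names are hypothetical.

```python
def make_router(detect_language, french_model, english_model):
    """Dispatch each document to a unilingual model chosen by language ID.

    `detect_language` is any callable returning "fr" or "en" for a text (a
    stand-in for a real language-identification component). Each model is a
    callable mapping a list of texts to a list of predictions. Predictions
    are returned in the original input order.
    """
    def predict(texts):
        routes = {"fr": [], "en": []}
        for index, text in enumerate(texts):
            lang = detect_language(text)
            if lang not in routes:
                raise ValueError(f"unsupported language {lang!r} for text {index}")
            routes[lang].append(index)
        predictions = [None] * len(texts)
        for lang, model in (("fr", french_model), ("en", english_model)):
            indices = routes[lang]
            if indices:
                outputs = model([texts[i] for i in indices])
                for i, output in zip(indices, outputs):
                    predictions[i] = output
        return predictions

    return predict
```

Batching by language, rather than calling a model once per document, keeps each unilingual model's inputs homogeneous. It also preserves the original order of the predictions.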

Register for the Data Science Network's Meet the Data Scientist Presentation

If you have any questions about this article or would like to discuss this further, we invite you to our new Meet the Data Scientist presentation series where the author(s) will be presenting this topic to DSN readers and members.

Tuesday, June 21
2:00 to 3:00 p.m. EDT
MS Teams – link will be provided to the registrants by email


References

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794).

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018a). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018b). Multilingual BERT. GitHub repository google-research/bert.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
