Developments in Machine Learning Series: Issue two

By: Nicholas Denis, Statistics Canada

Editor's Note: This series showcases new and interesting research developments in machine learning (ML) from around the world. Hopefully you can find something that will help you in your own or your colleagues' work.

This month's topics:

Generating realistic images from user-text input

Figure 1: Four images generated from user-text input

Four images side by side as examples of realistic researcher-generated images.

  • The first image is of a black cat touching round pegs on a checker board. The caption reads "a surrealist dream-like oil painting by Salvador Dali of a cat playing checkers".
  • The second image is of a sun setting behind desert canyons. Caption: "a professional photo of a sunset behind the grand canyon".
  • The third image is a portrait painting of a green and blue hamster with dragon wings on a red background. Caption: "a high-quality oil painting of a psychedelic hamster dragon".
  • The fourth image is a drawing of Albert Einstein in a Superman costume. Caption: "an illustration of Albert Einstein wearing a superhero costume".

Researchers generate photorealistic images from user-text input using a 3.5-billion-parameter model.

What's new?: OpenAI has leveraged the recent success of its popular CLIP (Contrastive Language-Image Pre-training) model to train a Gaussian diffusion model that generates realistic and nuanced images conditioned solely on a text input describing the image to be generated. The model, GLIDE (Guided Language to Image Diffusion for generation and Editing), can be accessed via Google Colab.

How it works: When given a text input and an initially randomly-sampled vector of noise, x_0, the model sequentially de-noises the sample at stage t (x_t), conditioned on the text input and the de-noised sample at the previous stage (x_{t-1}). The final de-noised image, x_T, is the generated image, which attempts to capture the semantics of the user-provided text input.

Why does it work?: Gaussian diffusion models are based on a noise-additive process (see Denoising Diffusion Implicit Models) that begins with an image and produces a Markov chain of increasingly noisy images, where the image at time t is distributed as

q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{\alpha_t}\, x_{t-1},\ (1 - \alpha_t) I\right)

which is to say, the distribution of the next image, conditional on the previous image, is normal, where α_t is a noise parameter. The end result is a fully random image. Under mild conditions, the posterior q(x_{t-1} | x_t) is well defined and can be approximated with deep neural networks. Briefly,

  • the reverse process represents a way of sequentially removing noise from an image, to arrive at a natural, photorealistic image,
  • by starting with a naturally occurring image and adding Gaussian noise to it, they are able to train a model to estimate the noise added, and
  • the authors utilize other techniques and tricks from guided diffusion using text semantics from a CLIP model.
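The noising process described above can be sketched in a few lines of NumPy. This is only an illustration, not the GLIDE implementation: it assumes the common parameterisation in which the mean of each step is the previous image scaled by the square root of the noise parameter, and the toy image, the schedule values and the function names `forward_diffusion_step` and `noise_chain` are all hypothetical.

```python
import numpy as np

def forward_diffusion_step(x_prev, alpha_t, rng):
    """Sample the next image from N(sqrt(alpha_t) * x_prev, (1 - alpha_t) * I)."""
    noise = rng.standard_normal(x_prev.shape)
    x_t = np.sqrt(alpha_t) * x_prev + np.sqrt(1.0 - alpha_t) * noise
    # The noise is returned too, because it is the quantity a denoising
    # model would be trained to estimate.
    return x_t, noise

def noise_chain(x_start, alphas, seed=0):
    """Apply one noising step per entry in alphas and return every image in the chain."""
    rng = np.random.default_rng(seed)
    chain = [x_start]
    for alpha_t in alphas:
        x_t, _ = forward_diffusion_step(chain[-1], alpha_t, rng)
        chain.append(x_t)
    return chain

# Toy "image": an 8x8 array with a bright square in the middle.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# Hypothetical noise schedule: the noise parameter shrinks from 0.99 to 0.90.
alphas = np.linspace(0.99, 0.90, 200)
chain = noise_chain(image, alphas)

# The correlation with the original image fades as noise accumulates.
for step in (0, 50, 200):
    corr = np.corrcoef(image.ravel(), chain[step].ravel())[0, 1]
    print(f"step {step:3d}: correlation with original = {corr:.3f}")
```

A denoising network would be trained to predict `noise` from the noisy image, which is what allows the chain to be run in reverse at generation time.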

Results: Quantitative evaluation of generative models is difficult and remains an open problem. However, the research does include some state-of-the-art zero-shot performance metrics. Qualitatively, the model is capable of producing incredibly nuanced and specific images such as "a crayon drawing of a space elevator" and "a stained glass window of a panda eating bamboo". Moreover, the authors had human evaluators examine the images generated by GLIDE and other state-of-the-art generative models. Humans judged images produced by GLIDE as more photorealistic than those of other models between 66% and 91% of the time.

But…: It is common in publications on generative models for authors to cherry-pick the generated data they present; however, it is also quite common for authors to include a large enough sample of randomly selected instances as well. This paper could have included a much larger gallery of randomly selected images. Also, the model has 3.5 billion parameters and requires a significant amount of time (20 seconds) to generate a single image, making this approach unlikely to scale.

Our opinion: Generative models are becoming increasingly powerful, producing high quality and seemingly authentic images that fool humans – and the quality will only increase over time. See if you can tell which face is real. As techniques such as GLIDE take specific input and direction from humans to produce high quality images (and soon videos), legal, ethical and evidential issues will need to be addressed immediately.

Move over principal component analysis, make way for learning dimensionality reduction

Figure 2: Higher and lower-dimensional representation

Given a set of feature vectors in a generic input space, we use nearest neighbours to define a set of feature pairs whose proximity we want to preserve. We then learn a dimensionality-reduction function (the encoder) by encouraging neighbours in the input space to have similar representations. We learn it jointly with auxiliary projectors that produce high dimensional representations, where we compute the Barlow Twins loss over the (d' × d') cross-correlation matrix averaged over the batch. (source: Twin Learning for Dimensionality Reduction)

TLDR (Twin Learning for Dimensionality Reduction) beats principal component analysis (PCA) for small to mid-size outputs (8 to 128 dimensions).

What's new?: Naver Labs released TLDR, a general technique that uses a linear dimensionality reduction encoder that encourages the nearest neighbours in the input space to be similar in the smaller embedding space.

How it works:

  • During training, a given data-instance in some high-dimensional input space is sampled and its k-nearest neighbours are computed.
  • A linear embedding matrix maps the data instance and its neighbours to the lower-dimensional space.
  • The lower-dimensional embeddings are projected (via a projector network) to a much higher dimensional space where their cross-correlation matrix is computed.
  • They used the recently-proposed Barlow Twins loss, which encourages the cross correlation matrix to be the identity matrix.
  • After training, the projector network is discarded and the linear encoder is used for dimensionality reduction.
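The first two steps above (finding neighbour pairs, then applying a linear encoder) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the helper `knn_pairs`, the brute-force distance computation, the data sizes and the random (untrained) encoder matrix `W` are all hypothetical.

```python
import numpy as np

def knn_pairs(X, k):
    """Return index arrays (anchors, positives): each positive is one of the
    k nearest neighbours of its anchor, by Euclidean distance."""
    sq_norms = (X ** 2).sum(axis=1)
    # Squared Euclidean distances between all pairs of rows.
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    anchors = np.repeat(np.arange(X.shape[0]), k)
    return anchors, neighbours.ravel()

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))             # 100 points in a 32-dimensional input space
anchors, positives = knn_pairs(X, k=3)

# Linear encoder: a single matrix mapping 32 -> 8 dimensions.
# It is random here; in training it would be learned.
W = rng.standard_normal((32, 8)) / np.sqrt(32)
z_anchor = X[anchors] @ W
z_positive = X[positives] @ W
print(z_anchor.shape, z_positive.shape)
```

Because the encoder is a single matrix, reducing new data after training is just one matrix multiplication, which is part of what makes a linear approach a practical alternative to PCA.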

Why it works:

  • The Barlow Twins loss is minimized when the cross-correlation matrix between vectors is the identity matrix. This is equivalent to a factorized representation, where each dimension of the data is completely independent.
  • By minimizing the cross-correlation across dimensions, redundant information encoded is minimized, which is a desirable property for dimensionality reduction.
  • By computing this loss on pairs of inputs that are close neighbours in the input space, the linear embedding function is learning something similar to Manifold Learning, where locally nearby points are invariant (similar) in the lower-dimensional embedding space.
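To make the loss concrete, here is a small NumPy sketch of a Barlow Twins style objective computed on two batches of embeddings. It is an illustration under assumptions: the function name `barlow_twins_loss` and the off-diagonal weight `lam` are chosen here for the example, and no training loop or projector network is included.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=0.005):
    """Penalise deviation of the cross-correlation matrix from the identity.

    z_a, z_b: arrays of shape (n, d) holding paired embeddings.
    lam: weight of the off-diagonal (redundancy) term; an assumed value.
    """
    n = z_a.shape[0]
    # Standardise every dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / z_a.std(axis=0)
    z_b = (z_b - z_b.mean(axis=0)) / z_b.std(axis=0)
    c = z_a.T @ z_b / n                         # (d, d) cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()   # paired dimensions should agree
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # other dimensions should not
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((512, 4))

# Identical pairs with roughly independent dimensions: loss close to zero.
loss_matched = barlow_twins_loss(z, z)
# Every dimension is a copy of the first one: the redundancy term is large.
redundant = np.repeat(z[:, :1], 4, axis=1)
loss_redundant = barlow_twins_loss(redundant, redundant)
# Unrelated pairs: the diagonal term is large.
loss_unrelated = barlow_twins_loss(z, rng.standard_normal((512, 4)))

print(loss_matched, loss_redundant, loss_unrelated)
```

The three cases show the two pressures in the loss: neighbours should map to similar embeddings (diagonal term), and different output dimensions should not carry the same information (off-diagonal term).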

Results: The authors focus on retrieval tasks. Given an input image or text document, the retrieval task aims to find the most similar instance(s) to the input within a given dataset. Note that for images, TLDR was applied to outputs produced from a pre-trained vision model, and for text, TLDR was applied to outputs produced from a pre-trained BERT language model.

  • On image retrieval datasets, TLDR improved over PCA in terms of mean average precision by 6 to 10%, across different output dimensionalities.
  • On text retrieval tasks, TLDR improved over PCA for recall scores by as much as 27%, and saw dramatic improvement as the size of the output dimension decreased.
  • Compared to other leading dimensionality reduction techniques, including manifold-based techniques, TLDR consistently outperformed all other approaches at output dimension sizes of eight and higher, but underperformed for dimension sizes of two and four.
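For readers less familiar with the retrieval metrics quoted above, the sketch below shows one common way to compute them for a single query. The function names are hypothetical, and the exact definitions used in the paper's benchmarks may differ (for example, in how average precision is normalised).

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of 0/1 flags, in the order the items were retrieved.
    Assumes every relevant item appears somewhere in the ranked list.
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)   # precision at this relevant item
    return sum(precisions) / hits if hits else 0.0

def recall_at_k(ranked_relevance, k, total_relevant):
    """Fraction of all relevant items found in the top k results."""
    return sum(ranked_relevance[:k]) / total_relevant

def mean_average_precision(all_ranked_relevance):
    """Mean of the per-query average precision values."""
    scores = [average_precision(r) for r in all_ranked_relevance]
    return sum(scores) / len(scores)

# Two toy queries: 1 marks a relevant result, 0 an irrelevant one.
query_1 = [1, 0, 1, 0]
query_2 = [0, 1, 0, 0]
print(average_precision(query_1))
print(recall_at_k(query_1, k=2, total_relevant=2))
print(mean_average_precision([query_1, query_2]))
```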

Our opinion: Dimensionality reduction techniques are typically discussed for tabular datasets and the use of classical machine learning techniques, but often fail to be useful for extremely high-dimensional data, such as images and text. TLDR is a linear dimensionality reduction technique that can be applied to tabular or complex data. This technique could be useful for:

  • retrieval tasks,
  • label selection strategies in active learning,
  • cluster and data exploration, and
  • explainable machine learning models

Help! Machines are learning to code!

Google DeepMind introduces AlphaCode, which performs above the 50th percentile in coding competitions.

What's new: DeepMind built a new competitive programming dataset used to train a 41-billion-parameter model that can take a natural language text description of a coding challenge (see Figure 3) and produce functional code to solve the challenge. This is bananas!

How it works: Coding competitions are quite common. DeepMind built multiple datasets to iteratively train a deep neural network that can take natural language text input describing a programming challenge (see Figure 3) and produce code as output, character by character. They did this by first sampling a large number of potential solutions, then filtering and clustering them to remove weak candidates. Finally, 10 diverse solutions were submitted to a real-world competition. Below we break down some relevant steps:

  • They built a pre-training dataset based on a public snapshot of GitHub repositories using code from 12 popular programming languages, resulting in 715 GB of data.
  • They use a 41-billion-parameter transformer model with an encoder and decoder architecture, which, given a sequence of text, will output a sequence of text.
  • Pre-training the model involves predicting the next token (character/word in the code) conditioned on the input and the output produced thus far.
  • A standard masked language model loss (as in BERT) was used, where given a sequence of inputs (natural language text description of the problem), 15% of the text is randomly erased and the model must infer what the missing text is.
  • The pre-trained model is fine-tuned on their competitive programming dataset. Similar training objectives to the pre-training are used here.
  • Their model is able to sample millions of possible solutions for each input challenge.
  • Reinforcement learning was used to increase the diversity of solutions sampled from the model.
  • Since only 10 solutions can be submitted per challenge, the authors filter their millions of potential solutions through a number of tests provided in each challenge, then cluster the remaining solutions based on program behaviour. One solution from each of the 10 largest clusters was sampled and submitted as the solution to the challenge.
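The final filter-and-cluster step can be illustrated with a toy sketch. It is not DeepMind's pipeline: candidate programs are represented here as plain Python callables, and the function name `filter_and_cluster`, the example tests and the probe inputs are hypothetical stand-ins for the challenge's tests and for the extra inputs used to compare program behaviour.

```python
from collections import defaultdict

def filter_and_cluster(candidates, example_tests, probe_inputs, max_submissions=10):
    """Pick at most max_submissions behaviourally distinct candidates."""
    # Step 1: keep only candidates that pass every example test.
    survivors = []
    for program in candidates:
        try:
            if all(program(x) == expected for x, expected in example_tests):
                survivors.append(program)
        except Exception:
            continue                      # a crashing candidate is discarded

    # Step 2: group survivors by their outputs on extra probe inputs.
    clusters = defaultdict(list)
    for program in survivors:
        signature = []
        for x in probe_inputs:
            try:
                signature.append(repr(program(x)))
            except Exception:
                signature.append("<error>")
        clusters[tuple(signature)].append(program)

    # Step 3: take one representative from each of the largest clusters.
    largest_first = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in largest_first[:max_submissions]]

# Toy task: "double the input". The single example test is (2 -> 4).
candidates = [
    lambda x: x * 2,      # correct
    lambda x: x + x,      # correct, same behaviour as above
    lambda x: x ** 2,     # passes the example test by coincidence
    lambda x: x + 1,      # fails the example test
    lambda x: 4,          # passes the example test by coincidence
]
selected = filter_and_cluster(candidates, example_tests=[(2, 4)], probe_inputs=[0, 3, 5])
print(len(selected), selected[0](7))
```

Picking from the largest clusters first reflects the idea that behaviour shared by many independently sampled programs is more likely to be correct than behaviour produced by only one.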

Results: The model was applied to 10 separate coding challenges.

  • The authors found that when they sample 1 million possible solutions and submit the (estimated) best 10, the code successfully solves the coding challenge over 30% of the time. The success rate increases with both the number of sampled solutions and the model size. Performance also scaled linearly with training time, with no plateau observed, which suggests that the model would do better if trained for longer.
  • The authors estimate that the model ranked, on average, within the top 54% of participants, which is close to the level of the median programmer.

What's the context?: Though this task is incredibly difficult and it is simply astounding that a model can solve even a single task, these results should not be a surprise. GitHub's Copilot can automatically suggest lines or even full functions of code. Copilot is built by OpenAI, which does plenty of work similar to this paper. Anyone can use this service today.

Just as some fear that data science and machine learning may bring job automation, automatically generated code and AutoML might bring fear to data scientists themselves of being automated away!

Our opinion: It seemed like just yesterday that the world marvelled at the ability of ML models to infer objects such as cats and dogs in a given image. Today, we are witnessing the continued advancement of ML models that can extract and abstract incredibly complex semantic content from long blocks of text describing complicated tasks, and then output a long and specifically structured syntax that is functional programming code solving the given task.

Though the results are promising, there is still room for improvement; this field is in its infancy, and the results will almost certainly improve. This brings ethical concerns into play on many different levels. For one, what is the responsibility of a developer who uses this technology to write code that they are unable to write themselves? Would such a developer be able to debug the code, or ascertain that it does, in fact, do what it is intended to do?

Figure 3: Code automatically generated from a natural language problem description

Backspace
You are given two strings s and t, both consisting of lowercase English letters. You are going to type the string s character by character, from the first character to the last one.
When typing a character, instead of pressing the button corresponding to it, you can press the "Backspace" button. It deletes the last character you have typed among those that aren't deleted yet (or does nothing if there are no characters in the current string). For example, if s is "abcbd" and you press Backspace instead of typing the first and the fourth characters, you will get the string "bd" (the first press of Backspace deletes no character, and the second press deletes the character 'c'). Another example, if s is "abcaa" and you press Backspace instead of the last two letters, then the resulting text is "a".
Your task is to determine whether you can obtain the string t, if you type the string s and press "Backspace" instead of typing several (maybe zero) characters of s.

Input
The first line contains a single integer q (1 ≤ q ≤ 10^5), the number of test cases. The first line of each test case contains the string s (1 ≤ |s| ≤ 10^5). Each character of s is a lowercase English letter.
The second line of each test case contains the string t (1 ≤ |t| ≤ 10^5). Each character of t is a lowercase English letter.
It is guaranteed that the total number of characters in the strings over all test cases does not exceed 2 · 10^5.

Output
For each test case, print "YES" if you can obtain the string t by typing the string s and replacing some characters with presses of "Backspace" button, or "NO" if you cannot.
You may print each letter in any case (YES, yes, Yes will all be recognized as positive answer, NO, no, nO will all be recognized as negative answer).


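# Read the number of test cases.
t = int(input())
for i in range(t):
    s = input()
    t = input()  # reuses the name t; range(t) above has already been evaluated
    a = []
    b = []
    for j in s:
        a.append(j)
    for j in t:
        b.append(j)
    # Work from the end of both strings.
    a.reverse()
    b.reverse()
    c = []
    while len(b) != 0 and len(a) != 0:
        if a[0] == b[0]:
            # Characters match: keep this one and move on in both strings.
            c.append(b.pop(0))
            a.pop(0)
        elif a[0] != b[0] and len(a) != 1:
            # Mismatch: a Backspace here also removes the character before it.
            a.pop(0)
            a.pop(0)
        elif a[0] != b[0] and len(a) == 1:
            a.pop(0)
    # t can be obtained only if every one of its characters was matched.
    if len(b) == 0:
        print("YES")
    else:
        print("NO")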
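For comparison, the same greedy idea can be written more compactly with two indices instead of list operations. This is a sketch added for illustration, not part of the model's output; the function name `can_obtain` is hypothetical, and it takes the two strings as arguments rather than reading from standard input.

```python
def can_obtain(s: str, t: str) -> bool:
    """Return True if t can be produced by typing s with some key presses
    replaced by Backspace."""
    i = len(s) - 1   # position in s, scanning from the end
    j = len(t) - 1   # position in t, scanning from the end
    while i >= 0 and j >= 0:
        if s[i] == t[j]:
            # Keep this character and move left in both strings.
            i -= 1
            j -= 1
        else:
            # Press Backspace instead of s[i]; that also erases s[i - 1].
            i -= 2
    # Any characters left at the start of s can always be erased,
    # so t is obtainable exactly when all of t has been matched.
    return j < 0

for s, t in [("ababa", "ba"), ("ababa", "bb"), ("aaa", "aaaa"), ("aababa", "ababa")]:
    print(s, t, "YES" if can_obtain(s, t) else "NO")
```

Scanning from the end works because a Backspace only affects characters typed before it, so the last character of t must come from a character of s that is followed by an even number of discarded characters.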
  

Data Access Division newsletter - Spring 2022 edition

A message to our staff and clients

With the arrival of spring and warmer weather, the Data Access Division (DAD) would like to take a moment to thank its staff for their continued hard work and dedication in collectively leading the division and its programs towards success. We would also like to extend our gratitude to all our clients and partners for their continued support and trust. We remain devoted to providing our researchers and clients with real-time access to data and services in order to best serve the data needs of all Canadians.

Celebrating accomplishments and focus for the upcoming year

DAD would like to highlight and celebrate some of its greatest accomplishments of the last few months. The Self-Serve Access (SSA) section received approval for each institution to have two Real Time Remote Access (RTRA) accounts, and has completed its part in the Public Use Microdata Files (PUMFs) online project. In collaboration with the Research Data Centre (RDC), the Canadian Research Data Centre Network (CRDCN) hosted another successful conference in 2021, with over 400 registrants. The Virtual Data Lab (vDL) officially launched into production in October 2021. This significant achievement better positions Statistics Canada to advance its user-centricity: the new mode of access will enhance StatCan's existing access methods and expand microdata offerings to accredited researchers.

For the upcoming year, DAD will continue to focus on collaboration efforts with various teams and partners, and on leveraging new technologies to help drive Statistics Canada's modernization efforts. Efforts will continue on developing the Virtual Research Data Centre (vRDC), increasing business data holdings in the RDCs, migrating existing researchers and onboarding new researchers into the vDL, and ramping up engagement with existing and new stakeholders in response to the Division's new data access and marketing plan rollout.

For more information, please visit the Data Access Division website.

Self-serve access

Data Liberation Initiative Team Updates

We are very pleased and excited to announce that StatCan has selected the Rich Data Services (RDS) platform to replace its Nesstar server for the delivery of data and metadata to its research communities. This initiative will further support the agency's modernization efforts.

Various enhancements and new features will be introduced in RDS to further strengthen the platform and meet the agency's and users' needs, including multilingual support for metadata and the RDS Explorer/TabEngine user interfaces, bi-variate regression analysis, and compliance with 508 / WCAG accessibility requirements.

RDS will also integrate with other systems, such as Colectica and MTNA's Aria platform, which StatCan is already using for managing its classifications. RDS will provide technical assistance for the migration of existing datasets from Nesstar.

Note that Nesstar has been decommissioned, based on advice from the Canadian Centre for Cyber Security and the Statistics Canada IT security team, due to a cyber security vulnerability. Nesstar was unable to offer a patch that met Statistics Canada's requirements. In the meantime, we are happy to help with any requests related to Nesstar tabulations or downloads.

Program review

Intergage undertook extensive consultations with members in October and November 2021.

This included interview consultations with approximately 25 advisory body members, subject matter experts and StatCan staff regarding the DLI program including what's working, what's not working, and opportunities for improvement and potential risks.

The findings were presented by Jennifer Smith from Intergage at the National Training on November 22, 2021. Intergage is now working on a report and plan in collaboration with a working group that includes members of the DLI Executive Committee.

Public use microdata files online project

The Self-serve Access (SSA) section has completed its part in this initiative. The StatCan Dissemination team is now working on putting all older PUMFs online in a downloadable format. Newly released PUMFs are being added to the website as they become available. Digital Object Identifiers (DOIs) are being assigned to the PUMFs as they are made available.

Custom tabulations

Statistics Canada (StatCan) is offering a limited number of free custom tabulations for Data Liberation Initiative (DLI) members, courtesy of the Data Service Centres. The initiative is aimed at students working on research projects who may not have the funds to request custom tabulations. Completed custom tabulations will be returned to the requesting librarian and the researcher, and posted to the Electronic File Transfer (EFT). Expected turnaround time for custom tabulations will depend on the nature of the request, but in general should be between two weeks and two months.

We ask that you submit the details of the custom tabulation request to the DLI StatCan team: statcan.maddlidamidd.statcan@statcan.gc.ca

DLI executive committee

The DLI Executive Committee has the pleasure to announce the following representatives:

  • Co-chair: Siobhan Hanratty from the University of New Brunswick
  • Co-chair: Elizabeth Hill from Western University
  • Western Region: Sarah Rutley from the University of Saskatchewan

The Atlantic seat still remains vacant.

Professional development committee

The Professional Development Committee (PDC) sent a call-out to the Listserv in March 2021 for a volunteer to represent the Quebec region. This seat still remains vacant.

PDC initiatives:

  • DLI Training Repository – a subcommittee of the PDC is working on transitioning from CUDO to Scholars Portal Dataverse
  • Training – discussions have started on what training may look like in 2022

Statistics Canada training

Statistics Canada (StatCan) provides training for all levels of data users using different platforms, as well as other data services such as customized products. The training is provided by the Data Service Centres. Over the past year, they have expanded many of their offerings. See below for the most up-to-date information:

  • Workshop series - Our Workshop Series provides you with direct access to Statistics Canada's extensive survey methodology and analysis experience.
  • Webinars - The Webinar series covers a broad range of topics from the Census program to navigating the Statistics Canada website.
  • Data literacy - The training is aimed at those who are new to data or those who have some experience with data but may need a refresher or want to expand their knowledge. We invite you to check out our Learning catalogue to learn more about our offerings, including a great collection of short videos. Be sure to check back regularly as we will be continuing to release new training.
  • Statistics: Power from Data! Updated on September 2, 2021, this training tool for students, teachers and the general population will help in getting the most from statistics. This resource aims to help readers:
    • Gain confidence in using statistical information
    • Appreciate the importance of statistical information in today's society
    • Make critical use of information that is presented to them

These goals are at the heart of Statistics Canada's mission to assist Canadians with informed decision-making based on data.

A list of all DLI products is available on the website: Data Liberation Initiative

Data releases to DLI January-March 2022:

  • Social Policy Simulation Database and Model (SPSD/M) – version 29.0
  • National Travel Survey (NTS) 2020
  • Canadian Income Survey (CIS) 2018
  • Canadian Income Survey (CIS) 2019
  • General Social Survey (GSS) Cycle 34
  • General Social Survey (GSS) Cycle 35
  • Labour Force Survey (LFS) - monthly

Real time remote access updates

StatCan will continue to offer DLI members one free Real Time Remote Access (RTRA) account ($5,000 value/per institution). RTRA is an online, real-time tool to create custom tabulations. RTRA users can calculate frequencies, means, percentiles, percent distribution, proportions, ratios, and shares on social and administrative data.

RDC researchers have had their access extended to March 31, 2023.

43 new users subscribed between July and December 2021.

SAS assistant

The SSA section is continuing its work on adding more surveys and creating pre-recorded webinars.

A list of all RTRA products is available on the website: Real Time Remote Access

Research data centres

Research data centres update

In order to ensure the safety of Statistics Canada staff and researchers, RDCs continue to operate with COVID-19 restrictions.

Work on the vRDC is ongoing with the preparation for the build and rollout. While there have been procurement challenges, equipment has started to arrive and the remaining equipment is on its way. The team has been focused on establishing a work plan and timeline for the rollout of the project to each of the RDCs. This has included a survey to all Academic Directors to gather feedback on timing and preparations.

The business working group of the vRDC Project has also been hard at work. Discussions are well underway to establish a Memorandum of Understanding (MOU) that will allow both physical access to Statistics Canada data in an RDC as well as outside an RDC in another authorized workspace. Work also progresses on the letters of agreements and invoicing that will occur in the vRDC environment and determining the necessary risk sharing agreements.

New research data centres holdings

A total of thirty products were added to our data holdings in the third quarter of the 2021/2022 fiscal year. These include one new administrative data file, five new surveys, twelve new integrated data files, and updates to twelve data files.

Highlights of data files added from October to December 2021

  • Survey on Sexual Misconduct at Work (SSMW) 2020
  • Canadian Food Environment Database (Can-FED) 2018
  • Education and Labour Market Longitudinal Platform (ELMLP): Canada Education Savings Program (CESP) data linked to T1FF, Census of Population Keys 2016, and Longitudinal Administrative Databank
  • General Social Survey - Family (GSS) Linked to T1FF
  • Canadian Social Survey – Well-being, Activities and Perception of Time 2021 (CSS-WAPT)

For a complete list of data available in RDCs and government access centres, visit: Data available at the Research Data Centres

Access to business data

Access to business data in RDCs continues to progress. This winter the local RDC analysts have begun vetting output for business data projects. We thank our colleagues from the Government Access Program (FRDC) who have been supporting the RDCs during this transition. Our RDC and FRDC staff on the Vetting Committee created resources to support researchers working with business data in the centres. These include a detailed business data handbook, a revised vetting orientation and a macro for researchers to use to run the confidentiality tests.

New training initiatives

A series of training videos have been produced by our Research Data Centre staff to help support researchers who are preparing their analytical output for release. These videos are part of the Confidentiality Vetting Support Series and present examples of how to use different statistical software packages to perform the analyses required for researchers working with confidential microdata.

  • Confidentiality Vetting Support: Dominance and Homogeneity using R
  • Confidentiality Vetting Support: Proportion and Round Tool using SAS
  • Confidentiality Vetting Support: Rounding Proportions using Stata
  • Confidentiality Vetting Support: Dominance and Homogeneity using SAS
  • Confidentiality Vetting Support: Dominance and Homogeneity using the tcensus function (Stata)
  • Confidentiality Vetting Support: Rounding proportions using Rounder – An R Shiny App
  • Confidentiality Vetting Support: Histogram Output in Stata

These short 5 to 10 minute videos are available in English and French and will soon be available on the Statistics Canada website under training and events: Training and events

Virtual data lab

The Virtual Data Lab (vDL) has launched for select federal and provincial clients in the FRDC program! The vDL allows for the use of "authorized workspaces" where qualifying microdata projects may be accessed outside of the FRDC physical location. The vDL improves users' experience of microdata access while maintaining secure disclosure control. Researchers will be able to access many StatCan household and business surveys and administrative data holdings in this new virtual environment.

The vDL transition is occurring over three waves, with existing partners with eligible projects being transitioned in waves 1 and 2. StatCan is working with federal departments, provincial ministries and non-government organizations to transition all eligible projects into the new technology. All existing eligible projects will be transitioned by the end of summer 2022.

To be eligible for the vDL, each department, ministry or organization will require an MOU, Section 10 or organizational agreement, and accreditation for their respective organization and researchers. Projects will also need a microdata research contract in place, and the project data sets must have a Confidentiality Classification Tool (CCT) score of 7 or less.

Once a department has been onboarded into the vDL, new eligible projects will also be able to be accessed in the vDL.

We will be reaching out to federal departments and provincial ministries over the next few months to discuss their transitions to the vDL.

Questions or comments? Visit Access to microdata.

Check out the StatCan Blog.

Don't forget to follow us on social media!

Frequently asked questions on random tabular adjustment (RTA)

Why are some estimates missing from current or previously published Census of Agriculture tables?

Estimates may be missing from Census of Agriculture data tables for one of the following two reasons:

  • Data suppression: One of the methods used to protect the information of individual members of the population is data suppression. This method was used for the 2016 Census of Agriculture. However, in 2021, data suppression was replaced with a new method called random tabular adjustment (RTA).
  • Data quality: For the first time, the 2021 Census of Agriculture will publish a quality indicator for most of its estimates to account for the degree of uncertainty because of non-response, data processing and RTA in individual estimates. This indicator takes the form of a letter between A and F, where A-level estimates are considered to be the most reliable, and F-level estimates are considered to have so much uncertainty that they are too unreliable to be published. When an estimate cannot be published, the cell in the table will simply show the letter F.

What is a data suppression technique?

To protect sensitive statistical information, Statistics Canada typically uses suppression techniques. These techniques involve suppressing data points that can directly or indirectly reveal information about a respondent. This can often lead to the suppression of a large number of data points, significantly reducing the amount of available data. In Statistics Canada data tables, cells suppressed for confidentiality reasons are marked with an “x.”

What is random tabular adjustment?

Random tabular adjustment (RTA) is a new method being used by Statistics Canada to balance the need for more high-quality data outputs for users while protecting the confidential information of respondents in the release of economic data estimates. Using RTA, Statistics Canada can identify sensitive estimates that may reveal information about a respondent and randomly adjust their value instead of suppressing them.

How does random tabular adjustment differ from other data suppression techniques?

Random tabular adjustment (RTA) improves the utility of economic data tables released by Statistics Canada. While traditional suppression techniques use rules similar to RTA to determine whether a cell contains sensitive information, they will suppress, or not publish, a sensitive cell. With RTA, cell estimates can still be released as individual values are not disclosed. Using this method allows Statistics Canada to increase the amount of useful data it can publish while ensuring confidential information remains protected.

How does random tabular adjustment work?

Random tabular adjustment (RTA) identifies sensitive estimates and randomly adjusts their value, or adds noise, so an estimate remains confidential, allowing it to be released. In other words, instead of “suppressing” the data, the estimates are “perturbed.” The size of the adjustment is calculated to protect the confidentiality of the individual responses that contributed to the estimate.

After adjusting the value, Statistics Canada assigns a quality indicator (A, B, C, D, E or F) to the estimate to indicate the degree of confidence users can have in its accuracy. This indicator accounts for uncertainty throughout the data collection, processing and confidentiality steps; it is not just for uncertainty because of RTA.
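
As a rough illustration of the adjustment step described above, the following Python sketch adds random noise only to cells that have been flagged as sensitive. It is not Statistics Canada's algorithm: the function name `perturb_sensitive_cells`, the uniform noise distribution and the `max_relative_noise` bound are assumptions made purely for illustration, the rule that flags a cell as sensitive is taken as given, and the sketch does not adjust aggregates or assign quality indicators.

```python
import random


def perturb_sensitive_cells(cells, sensitive, max_relative_noise=0.10, seed=None):
    """Return a copy of `cells` with random noise added to sensitive cells only.

    cells: dict mapping a cell label to its estimate.
    sensitive: set of labels flagged as sensitive (flagging rule not shown).
    max_relative_noise: hypothetical bound on the adjustment, as a share of the value.
    """
    rng = random.Random(seed)
    adjusted = {}
    for label, value in cells.items():
        if label in sensitive:
            # Draw a relative adjustment uniformly from [-max, +max] (an assumption).
            noise = rng.uniform(-max_relative_noise, max_relative_noise)
            adjusted[label] = value * (1 + noise)
        else:
            # Cells that are not sensitive are left unchanged.
            adjusted[label] = value
    return adjusted


# Made-up example values; only "Region B" is treated as sensitive here.
cells = {"Region A": 1200.0, "Region B": 85.0, "Region C": 430.0}
published = perturb_sensitive_cells(cells, sensitive={"Region B"}, seed=1)
```

In this sketch the perturbed value stays within 10% of the original, while the other cells keep their exact values.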

Will random tabular adjustment impact all estimates in the same way?

Random tabular adjustment (RTA) will not impact all estimates the same way. RTA examines every cell individually to determine whether it contains sensitive information. While most Census of Agriculture cells do not contain identifiable information and will not need to have RTA applied, a number of cells will have RTA applied as they contain information that could be directly attributable to one or more of the individual values making up the total estimate in the cell.

When RTA is applied to a cell, the program determines the amount of unique noise to add, and users will be unable to identify whether RTA has been applied to a cell or not. In this way, RTA provides a measure of protection for all cells in the table.

For the Census of Agriculture, RTA has not been applied to the following tables:

  • Characteristics of farm operators: age, sex and number of operators on the farm, Census of Agriculture, 2021 (Table 32-10-0381-01);
  • Characteristics of farm operators: farm work and other paid work, Census of Agriculture, 2021 (Table 32-10-0382-01).

In addition, RTA is not applied to count estimates, such as the number of farms with a certain characteristic.

What are the advantages of random tabular adjustment?

Random tabular adjustment allows for more data to be released, increasing the utility of data tables. Instead of “suppressing” the data, the estimates are “perturbed,” meaning that the sensitive information is randomly altered to protect the confidentiality of the individual responses contributing to the estimate. This is done by adjusting the estimate in question so a precise value cannot be assigned to an individual contribution.

Another added benefit of this method is that it does not affect cells that are not considered sensitive. Only sensitive cells and their aggregates are affected. In other data suppression techniques used by Statistics Canada, some cells have to be suppressed to protect the confidentiality of another cell. For example, if one part of a total is suppressed, another part must also be suppressed to ensure the confidential cell cannot be calculated from the total.
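
The need to suppress a second cell follows from simple arithmetic, sketched below in Python with made-up numbers. If only one component of a published total is suppressed, it can be recovered by subtraction. The table, the labels and the helper `recover_single_suppressed` are hypothetical and serve only to illustrate the point.

```python
def recover_single_suppressed(total, parts):
    """If exactly one part is suppressed (None), recover it as total minus the rest.

    Returns None when zero or several parts are suppressed, because the total
    alone then does not pin down any single missing value.
    """
    missing = [label for label, value in parts.items() if value is None]
    if len(missing) != 1:
        return None
    known_sum = sum(value for value in parts.values() if value is not None)
    return total - known_sum


# One suppressed cell: the confidential value leaks by subtraction.
one_suppressed = {"Farm type 1": 40, "Farm type 2": None, "Farm type 3": 25}
leaked = recover_single_suppressed(100, one_suppressed)

# Two suppressed cells: only their combined value is known, not each one.
two_suppressed = {"Farm type 1": 40, "Farm type 2": None, "Farm type 3": None}
protected = recover_single_suppressed(100, two_suppressed)
```

Here `leaked` equals 35, the value that was meant to stay confidential, while `protected` is `None` because two unknowns cannot be separated from a single total.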

Where can I find more information on random tabular adjustment?

For more technical information on random tabular adjustment, please see Disclosure control and random tabular adjustment by Mark Stinner from the Statistical Society of Canada’s 2017 annual meeting.

2021 Census soundtrack

Every census supporter needs their soundtrack

As Canada's statistical portrait, the census is a reflection of the lives of Canadians. Listen to our playlists while you explore the 2021 Census data, and experience the different facets of Canadian culture. If these songs aren't already among your favourite tracks, we hope that you have the opportunity to discover something new as you learn about how our country has changed over the past five years.

Get comfortable, press play, and let's experience Canada's celebrated musical talent together.

For more information on the 2021 Census data, please visit the Census of Population.

2021 Census data releases

Nostalgic songs

Go back in time and get nostalgic with our selection of Canadian songs.

Nostalgic songs: Listen on Spotify

Roundabouts and corner stores

Do you feel it? Something special runs through these winding streets. Listen to Canada's trending tracks here.

Roundabouts and corner stores: Listen on Spotify

This city never sleeps

Lights. Sounds. Smells. Our cities buzz with excitement. We've curated Canada's soul, R&B and hip-hop tracks in one place.

This city never sleeps: Listen on Spotify

Backroads and rolling hills

Gear down, kick back and take the scenic route with some of Canada's best country hits.

Backroads and rolling hills: Listen on Spotify

2021 Census collection

Spark & soul

Modern pop, electronic and soul infusions from Canada's chart-toppers.

Spark & soul: Listen on Spotify

Studio sessions

Explore Canada's latest up-and-coming alternative and folk talent.

Studio sessions: Listen on Spotify

Friday night kitchen party

Turn it all the way up with Canada's current country favourites.

Friday night kitchen party: Listen on Spotify

Front row freedom

No concert? No problem. Jam out to Canada's best rock hits from the 2000s onward.

Front row freedom: Listen on Spotify

Take the long way home

Take a trip down memory lane with highlights from Canada's celebrated artists of the 1990s and 2000s.

Take the long way home: Listen on Spotify

True North rap

Hard-hitting bars from coast to coast, this is Canadian rap and hip-hop.

True North rap: Listen on Spotify

Golden age

Legendary Canadian rock classics you'll be sure to recognize.

Golden age: Listen on Spotify

Francophone pride

Celebrated tracks from some of Canada's biggest French names.

Francophone pride: Listen on Spotify

Contemporary Francophone

Rising French-Canadian artists you'll want to keep an eye on.

Contemporary Francophone: Listen on Spotify

Voices of the North

A celebration of the sounds produced by Indigenous artists.

Voices of the North: Listen on Spotify

Spotlight: Contemporary Indigenous

Bringing together the best of Canada's fastest-rising Indigenous talent.

Spotlight: Contemporary Indigenous: Listen on Spotify

Why are we conducting this survey?

The purpose of this survey is to produce monthly statistics on stocks of butter and cheese held in cold storage warehouses.

The data are used by Agriculture and Agri-Food Canada, the Canadian Dairy Commission, provincial governments and the Dairy Farmers of Canada to assist in the development, administration and evaluation of agricultural policies.

Your information may also be used by Statistics Canada for other statistical and research purposes.

Your participation in this survey is required under the authority of the Statistics Act.

Other important information

Authorization to collect this information

Data are collected under the authority of the Statistics Act, Revised Statutes of Canada, 1985, Chapter S-19.

Confidentiality

By law, Statistics Canada is prohibited from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent, or as permitted by the Statistics Act. Statistics Canada will use the information from this survey for statistical purposes only.

Record linkages

To enhance the data from this survey and to reduce the reporting burden, Statistics Canada may combine the acquired data with information from other surveys or from administrative sources.

Data-sharing agreements

To reduce respondent burden, Statistics Canada has entered into data-sharing agreements with provincial and territorial statistical agencies and other government organizations, which have agreed to keep the data confidential and use them only for statistical purposes. Statistics Canada will only share data from this survey with those organizations that have demonstrated a requirement to use the data.

Section 11 of the Statistics Act provides for the sharing of information with provincial and territorial statistical agencies that meet certain conditions. These agencies must have the legislative authority to collect the same information, on a mandatory basis, and the legislation must provide substantially the same provisions for confidentiality and penalties for disclosure of confidential information as the Statistics Act. Because these agencies have the legal authority to compel businesses to provide the same information, consent is not requested and businesses may not object to the sharing of the data.

For this survey, there are Section 11 agreements with the provincial statistical agencies of Newfoundland and Labrador, Nova Scotia, New Brunswick, Quebec, Ontario, Manitoba, Saskatchewan, Alberta and British Columbia. The shared data will be limited to information pertaining to business establishments located within the jurisdiction of the respective province.

Business or organization and contact information

1. Verify or provide the business or organization's legal and operating name and correct where needed.

Note: Legal name modifications should only be done to correct a spelling error or typo.

Legal Name

The legal name is one recognized by law, thus it is the name liable for pursuit or for debts incurred by the business or organization. In the case of a corporation, it is the legal name as fixed by its charter or the statute by which the corporation was created.

Modifications to the legal name should only be done to correct a spelling error or typo.

To indicate the legal name of another legal entity, use question 3 instead: select 'Not currently operational', choose the applicable reason, and provide the legal name of this other entity along with any other requested information.

Operating Name

The operating name is the name by which the business or organization is commonly known, if different from its legal name. The operating name is synonymous with trade name.

  • Legal name
  • Operating name (if applicable)

2. Verify or provide the contact information of the designated business or organization contact person for this questionnaire and correct where needed.

Note: The designated contact person is the person who should receive this questionnaire. The designated contact person may not always be the one who actually completes the questionnaire.

  • First name
  • Last name
  • Title
  • Preferred language of communication
    • English
    • French
  • Mailing address (number and street)
  • City
  • Province, territory or state
  • Postal code or ZIP code
  • Country
    • Canada
    • United States
  • Email address
  • Telephone number (including area code)
  • Extension number (if applicable)
    The maximum number of characters is 10.
  • Fax number (including area code)

3. Verify or provide the current operational status of the business or organization identified by the legal and operating name above.

  • Operational
  • Not currently operational
    Why is this business or organization not currently operational?
    • Seasonal operations
      • When did this business or organization close for the season?
        Date
      • When does this business or organization expect to resume operations?
        Date
    • Ceased operations
      • When did this business or organization cease operations?
        Date
      • Why did this business or organization cease operations?
        • Bankruptcy
        • Liquidation
        • Dissolution
        • Other
          Specify the other reasons why the operations ceased
    • Sold operations
      • When was this business or organization sold?
        Date
      • What is the legal name of the buyer?
    • Amalgamated with other businesses or organizations
      • When did this business or organization amalgamate?
        Date
      • What is the legal name of the resulting or continuing business or organization?
      • What are the legal names of the other amalgamated businesses or organizations?
    • Temporarily inactive but will re-open
      • When did this business or organization become temporarily inactive?
        Date
      • When does this business or organization expect to resume operations?
        Date
      • Why is this business or organization temporarily inactive?
    • No longer operating due to other reasons
      • When did this business or organization cease operations?
        Date
      • Why did this business or organization cease operations?

4. Verify or provide the current main activity of the business or organization identified by the legal and operating name above.

Note: The described activity was assigned using the North American Industry Classification System (NAICS).

This question verifies the business or organization's current main activity as classified by the North American Industry Classification System (NAICS). The North American Industry Classification System (NAICS) is an industry classification system developed by the statistical agencies of Canada, Mexico and the United States. Created against the background of the North American Free Trade Agreement, it is designed to provide common definitions of the industrial structure of the three countries and a common statistical framework to facilitate the analysis of the three economies. NAICS is based on supply-side or production-oriented principles, to ensure that industrial data, classified to NAICS, are suitable for the analysis of production-related issues such as industrial performance.

The target entities for which NAICS is designed are businesses and other organizations engaged in the production of goods and services. They include farms, incorporated and unincorporated businesses and government business enterprises. They also include government institutions and agencies engaged in the production of marketed and non-marketed services, as well as organizations such as professional associations and unions and charitable or non-profit organizations and the employees of households.

The associated NAICS should reflect those activities conducted by the business or organizational units targeted by this questionnaire only, as identified in the 'Answering this questionnaire' section and which can be identified by the specified legal and operating name. The main activity is the activity which most defines the targeted business or organization's main purpose or reason for existence. For a business or organization that is for-profit, it is normally the activity that generates the majority of the revenue for the entity.

The NAICS classification contains a limited number of activity classifications; the associated classification might be applicable for this business or organization even if it is not exactly how you would describe this business or organization's main activity.

Please note that any modifications to the main activity through your response to this question might not necessarily be reflected prior to the transmitting of subsequent questionnaires and as a result they may not contain this updated information.

The following is the detailed description including any applicable examples or exclusions for the classification currently associated with this business or organization.

Description and examples

  • This is the current main activity
  • This is not the current main activity
    • Provide a brief but precise description of this business or organization's main activity
      e.g., breakfast cereal manufacturing, shoe store, software development

Main activity

5. You indicated that is not the current main activity. Was this business or organization's main activity ever classified as: ?

  • Yes
    When did the main activity change?
    Date
  • No

6. Search and select the industry classification code that best corresponds to this business or organization's main activity.

Select this business or organization's activity sector (optional)

  • Farming or logging operation
  • Construction company or general contractor
  • Manufacturer
  • Wholesaler
  • Retailer
  • Provider of passenger or freight transportation
  • Provider of investment, savings or insurance products
  • Real estate agency, real estate brokerage or leasing company
  • Provider of professional, scientific or technical services
  • Provider of health care or social services
  • Restaurant, bar, hotel, motel or other lodging establishment
  • Other sector

Dairy products - domestic and imported

1. What was the total inventory in kilograms (kg) of the following butter and butter oil products?

Include:

  • domestic and imported products
  • salted and unsalted butter.

Dairy products - domestic and imported

Include:

  • inventory for all dairy products held in your establishment(s), whether owned by you or by others
  • inventory stored in specially rented rooms to which only you have access (except in emergency)
  • stocks held on government accounts.

Exclude products held in common or cold public storage (these will be reported by operators of those establishments).

Total inventory of butter and butter oil products

Please report all inventory of butter and butter oil products including domestic and imported butter and butter oil products.

a. to c. Creamery butter

Include:

  • salted and unsalted butter
  • whipped butter
  • light or 'lite' butter
  • cultured butter
  • sweet butter
  • calorie-reduced butter
  • dairy spread.

Exclude reworked butter and manufacturing cream.

What was the total inventory in kilograms (kg) of the following butter and butter oil products?
  Total inventory on 1st of month (kg)
a. Creamery butter - held under Plan A  
b. Creamery butter - held under Plan B  
c. Creamery butter - held privately  
Total creamery butter  
d. Whey butter  
e. Butter oil  

2. What was the total inventory in kilograms (kg) of the following types of cheese?

Include domestic and imported products.

Dairy products - domestic and imported

Include:

  • inventory for all dairy products held in your establishment(s), whether owned by you or by others
  • inventory stored in specially rented rooms to which only you have access (except in emergency)
  • stocks held on government accounts.

Exclude products held in common or cold public storage (these will be reported by operators of those establishments).

Total inventory of cheese

Please report all inventory of cheese including domestic and imported cheese.

a. Cheddar

Include all sizes of cheddar cheese: block, stirred curd, curd and cheddar cheese used to make processed cheese.

b. Mozzarella

Include:

  • American full fat mozzarella (27% to 28% B.F.)
  • American low fat mozzarella (17% to 20% B.F.)
  • Italian full fat mozzarella (22% to 24% B.F.)
  • Italian low fat mozzarella (15% B.F.)
  • other mozzarella cheese products.

c. Other factory cheese (all varieties except cheddar, mozzarella and processed)

Include: brick, casata, feta, gouda, marble, swiss, curd cheese, etc.

d. Processed cheese

Include processed cheese, processed cheese food, processed cheese spread made from cheddar cheese or other cheeses.

What was the total inventory in kilograms (kg) of the following types of cheese?
  Total inventory on 1st of month (kg)
a. Cheddar  
b. Mozzarella  
c. Other factory cheese (all varieties except cheddar, mozzarella and processed)  
d. Processed cheese  
Total cheese  

3. Of the above dairy products held on 1st of month (kg), were any owned by dairy processors?

  • Yes
  • No

Inventory owned by dairy processors

4. Of the dairy products held in inventory on 1st of month (kg), which of the following were owned by dairy processors?

Select all that apply.

Inventory owned by dairy processors

Please indicate which dairy products held in inventory were owned by dairy processors.

Include inventory of dairy products which were owned by dairy processors and which were:

  • held in your establishment(s) or
  • stored in specially rented rooms to which only you have access (except in emergency) or
  • held on government accounts.

Exclude dairy products held in common or cold public storage (these will be reported by operators of those establishments).

Creamery butter - held under Plan A

  • How many dairy processors owned inventory of creamery butter held under Plan A?
  • Number of processors

Creamery butter - held under Plan B

  • How many dairy processors owned inventory of creamery butter held under Plan B?
  • Number of processors

Creamery butter - held privately

  • How many dairy processors owned inventory of creamery butter held privately?
  • Number of processors

Whey butter

  • How many dairy processors owned inventory of whey butter?
  • Number of processors

Butter oil

  • How many dairy processors owned inventory of butter oil?
  • Number of processors

Cheddar

  • How many dairy processors owned inventory of cheddar?
  • Number of processors

Mozzarella

  • How many dairy processors owned inventory of mozzarella?
  • Number of processors

Other factory cheese (all varieties except cheddar, mozzarella and processed)

  • How many dairy processors owned inventory of other factory cheese?
  • Number of processors

Processed cheese

  • How many dairy processors owned inventory of processed cheese?
  • Number of processors

5. For the following dairy product(s), what is the name of the dairy processor(s) and the quantity of inventory owned in kilograms (kg) by each dairy processor?

Inventory owned by dairy processors

Include inventory of dairy products which were owned by dairy processors and which were:

  • held in your establishment(s) or
  • stored in specially rented rooms to which only you have access (except in emergency) or
  • held on government accounts.

Exclude dairy products held in common or cold public storage (these will be reported by operators of those establishments).

For the following dairy product(s), what is the name of the dairy processor(s) and the quantity of inventory owned in kilograms (kg) by each dairy processor?
  Name of dairy processor Quantity owned on 1st of month (kg)
Creamery butter - held under Plan A    
Dairy processor 1    
Dairy processor 2    
Dairy processor 3    
Dairy processor 4    
Dairy processor 5    
Dairy processor 6    
Dairy processor 7    
Dairy processor 8    
Dairy processor 9    
Creamery butter - held under Plan B    
Dairy processor 1    
Dairy processor 2    
Dairy processor 3    
Dairy processor 4    
Dairy processor 5    
Dairy processor 6    
Dairy processor 7    
Dairy processor 8    
Dairy processor 9    
Creamery butter - held privately    
Dairy processor 1    
Dairy processor 2    
Dairy processor 3    
Dairy processor 4    
Dairy processor 5    
Dairy processor 6    
Dairy processor 7    
Dairy processor 8    
Dairy processor 9    
Whey butter    
Dairy processor 1    
Dairy processor 2    
Dairy processor 3    
Dairy processor 4    
Dairy processor 5    
Dairy processor 6    
Dairy processor 7    
Dairy processor 8    
Dairy processor 9    
Butter oil    
Dairy processor 1    
Dairy processor 2    
Dairy processor 3    
Dairy processor 4    
Dairy processor 5    
Dairy processor 6    
Dairy processor 7    
Dairy processor 8    
Dairy processor 9    
Cheddar    
Dairy processor 1    
Dairy processor 2    
Dairy processor 3    
Dairy processor 4    
Dairy processor 5    
Dairy processor 6    
Dairy processor 7    
Dairy processor 8    
Dairy processor 9    
Mozzarella    
Dairy processor 1    
Dairy processor 2    
Dairy processor 3    
Dairy processor 4    
Dairy processor 5    
Dairy processor 6    
Dairy processor 7    
Dairy processor 8    
Dairy processor 9    
Other factory cheese (all varieties except cheddar, mozzarella and processed)    
Dairy processor 1    
Dairy processor 2    
Dairy processor 3    
Dairy processor 4    
Dairy processor 5    
Dairy processor 6    
Dairy processor 7    
Dairy processor 8    
Dairy processor 9    
Processed cheese    
Dairy processor 1    
Dairy processor 2    
Dairy processor 3    
Dairy processor 4    
Dairy processor 5    
Dairy processor 6    
Dairy processor 7    
Dairy processor 8    
Dairy processor 9    

Changes or events

1. Indicate any changes or events that affected the reported values for this business or organization, compared with the last reporting period.

Select all that apply.

  • Strike or lock-out
  • Exchange rate impact
  • Price changes in goods or services sold
  • Contracting out
  • Organizational change
  • Price changes in labour or raw materials
  • Natural disaster
  • Recession
  • Change in product line
  • Sold business or business units
  • Expansion
  • New or lost contract
  • Plant closures
  • Acquisition of business or business units
  • Other
    Specify the other changes or events:
  • No changes or events

Contact person

1. Statistics Canada may need to contact the person who completed this questionnaire for further information.
Is the provided given names and the provided family name the best person to contact?

  • Yes
  • No

Who is the best person to contact about this questionnaire?

  • First name:
  • Last name:
  • Title:
  • Email address:
  • Telephone number (including area code):
  • Extension number (if applicable):
    The maximum number of characters is 5.
  • Fax number (including area code):

Feedback

1. How long did it take to complete this questionnaire?

Include the time spent gathering the necessary information.

  • Hours:
  • Minutes:

2. Do you have any comments about this questionnaire?

2022 Biannual Potato Area and Yield Survey – June: Reporting Guide

Integrated Business Statistics Program (IBSP)

This guide is designed to assist you as you complete the 2022 Biannual Potato Area and Yield Survey - June. If you need more information, please call the Statistics Canada Help Line at the number below.

Your answers are confidential.

Statistics Canada is prohibited by law from releasing any information it collects which could identify any person, business, or organization, unless consent has been given by the respondent or as permitted by the Statistics Act.

Statistics Canada will use information from this survey for statistical purposes.

Help Line: 1-877-949-9492 or TTY 1-855-382-7745

A - Reporting instructions

  • Report dollar amounts in Canadian dollars.
  • Exclude sales tax.
  • When precise figures are not available, please provide your best estimates.

B - Definitions

Legal Name

The legal name is one recognized by law, thus it is the name liable for pursuit or for debts incurred by the business or organization. In the case of a corporation, it is the legal name as fixed by its charter or the statute by which the corporation was created.

Modifications to the legal name should only be done to correct a spelling error or typo.

To indicate the legal name of another legal entity, use question 3 instead: select 'Not currently operational', choose the applicable reason, and provide the legal name of this other entity along with any other requested information.

Operating Name

The operating name is the name by which the business or organization is commonly known, if different from its legal name. The operating name is synonymous with trade name.

Current main activity of the business or organization

The North American Industry Classification System (NAICS) is an industry classification system developed by the statistical agencies of Canada, Mexico and the United States. Created against the background of the North American Free Trade Agreement, it is designed to provide common definitions of the industrial structure of the three countries and a common statistical framework to facilitate the analysis of the three economies. NAICS is based on supply-side or production-oriented principles, to ensure that industrial data, classified to NAICS, are suitable for the analysis of production-related issues such as industrial performance.

The target entities for which NAICS is designed are businesses and other organizations engaged in the production of goods and services. They include farms, incorporated and unincorporated businesses and government business enterprises. They also include government institutions and agencies engaged in the production of marketed and non-marketed services, as well as organizations such as professional associations and unions and charitable or non-profit organizations and the employees of households.

The associated NAICS should reflect those activities conducted by the business or organizational unit(s) targeted by this questionnaire only, and which can be identified by the specified legal and operating name. The main activity is the activity which most defines the targeted business or organization's main purpose or reason for existence. For a business or organization that is for-profit, it is normally the activity that generates the majority of the revenue for the entity.

The NAICS classification contains a limited number of activity classifications; the associated classification might be applicable for this business or organization even if it is not exactly how you would describe this business or organization's main activity.

Please note that any modifications to the main activity through your response to this question might not necessarily be reflected prior to the transmitting of subsequent questionnaires and as a result they may not contain this updated information.

C - Question 1

Did you sell any potatoes in the 2021 crop year?

Crop Year

The period of time from one year's harvest to the next.

For most provinces, the crop year is from August to the following July.

However, in British Columbia, potatoes can be harvested as early as June, so the crop year there could run from June to the following May.

D - Question 2

For the 2021 crop year, what was the quantity of potatoes sold and the total value received?

The following instructions apply to the quantity of potatoes sold and the total value received for the 2021 crop year.

Exclude any potatoes purchased for re-sale.

Report the total value received after any deductions or bonuses.

Report total value received taking into account all grades.

Table stock potatoes

Potatoes that are sold in bulk or in bags to be eaten fresh.

Seed potatoes

Potatoes that are planted the following spring to produce the next fall's crop of potatoes.

Processing potatoes

Potatoes that are converted to french fries, instant mashed potatoes, potato chips or starch.

Hundredweight/CWT

Pronounced hundredweight, it is a measure of weight used for potatoes that means 100 pounds.
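
For respondents who keep records in other units, the conversion implied by this definition can be sketched in Python as follows. The function names `cwt_to_pounds` and `cwt_to_kg` are illustrative only and are not part of the survey. The kilogram figure relies on the standard definition of the pound (0.45359237 kg), which is not stated in this guide.

```python
POUNDS_PER_CWT = 100        # from the definition above
KG_PER_POUND = 0.45359237   # standard international pound (not from this guide)


def cwt_to_pounds(cwt):
    """Convert hundredweight (CWT) to pounds."""
    return cwt * POUNDS_PER_CWT


def cwt_to_kg(cwt):
    """Convert hundredweight (CWT) to kilograms."""
    return cwt_to_pounds(cwt) * KG_PER_POUND


# Example with a made-up quantity of 250 CWT.
pounds = cwt_to_pounds(250)
kilograms = cwt_to_kg(250)
```

With these definitions, 250 CWT corresponds to 25,000 pounds, or roughly 11,340 kg.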

E - Questions 3 and 4

What is the total area of potatoes planted in the 2022 crop year?

If you have not completed your planting activities when completing this survey, please report all planting intentions.

Thank you for your participation.

Monthly Survey of Food Services and Drinking Places: CVs for Total Sales by Geography - February 2022

Monthly Survey of Food Services and Drinking Places: CVs for Total Sales by Geography – February 2022
Table summary
This table displays the results of CVs for Total sales by Geography. The information is grouped by Geography (appearing as row headers), Month and percentage (appearing as column headers).
CVs in percentage, by Geography (rows) and Month (columns)

Geography                   202102  202103  202104  202105  202106  202107  202108  202109  202110  202111  202112  202201  202202
Canada                        0.19    0.47    1.44    1.59    1.32    3.40    0.43    0.18    0.20    0.19    0.15    0.41    0.29
Newfoundland and Labrador     0.48    2.16    2.05    2.53    0.46    0.60    0.54    0.57    0.67    0.62    0.60    1.50    2.63
Prince Edward Island          1.04    1.29   16.69    1.05    0.92    0.96    0.83    2.81    7.86    7.18    4.91    8.76   11.91
Nova Scotia                   0.40    0.87    2.76    3.16    0.96    0.47    0.39    0.38    0.47    0.46    1.15    1.46    1.15
New Brunswick                 0.50    0.39    1.08    1.75    0.46    0.57    0.47    0.59    0.63    0.60    1.69    1.27    1.23
Quebec                        0.67    1.11    5.08    4.52    4.28   16.06    0.66    0.60    0.60    0.52    0.28    1.07    0.51
Ontario                       0.24    0.99    2.56    2.99    2.64    1.24    0.88    0.24    0.28    0.32    0.21    0.78    0.30
Manitoba                      0.46    0.45    1.21    2.59    0.67    0.81    0.43    0.44    0.77    0.82    0.52    1.79    1.15
Saskatchewan                  0.52    0.46    1.22    0.88    0.61   10.59    0.96    0.81    1.56    1.28    0.76    1.60    1.49
Alberta                       0.33    0.81    3.06    4.31    0.45    2.28    0.66    0.39    0.46    0.38    0.74    0.64    1.41
British Columbia              0.56    0.99    1.88    2.78    0.79    1.62    0.34    0.37    0.44    0.35    0.28    0.73    0.63
Yukon Territory               1.96    3.01   65.36    2.72    1.85    2.87    4.89    2.17    3.29   18.98   12.43    4.82   27.09
Northwest Territories         1.83    2.93   74.26    3.73    1.86    3.13    5.75    2.31    3.93   25.20    5.08    6.07   34.38
Nunavut                       2.39    2.67    3.88    4.83    1.27   84.13    2.88    3.60    5.47    4.22    2.63   11.64    3.57

Statistics Canada and Mila partner to advance ethical artificial intelligence and machine learning development

May 3, 2022

Statistics Canada is proud to announce a new partnership with Mila, the world’s largest academic research centre in deep learning (DL). By joining Mila’s diversified community, Statistics Canada will be able to access a broader artificial intelligence (AI) ecosystem and partnership toolbox, effectively accelerating Statistics Canada’s ethical AI and ML research.

“Data science is a team sport and this partnership allows us to collectively grow our respective teams’ knowledge in these crucial research areas,” said Chief Statistician of Canada Anil Arora. “As a trusted data science leader, Statistics Canada works on the cutting-edge of artificial intelligence and machine learning projects. This new partnership supports our commitment to modernization as we continue to look for areas for improvement. It will also help us to maintain our position as a global leader in official statistics and deliver better services to Canadians.”

Mila, a Montreal-based non-profit organization, is internationally recognized for its significant contributions in the areas of language modelling, machine translation, object recognition and generative models. Building relationships with Mila’s community of 900 researchers will allow Statistics Canada to hear perspectives from a variety of peers and to collaborate with like-minded communities of practice. It will also offer Statistics Canada unique access to a growing pool of domestic and global talent.

“Mila’s partnership with Statistics Canada provides an exciting opportunity to combine our expertise and explore specific ML and AI challenges,” said Stéphane Létourneau, Executive Vice President of Mila. “Mila’s research community works daily towards the democratization of machine learning and the development of responsible AI. We are thrilled to continue these efforts alongside our new partner.”

Arora added the partnership will allow Statistics Canada to have direct access to up-and-coming experts in the field. “Being able to tap into that expertise, collaborate on projects, and discover what the next generation of leading AI and ML researchers are developing is a tremendous upside for the agency.”

Contacts

Media Relations
Statistics Canada
statcan.mediahotline-ligneinfomedias.statcan@statcan.gc.ca

Media Relations
Mila
medias@mila.quebec

Analytical Guide - Canadian Perspectives Survey Series 6: Substance Use and Stigma during the Pandemic

1.0 Description

The Canadian Perspectives Survey Series (CPSS) is a set of short, online surveys beginning in March 2020 that will be used to collect information on the knowledge and behaviours of residents of the 10 Canadian provinces. All surveys in the series will be asked of Statistics Canada's probability panel. The probability panel for the CPSS is a new pilot project initiated in 2019. An important goal of the CPSS is to directly collect data from Canadians in a timely manner in order to inform policy makers and be responsive to emerging data needs. The CPSS is designed to produce data at a national level (excluding the territories).

The survey program is sponsored by Statistics Canada. Each survey in the CPSS is cross sectional. Participating in the probability panel and the subsequent surveys of the CPSS is voluntary.

The sixth survey of the CPSS is CPSS6 – Substance Use and Stigma during the Pandemic. It was administered from January 25, 2021 until January 31, 2021.

Any questions about the survey, the survey series, the data or its use should be directed to:

Statistics Canada
Client Services
Centre for Social Data Integration and Development
Telephone: 613-951-3321 or call toll-free 1-800-461-9050
Fax: 613-951-4527
E-mail: statcan.csdidclientservice-ciddsservicealaclientele.statcan@canada.ca

2.0 Survey methodology

Target and survey population

The target population for the Canadian Perspectives Survey Series (CPSS) is residents of the 10 Canadian provinces 15 years of age or older.

The frame for surveys of the CPSS is Statistics Canada's pilot probability panel. The probability panel was created by randomly selecting a subset of the Labour Force Survey (LFS) respondents. Therefore the survey population is that of the LFS, with the exception that full-time members of the Canadian Armed Forces are included. Excluded from the survey's coverage are: persons living on reserves and other Aboriginal settlements in the provinces; the institutionalized population, and households in extremely remote areas with very low population density. These groups together represent an exclusion of less than 2% of the Canadian population aged 15 and over.

The LFS sample is drawn from an area frame and is based on a stratified, multi-stage design that uses probability sampling. The LFS uses a rotating panel sample design. In the provinces, selected dwellings remain in the LFS sample for six consecutive months. Each month about one-sixth of the LFS sampled dwellings are in their first month of the survey, one-sixth are in their second month of the survey, and so on. These six independent samples are called rotation groups.

For the probability panel used for the CPSS, four rotation groups from the LFS were used from the provinces: the rotation groups answering the LFS for the last time in April, May, June and July of 2019. From these households, one person aged 15+ was selected at random to participate in the CPSS – Sign-Up. These individuals were invited to sign up for the CPSS. Those agreeing to join the CPSS were asked to provide an email address. Participants from the Sign-Up who provided valid email addresses formed the probability panel. The participation rate for the panel was approximately 23%. The survey population for all surveys of the CPSS is the probability panel participants. Participants of the panel are 15 years or older as of July 31, 2019.

Sample Design and Size

The sample design for surveys of the CPSS is based on the sample design of the CPSS – Sign-Up, the method used to create the pilot probability panel. The raw sample for the CPSS – Sign-Up had 31,896 randomly selected people aged 15+ from responding LFS households completing their last interview of the LFS in April to July of 2019. Of these people, 31,628 were in-scope at the time of collection for the CPSS – Sign-Up in January to March 2020. Of people agreeing to participate in the CPSS, that is, those joining the panel, 7,242 had a valid email address. All panel participants are invited to complete the surveys of the CPSS.

Sample Design and Size
Stages of the Sample n
Raw sample for the CPSS – Sign-Up 31,896
In-scope Units from the CPSS – Sign-Up 31,628
Panelists for the CPSS (with valid email addresses) 7,242
Raw sample for surveys of the CPSS 7,242

3.0 Data collection

CPSS – Sign-Up

The CPSS – Sign-Up survey used to create Statistics Canada's probability panel was conducted from January 15th, 2020 until March 15th, 2020. Initial contact was made through a mailed letter to the selected sample. The letter explained the purpose of the CPSS and invited respondents to go online, using their Secure Access Code to complete the Sign-Up form. Respondents opting out of joining the panel were asked their main reason for not participating. Those joining the panel were asked to verify basic demographic information and to provide a valid email address. Nonresponse follow-up for the CPSS – Sign-Up had a mixed mode approach. Additional mailed reminders were sent to encourage sampled people to respond. As well, email reminders were sent (where an email address was available) and Computer Assisted Telephone Interview (CATI) nonresponse follow-up was conducted.

The application included a standard set of response codes to identify all possible outcomes. The application was tested prior to use to ensure that only valid question responses could be entered and that all question flows would be correctly followed. These measures ensured that the response data were already "clean" at the end of the collection process.

Interviewers followed a standard approach used for many StatCan surveys in order to introduce the agency. Selected persons were told that their participation in the survey was voluntary, and that their information would remain strictly confidential.

CPSS6 – Substance Use and Stigma during the Pandemic

All participants of the pilot panel for the CPSS, minus those who opted out after previous iterations of CPSS, were sent an email invitation with a link to the CPSS6 and a Secure Access Code to complete the survey online. Collection for the survey began on January 25th, 2021. Reminder emails were sent on January 26th, January 28th and January 30th. The application remained open until January 31, 2021.

3.1 Disclosure control

Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

4.0 Data quality

Survey errors come from a variety of different sources. They can be classified into two main categories: non-sampling errors and sampling errors.

4.1 Non-sampling errors

Non-sampling errors can be defined as errors arising during the course of virtually all survey activities, apart from sampling. They are present in both sample surveys and censuses (unlike sampling error, which is only present in sample surveys). Non-sampling errors arise primarily from the following sources: nonresponse, coverage, measurement and processing.

4.1.1 Nonresponse

Nonresponse errors result from a failure to collect complete information on all units in the selected sample.

Nonresponse produces errors in the survey estimates in two ways. Firstly, non-respondents often have different characteristics from respondents, which can result in biased survey estimates if nonresponse bias is not fully corrected through weighting. Secondly, it reduces the effective size of the sample, since fewer units than expected answered the survey. As a result, the sampling variance increases and the precision of the estimate decreases. The response rate is calculated as follows:

[Responding units / (Selected units – out-of-scope units)] × 100%

The following tables summarize the response rates experienced for the CPSS6 – Substance Use and Stigma during the Pandemic. Response rates are broken down into two stages. Table 4.1.1a shows the take-up rates to the panel in the CPSS – Sign-Up and Table 4.1.1b shows the collection response rates for the survey CPSS6 – Substance Use and Stigma during the Pandemic.

Table 4.1.1a Participation in the Pilot Probability Panel for the CPSS – Sign-Up
Stages of the Sample for the CPSS – Sign-Up n
Raw sample for the CPSS – Sign-Up 31,896
In-scope Units from the CPSS – Sign-Up 31,628
Panelists for the CPSS (with valid email addresses) 7,242
Participation Rate for the Panel for CPSS 22.9%

Table 4.1.1b Response Rates for the CPSS6 – Substance Use and Stigma during the Pandemic
Stages of the Sample for the CPSS6 – Substance Use and Stigma during the Pandemic n
Panelists for the CPSS (with valid email addresses) 7,242
Respondents of CPSS6 – Substance Use and Stigma during the Pandemic 3,941
Collection Response Rate for CPSS6 – Substance Use and Stigma during the Pandemic 54.4%
Cumulative Response Rate 12.5%

As shown in Table 4.1.1b, the collection response rate for the CPSS6 – Substance Use and Stigma during the Pandemic is 54.4%. However, when nonparticipation in the panel is factored in, the cumulative response rate to the survey is 12.5%. This cumulative response rate is lower than the typical response rates observed in social surveys conducted at Statistics Canada. This is due to the two stages of nonresponse (or participation) and other factors such as the single mode used for surveys of the CPSS (emailed survey invitations with a link to the survey for online self-completion), respondent fatigue from prior LFS response, the inability of the offline population to participate, etc.
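The rates in Tables 4.1.1a and 4.1.1b can be reproduced from the published counts. The following is a minimal sketch, not Statistics Canada's production code; it assumes the cumulative response rate is the product of the panel participation rate and the collection response rate, and the function and variable names are illustrative only.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Counts published in Tables 4.1.1a and 4.1.1b
in_scope_units = 31_628   # selected units minus out-of-scope units
panelists = 7_242         # joined the panel with a valid email address
respondents = 3_941       # responded to CPSS6

participation_rate = rate(panelists, in_scope_units)
collection_rate = rate(respondents, panelists)

# Assumption: the cumulative rate multiplies the two stage rates together
cumulative_rate = participation_rate * collection_rate / 100.0

print(f"Participation: {participation_rate:.1f}%")
print(f"Collection:    {collection_rate:.1f}%")
print(f"Cumulative:    {cumulative_rate:.1f}%")
```

Rounded to one decimal place, these values match the 22.9%, 54.4% and 12.5% shown in the tables.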

Given the additional nonresponse experienced in the CPSS6 – Substance Use and Stigma during the Pandemic, there is an increased risk of bias due to respondents being different than non-respondents. For this reason, a small bias study was conducted. Please see Section 6.0 for the results of this validation.

4.1.2 Coverage errors

Coverage errors consist of omissions, erroneous inclusions, duplications and misclassifications of units in the survey frame. Since they affect every estimate produced by the survey, they are one of the most important types of error; in the case of a census they may be the main source of error. Coverage errors may cause a bias in the estimates and the effect can vary for different sub-groups of the population. This is a very difficult error to measure or quantify accurately.

For the CPSS, the population covered is those aged 15+ as of July 31, 2019. Since collection of the CPSS6 – Substance Use and Stigma during the Pandemic was conducted from January 25th – 31st, 2021, there is an undercoverage of residents of the 10 provinces that turned 15 since July 31, 2019. There is also undercoverage of those without internet access. This undercoverage is greater amongst those aged 65 years and older.

4.1.3 Measurement errors

Measurement errors (sometimes referred to as response errors) occur when the response provided differs from the real value; such errors may be attributable to the respondent, the questionnaire, the collection method or the respondent's record-keeping system. Such errors may be random or they may result in a systematic bias if they are not random. It is very costly to accurately measure the level of response error and very few surveys conduct a post-survey evaluation.

4.1.4 Processing errors

Processing errors are the errors associated with activities conducted once survey responses have been received. They include all data handling activities after collection and prior to estimation. Like all other errors, they can be random in nature, and inflate the variance of the survey's estimates, or systematic, and introduce bias. It is difficult to obtain direct measures of processing errors and their impact on data quality especially since they are mixed in with other types of errors (nonresponse, measurement and coverage).

4.2 Sampling errors

Sampling errors are defined as the errors that result from estimating a population characteristic by measuring a portion of the population rather than the entire population. For probability sample surveys, methods exist to calculate sampling error. These methods derive directly from the sample design and method of estimation used by the survey.

The most commonly used measure to quantify sampling error is sampling variance. Sampling variance measures the extent to which the estimate of a characteristic from different possible samples of the same size and the same design differ from one another. For sample designs that use probability sampling, the magnitude of an estimate's sampling variance can be estimated.

Factors affecting the magnitude of the sampling variance for a given sample size include:

  1. The variability of the characteristic of interest in the population: the more variable the characteristic in the population, the larger the sampling variance.
  2. The size of the population: in general, the size of the population only has an impact on the sampling variance for small to moderate sized populations.
  3. The response rate: the sampling variance increases as the sample size decreases. Since non-respondents effectively decrease the size of the sample, nonresponse increases the sampling variance.
  4. The sample design and method of estimation: some sample designs are more efficient than others in the sense that, for the same sample size and method of estimation, one design can lead to smaller sampling variance than another.

The standard error of an estimator is the square root of its sampling variance. This measure is easier to interpret since it provides an indication of sampling error using the same scale as the estimate whereas the variance is based on squared differences.

The coefficient of variation (CV) is a relative measure of the sampling error. It is defined as the estimate of the standard error divided by the estimate itself, usually expressed as a percentage (10% instead of 0.1). It is very useful for measuring and comparing the sampling error of quantitative variables with large positive values. However, it is not recommended for estimates such as proportions, estimates of change or differences, and variables that can have negative values.

It is considered a best practice at Statistics Canada to report the sampling error of an estimate through its 95% confidence interval. The 95% confidence interval of an estimate means that if the survey were repeated over and over again, then 95% of the time (or 19 times out of 20), the confidence interval would cover the true population value.
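The relationship between the sampling variance, the standard error, the CV and the 95% confidence interval can be sketched as follows. This is an illustration only: it assumes a normal approximation (a multiplier of 1.96) for the confidence interval, which the guide does not specify, and the input values are invented.

```python
import math


def sampling_error_summary(estimate: float, variance: float, z: float = 1.96):
    """Return (standard error, CV in percent, confidence interval).

    Assumption: the interval uses a normal approximation with multiplier z.
    """
    standard_error = math.sqrt(variance)        # same scale as the estimate
    cv_percent = 100.0 * standard_error / estimate
    interval = (estimate - z * standard_error, estimate + z * standard_error)
    return standard_error, cv_percent, interval


# Hypothetical estimate and sampling variance, for illustration only
se, cv, ci = sampling_error_summary(estimate=200.0, variance=400.0)
print(se, cv, ci)
```

With these invented inputs, the standard error is 20 and the CV is 10%, the same "10% instead of 0.1" convention described above.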

5.0 Weighting

The principle behind estimation in a probability sample such as those of the CPSS, is that each person in the sample "represents", besides himself or herself, several other persons not in the sample. For example, in a simple random 2% sample of the population, each person in the sample represents 50 persons in the population. In the terminology used here, it can be said that each person has a weight of 50.

The weighting phase is a step that calculates, for each person, his or her associated sampling weight. This weight appears on the microdata file, and must be used to derive estimates representative of the target population from the survey. For example, if the number of individuals who smoke daily is to be estimated, it is done by selecting the records referring to those individuals in the sample having that characteristic and summing the weights entered on those records. This section provides the details of the method used to calculate sampling weights for the CPSS6 – Substance Use and Stigma during the Pandemic.
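As a small illustration of how weights turn sample records into population estimates, the sketch below derives the weight for a simple random 2% sample and then sums weights over records with a given characteristic. The records, field names and weights are hypothetical and do not come from the CPSS microdata file.

```python
# In a simple random 2% sample, each person represents 1 / 0.02 = 50 people
sampling_fraction = 0.02
design_weight = 1 / sampling_fraction

# Hypothetical microdata records (field names are illustrative)
records = [
    {"smokes_daily": True, "weight": 50.0},
    {"smokes_daily": False, "weight": 50.0},
    {"smokes_daily": True, "weight": 62.5},
    {"smokes_daily": False, "weight": 41.0},
]


def weighted_count(rows, characteristic):
    """Sum the weights of the records that have the characteristic."""
    return sum(row["weight"] for row in rows if row[characteristic])


estimated_daily_smokers = weighted_count(records, "smokes_daily")
print(design_weight, estimated_daily_smokers)
```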

The weighting of the sample for the CPSS6 – Substance Use and Stigma during the Pandemic has multiple stages to reflect the stages of sampling, participation and response to get the final set of respondents. The following sections cover the weighting steps to first create the panel weights, then the weighting steps to create the survey weights for CPSS6 – Substance Use and Stigma during the Pandemic.

5.1 Creating the Panel Weights

Four consecutive rotate-out samples of households from the LFS were the starting point to form the panel sample of the CPSS. Since households selected from the LFS samples are the starting point, the household weights from the LFS are the first step to calculating the panel weights.

5.1.1 Household weights

Calculation of the Household Design Weights – HHLD_W0, HHLD_W1

The initial panel weights are the LFS subweights (SUBWT). These are the LFS design weights adjusted for nonresponse but not yet calibrated to population control totals. These weights form the household design weight for the panel survey (HHLD_W0).

Since only four rotate-outs were used, instead of the six used in a complete LFS sample, these weights were adjusted by a factor of 6/4 to be representative. The weights after this adjustment were called HHLD_W1.

Calibration of the Household Weights – HHLD_W2

Calibration is a step to ensure that the sum of weights within a certain domain match projected demographic totals. The SUBWT from the LFS are not calibrated, thus HHLD_W1 are also not calibrated. The next step is to make sure the household weights add up to the control totals by household size. Calibration was performed on HHLD_W1 to match control totals by province and household size using the size groupings of 1, 2, or 3+.
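The two household weighting steps above can be sketched as follows. This is a simplified illustration, assuming calibration is a ratio adjustment within each province by household size cell; the actual calibration method is not detailed in this guide. The field names, weights and control totals are hypothetical.

```python
from collections import defaultdict

ROTATION_FACTOR = 6 / 4  # four of six LFS rotation groups were used


def household_weights(households, control_totals):
    """Add HHLD_W1 and HHLD_W2 to each household record.

    Assumption: calibration is a simple ratio adjustment so that weights
    in each (province, size group) cell sum to the control total.
    """
    cell_sums = defaultdict(float)
    for h in households:
        h["hhld_w1"] = h["hhld_w0"] * ROTATION_FACTOR
        cell_sums[(h["province"], h["size_group"])] += h["hhld_w1"]
    for h in households:
        cell = (h["province"], h["size_group"])
        h["hhld_w2"] = h["hhld_w1"] * control_totals[cell] / cell_sums[cell]
    return households


# Hypothetical households and control totals
sample = [
    {"province": "ON", "size_group": "1", "hhld_w0": 100.0},
    {"province": "ON", "size_group": "1", "hhld_w0": 300.0},
    {"province": "ON", "size_group": "2", "hhld_w0": 200.0},
]
controls = {("ON", "1"): 1200.0, ("ON", "2"): 330.0}
weighted = household_weights(sample, controls)
```

After the adjustment, the HHLD_W2 weights in each cell add up to that cell's control total.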

5.1.2 Person Panel weights

Calculate Person Design Weights – PERS_W0

One person aged 15 or older per household was selected for the CPSS – Sign-Up, the survey used to create the probability panel. The design person weight is obtained by multiplying HHLD_W2 by the number of eligible people in the dwelling (i.e. number of people aged 15 years and over).

Removal of Out of Scope Units – PERS_W1

Some units were identified as being out-of-scope during the CPSS – Sign-Up. These units were given a weight of PERS_W1 = 0. For all other units, PERS_W1 = PERS_W0. Persons with a weight of 0 are subsequently removed from future weight adjustments.
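A minimal sketch of the two person-level steps described above, using invented values; the function name and arguments are illustrative and not part of any Statistics Canada system.

```python
def person_weights(hhld_w2: float, eligible_people: int, in_scope: bool):
    """Return (PERS_W0, PERS_W1) for the person selected in a household."""
    # One person is selected per household, so the household weight is
    # multiplied by the number of eligible people aged 15 and over
    pers_w0 = hhld_w2 * eligible_people
    # Out-of-scope units receive a weight of zero
    pers_w1 = pers_w0 if in_scope else 0.0
    return pers_w0, pers_w1


# Hypothetical households
print(person_weights(300.0, 2, in_scope=True))
print(person_weights(300.0, 3, in_scope=False))
```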

Nonresponse/Nonparticipation Adjustment – PERS_W2

During collection of the CPSS – Sign-Up, a certain proportion of sampled units inevitably resulted in nonresponse or nonparticipation in the panel. Weights of the nonresponding/nonparticipating units were redistributed to participating units. Units that did not participate in the panel had their weights redistributed to the participating units with similar characteristics within response homogeneity groups (RHGs).

Many variables from the LFS were available to build the RHG (such as employment status, education level, household composition) as well as information from the LFS collection process itself. The model was specified by province, as the variables chosen in the model could differ from one province to the other.

The following variables were kept in the final logistic regression model: education_lvl (education level variable with 10 categories), nameissueflag (a flag created to identify respondents not providing a valid name), elg_hhldsize (number of eligible people for selection in the household), age_grp (age group of the selected person), sex, kidsinhhld (an indicator to flag whether or not children are present in the household), marstat (marital status with 6 categories), cntrybth (an indicator if the respondent was born in Canada or not), lfsstat (labour force status of respondent with 3 categories), nocs1 (the first digit of National Occupational Classification code of the respondent if employed, with 10 categories), and dwelrent (an indicator of whether the respondent dwelling is owned or rented). RHGs were formed within provinces. An adjustment factor was calculated within each response group as follows:

Sum of weights of respondents and nonrespondents / Sum of weights of respondents

The weights of the respondents were multiplied by this factor to produce the PERS_W2 weights, adjusted for panel nonparticipation. The nonparticipating units were dropped from the panel.
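The adjustment factor above can be illustrated with a short sketch. It assumes each unit already carries a response homogeneity group label (in the guide, the groups come from a logistic regression model, which is not reproduced here); the field names and weights are hypothetical.

```python
from collections import defaultdict


def adjust_for_nonresponse(units):
    """Redistribute nonrespondent weights to respondents within each RHG."""
    total = defaultdict(float)       # respondents and nonrespondents
    responding = defaultdict(float)  # respondents only
    for u in units:
        total[u["rhg"]] += u["weight"]
        if u["responded"]:
            responding[u["rhg"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["responded"]:  # nonresponding units are dropped
            factor = total[u["rhg"]] / responding[u["rhg"]]
            adjusted.append({**u, "adj_weight": u["weight"] * factor})
    return adjusted


# Hypothetical units in two response homogeneity groups
units = [
    {"rhg": "A", "responded": True, "weight": 10.0},
    {"rhg": "A", "responded": True, "weight": 30.0},
    {"rhg": "A", "responded": False, "weight": 20.0},
    {"rhg": "B", "responded": True, "weight": 5.0},
    {"rhg": "B", "responded": False, "weight": 5.0},
]
result = adjust_for_nonresponse(units)
```

The sum of the adjusted weights equals the sum of the original weights over all units, so the nonrespondents remain represented.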

5.2 Creating the CPSS6 weights

Surveys of the CPSS start with the sample created from the panel participants. The panel is comprised of 7,242 individuals, each with the nonresponse adjusted weight of PERS_W2.

Calculation of the Design Weights – WT_DSGN

The design weight is the person weight adjusted for nonresponse calculated for the panel participants (PERS_W2). No out-of-scope units were identified during the survey collection of CPSS6 – Substance Use and Stigma during the Pandemic. Since all units were in-scope, WT_DSGN = PERS_W2 and no units were dropped.

Nonresponse Adjustment – WT_NRA

Given that the sample for CPSS was formed by people having agreed to participate in a web panel, the response rates to the survey were relatively high. Additionally, the panel was designed to produce estimates at a national level, so sample sizes by province were not overly large. As a result, nonresponse was fairly uniform in many provinces. The RHGs were formed by some combination of age group, sex, education level, rental status, LFS status, whether or not children are present in the household, marital status and the first digit of the National Occupational Classification (NOC) code for respondents who are employed. An adjustment factor was calculated within each response group as follows:

Sum of weights of respondents and nonrespondents / Sum of weights of respondents

The weights of the respondents were multiplied by this factor to produce the WT_NRA weights, adjusted for survey response. The nonresponding units were dropped from the survey.

Calibration of Person-Level Weights – WT_FINL

Control totals were computed using LFS demography projection data. During calibration, an adjustment factor is calculated and applied to the survey weights. This adjustment is made such that the weighted sums match the control totals. Most social surveys calibrate the person level weights to control totals by sex, age group and province. For CPSS6, calibration by province was not possible, since there were very few respondents in some categories in the Atlantic and Prairie Provinces. In addition, there were very small counts for male respondents aged 15 to 24 in the Atlantic Provinces. For this reason, the control totals used for CPSS6 – Substance Use and Stigma during the Pandemic were by age group and sex by geographic region, where the youngest age group for males in the Atlantic region was collapsed with the second youngest age group. The next section will include recommendations for analysis by geographic region and age group.

5.3 Bootstrap Weights

Bootstrap weights were created for the panel and the CPSS6 – Substance Use and Stigma during the Pandemic survey respondents. The LFS bootstrap weights were the initial weights and all weight adjustments applied to the survey weights were also applied to the bootstrap weights.
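Bootstrap weights are typically used to estimate sampling variance by recomputing an estimate once per set of bootstrap weights. The sketch below assumes the common estimator that averages squared deviations from the full-sample estimate; this guide does not state the exact formula, and the values and weights are invented.

```python
def weighted_total(values, weights):
    """Weighted total of a variable."""
    return sum(v * w for v, w in zip(values, weights))


def bootstrap_variance(full_estimate, bootstrap_estimates):
    """Assumed estimator: mean squared deviation from the full estimate."""
    b = len(bootstrap_estimates)
    return sum((e - full_estimate) ** 2 for e in bootstrap_estimates) / b


# Hypothetical indicator variable, final weights and bootstrap weight sets
values = [1, 0, 1]
final_weights = [10.0, 20.0, 30.0]
bootstrap_weight_sets = [
    [12.0, 18.0, 30.0],
    [8.0, 22.0, 28.0],
    [10.0, 20.0, 34.0],
]

estimate = weighted_total(values, final_weights)
replicates = [weighted_total(values, w) for w in bootstrap_weight_sets]
variance = bootstrap_variance(estimate, replicates)
print(estimate, replicates, variance)
```

In practice many more bootstrap weight sets would be used; three are shown only to keep the example readable.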

6.0 Quality of the CPSS and Survey Verifications

The probability panel created for the CPSS is a pilot project started in 2019 by Statistics Canada. While the panel offers the ability to collect data quickly, by leveraging a set of respondents that have previously agreed to participate in multiple short online surveys, and for whom an email address is available to expedite survey collection, some aspects of the CPSS design put the resulting data at a greater risk of bias. The participation rate for the panel is lower than typically experienced in social surveys conducted by Statistics Canada which increases the potential nonresponse bias. Furthermore, since the surveys of the CPSS are all self-complete online surveys, people without internet access do not have the means to participate in the CPSS and therefore are not covered.

When the unweighted panel was compared to the original sample targeted to join the panel, there was a notable underrepresentation of those aged 15-24, those aged 65 and older, and those with less than a high school degree. These differences were expected due to the nature of the panel and the experience of international examples of probability panels. Using LFS responding households as the frame for the panel was by design in order to leverage the available LFS information to correct for the underrepresentation and overrepresentation experienced in the panel. The nonresponse adjustments performed in the weighting adjustments of the panel and the survey respondents utilised the available information to ensure the weights of nonresponding/nonparticipating units went to similar responding units. Furthermore, calibration to age and sex totals helped to adjust for the underrepresentation by age group.

Tables 6.1 and 6.2 show the slippage rates by certain domains post-calibration of CPSS6 – Substance Use and Stigma during the Pandemic. The slippage rate is calculated by comparing the sum of weights in the domain to the control total based on demographic projections. A positive slippage rate means the sample has an over-count for that domain. A negative slippage rate means the survey has an under-count for that domain. Based on the results shown in Tables 6.1 and 6.2, it is recommended to use the data only at the geographical levels and age groups where there is no slippage: nationally, by geographic region (Atlantic Provinces, Quebec, Ontario, Prairie Provinces, and British Columbia), and by the four oldest age groups.

Table 6.1 Slippage rates by geographic region
Area Domain n Slippage Rate
Geography Canada (note 1) 3,941 0%
Newfoundland and Labrador 111 -9.6%
Prince Edward Island 86 18.3%
Nova Scotia 224 2.2%
New Brunswick 169 0.1%
Quebec 653 0%
Ontario 1,170 0%
Manitoba 315 -3.0%
Saskatchewan 270 -3.0%
Alberta 424 1.7%
British Columbia 519 0%
Note 1: Based on the 10 provinces; territories are excluded.

Table 6.2 Slippage rates by age group
Area Domain n Slippage Rate
Age Group 15-24 193 3.1%
25-34 445 -2.7%
35-44 629 0%
45-54 654 0%
55-64 917 0%
65+ 1,103 0%
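The slippage rates in Tables 6.1 and 6.2 compare a domain's sum of weights with its control total. A minimal sketch follows, assuming the rate is the relative difference expressed as a percentage; the guide does not give the exact formula, and the totals below are invented rather than taken from the survey.

```python
def slippage_rate(sum_of_weights: float, control_total: float) -> float:
    """Assumed definition: relative difference from the control total, in %."""
    return 100.0 * (sum_of_weights - control_total) / control_total


# Hypothetical domain totals, for illustration only
over_count = slippage_rate(sum_of_weights=1_050.0, control_total=1_000.0)
under_count = slippage_rate(sum_of_weights=970.0, control_total=1_000.0)
no_slippage = slippage_rate(sum_of_weights=1_000.0, control_total=1_000.0)
print(over_count, under_count, no_slippage)
```

A positive result corresponds to an over-count for the domain and a negative result to an under-count, consistent with the interpretation given above.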

After the collection of CPSS6 – Substance Use and Stigma during the Pandemic, a small study was conducted to assess the potential bias due to the lower response rates and the undercoverage of the population not online. The LFS data was used to produce weighted estimates for the in-scope sample targeted to join the probability panel (using the weights and sample from PERS_W1). The same data was used to produce weighted estimates based on the set of respondents from the CPSS6 survey and the weights WT_FINL. The two sets of estimates were compared and are shown in Table 6.3. The significant differences are flagged with note 1.

Table 6.3 Changes in estimates due to nonparticipation in the CPSS and the COVID-19 survey
Subject Recoded variables from 2019 LFS Estimate for in-scope population (n=31,628) Estimate for W6 of CPSS (n=3,941) % Point Difference
Education Less than High School 15.5% 13.4% 2.1%
High School no higher certification 25.9% 25.5% 0.4%
Post-secondary certification (note 1) 58.6% 61.1% -2.4%
Labour Force Status Employed 61.2% 61.7% -0.6%
Unemployed 3.4% 3.3% 0.1%
Not in Labour Force 35.3% 34.9% 0.4%
Country of Birth Canada (note 1) 71.7% 75.4% -3.7%
Marital Status Married/Common-law 60.4% 60.6% -0.2%
Divorced, separated, widowed 12.8% 12.1% 0.6%
Single, never married 26.9% 27.2% -0.4%
Kids Presence of children (note 1) 31.7% 34.5% -2.8%
Household Size Single person 14.4% 14.4% 0.1%
Two person HH 34.8% 36.6% -1.8%
Three or more people 18.4% 19.4% -1.1%
Eligible people for panel One eligible person aged 15+ 15.9% 16.1% -0.2%
Two eligible people (note 1) 49.3% 52.1% -2.8%
Three or more eligible people (note 1) 34.8% 31.8% 3.1%
Dwelling Apartment 12.1% 12.3% -0.2%
Rented 24.8% 24.1% 0.7%
Occupational Code Management occupations (NOC0) 6.0% 5.8% 0.2%
Business Finance and Administration (NOC1) 10.7% 11.3% -0.6%
Natural and Applied Sciences and related occupations (NOC2) (note 1) 5.2% 7.0% -1.8%
Health Occupations (NOC3) 4.7% 4.3% 0.4%
Occupations in education, law and social, community and government services (NOC4) 7.6% 8.3% -0.7%
Occupations in art, culture, recreation and sports (NOC5) 2.5% 3.2% -0.7%
Sales and service occupations (NOC6) 16.6% 16.1% 0.5%
Trades, transport and equipment operators and related occupations (NOC7) 9.6% 9.0% 0.6%
Natural resources, agriculture and related production occupations (NOC8) (note 1) 1.6% 0.9% 0.7%
Occupations in manufacturing and utilities (NOC9) 2.9% 2.6% 0.4%
Note 1: Estimates that are significantly different at α=5%.

While many estimates do not show a significant change, the significant differences show that some bias remains in the CPSS6 – Substance use and stigma during the pandemic. There is an underrepresentation of households where there were three or more eligible participants for the panel and of people working in NOC8. There is an overrepresentation of those with a post-secondary certification, of people born in Canada, of people working in NOC2, of households where there were two eligible participants for the panel, and of households with children. These small differences should be kept in mind when using the CPSS6 – Substance use and stigma during the pandemic survey data. Investigation of differences in estimates is ongoing, and as evidence of differences is identified, strategies are being tested to improve the methodology from one wave of the survey to the next.
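
As an illustration of the comparison reported in Table 6.3, the following Python sketch computes a weighted percentage and the percentage-point difference between two estimates. The function names and the toy weights are hypothetical and are not taken from the CPSS or LFS processing systems. The sketch does not reproduce the significance test, which would require design-based variance estimates that are not described here.

```python
from typing import Sequence


def weighted_share(flags: Sequence[bool], weights: Sequence[float]) -> float:
    """Weighted percentage of units whose flag is True."""
    if len(flags) != len(weights):
        raise ValueError("flags and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    flagged = sum(w for f, w in zip(flags, weights) if f)
    return 100.0 * flagged / total


def point_difference(in_scope_pct: float, respondent_pct: float) -> float:
    """In-scope estimate minus respondent estimate, in percentage points."""
    return round(in_scope_pct - respondent_pct, 1)


# Toy data (hypothetical): three units, two of them flagged.
toy_share = weighted_share([True, False, True], [2.0, 1.0, 1.0])

# Published estimates from Table 6.3 for "Country of Birth: Canada".
canada_diff = point_difference(71.7, 75.4)

print(toy_share)    # 75.0
print(canada_diff)  # -3.7
```

A negative difference means the group has a larger share among CPSS respondents than in the in-scope population, which corresponds to the overrepresentation described above.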

Why do we conduct this survey?

This survey collects financial data from the Canadian Level IV air carriers. This information is used to determine if a carrier has reached the revenue threshold required to qualify for reporting Level III. The data are also used by various government departments for statistical and research purposes.

Your information may also be used by Statistics Canada for other statistical and research purposes.

Your participation in this survey is required under the authority of the Statistics Act.

Other important information

Authorization to collect this information

Data are collected under the authority of the Statistics Act, Revised Statutes of Canada, 1985, Chapter S-19.

Confidentiality

By law, Statistics Canada is prohibited from releasing any information it collects that could identify any person, business or organization, unless consent has been given by the respondent, or as permitted by the Statistics Act. Statistics Canada will use the information from this survey for statistical purposes only.

Record linkages

To enhance the data from this survey and to reduce the response burden, Statistics Canada may combine the acquired data with information from other surveys or from administrative sources.

Data-sharing agreements

To reduce respondent burden, Statistics Canada has entered into data-sharing agreements under Section 12 of the Statistics Act with Natural Resources Canada, Transport Canada and the Canadian Transportation Agency. Statistics Canada will only share data from this survey with those organizations that have demonstrated a requirement to use the data.

Under Section 12 of the Statistics Act, respondents can object to the sharing of information with other organizations. However, respondents do not have the right of refusal with respect to sharing the data with Transport Canada. Transport Canada has the legislative authority to collect and use this information pursuant to the Canada Transportation Act (CTA) and the Transportation Information Regulations.

Respondents may refuse to share their information with Natural Resources Canada and the Canadian Transportation Agency by writing a letter of objection to the Chief Statistician, and mailing it to the following address. These organizations have agreed to keep the data confidential and use them only for statistical purposes.

Chief Statistician of Canada
Statistics Canada
Attention of Director, Enterprise Statistics Division
150 Tunney's Pasture Driveway
Ottawa, Ontario
K1A 0T6

You may also contact us by email at statcan.esd-helpdesk-dse-bureaudedepannage.statcan@canada.ca or by fax at 613-951-6583.

Business or organization and contact information

1. Verify or provide the business or organization’s legal and operating name, and correct information if needed.

Note: Legal name should only be modified to correct a spelling error or typo.

Legal name

The legal name is one recognized by law, thus it is the name liable for pursuit or for debts incurred by the business or organization. In the case of a corporation, it is the legal name as fixed by its charter or the statute by which the corporation was created.

Modifications to the legal name should only be done to correct a spelling error or typo.

To report the legal name of a different legal entity, do not modify this field. Instead, select 'Not currently operational' in question 3, choose the applicable reason, and provide the legal name of the other entity along with any other requested information.

Operating name

The operating name is the name by which the business or organization is commonly known, if different from its legal name. The operating name is synonymous with trade name.

  • Legal name:
  • Operating name (if applicable):

2. Verify or provide the contact information for the designated contact person for the business or organization, and correct information if needed.

Note: The designated contact person is the person who should receive this questionnaire. The designated contact person may not always be the one who actually completes the questionnaire.

  • First name:
  • Last name:
  • Title:
  • Preferred language of communication:
    • English
    • French
  • Mailing address (number and street):
  • City:
  • Province, territory or state:
  • Postal code or ZIP code:
  • Country:
    • Canada
    • United States
  • Email address:
  • Telephone number (including area code):
  • Extension number (if applicable):
  • Fax number (including area code):

3. Verify or provide the current operational status of the business or organization identified by the legal and operating name above.

  • Operational
  • Not currently operational - e.g., temporarily or permanently closed, change of ownership
    Why is this business or organization not currently operational?
    • Seasonal operations
      • When did this business or organization close for the season?
        • Date
      • When does this business or organization expect to resume operations?
        • Date
    • Ceased operations
      • When did this business or organization cease operations?
        • Date
      • Why did this business or organization cease operations?
        • Bankruptcy
        • Liquidation
        • Dissolution
        • Other - Specify the other reasons why operations ceased
    • Sold operations
      • When was this business or organization sold?
        • Date
      • What is the legal name of the buyer?
    • Amalgamated with other businesses or organizations
      • When did this business or organization amalgamate?
        • Date
      • What is the legal name of the resulting or continuing business or organization?
      • What are the legal names of the other amalgamated businesses or organizations?
    • Temporarily inactive but will reopen
      • When did this business or organization become temporarily inactive?
        • Date
      • When does this business or organization expect to resume operations?
        • Date
      • Why is this business or organization temporarily inactive?
    • No longer operating because of other reasons
      • When did this business or organization cease operations?
        • Date
      • Why did this business or organization cease operations?

4. Verify or provide the current main activity of the business or organization identified by the legal and operating name above.

Note: The described activity was assigned using the North American Industry Classification System (NAICS).

This question verifies the business or organization's current main activity as classified by the North American Industry Classification System (NAICS). NAICS is an industry classification system developed by the statistical agencies of Canada, Mexico and the United States. Created against the background of the North American Free Trade Agreement, it is designed to provide common definitions of the industrial structure of the three countries and a common statistical framework to facilitate the analysis of the three economies. NAICS is based on supply-side or production-oriented principles, to ensure that industrial data classified to NAICS are suitable for the analysis of production-related issues such as industrial performance.

The target entities for which NAICS is designed are businesses and other organizations engaged in the production of goods and services. They include farms, incorporated and unincorporated businesses and government business enterprises. They also include government institutions and agencies engaged in the production of marketed and non-marketed services, as well as organizations such as professional associations, unions, charitable or non-profit organizations, and the employees of households.

The associated NAICS should reflect those activities conducted by the business or organizational units targeted by this questionnaire only, as identified in the 'Answering this questionnaire' section and which can be identified by the specified legal and operating name. The main activity is the activity which most defines the targeted business or organization's main purpose or reason for existence. For a business or organization that is for-profit, it is normally the activity that generates the majority of the revenue for the entity.

The NAICS classification contains a limited number of activity classifications; the associated classification might be applicable for this business or organization even if it is not exactly how you would describe this business or organization's main activity.

Please note that any modifications to the main activity made through your response to this question might not be reflected before subsequent questionnaires are sent. As a result, those questionnaires may not contain the updated information.

The following is the detailed description including any applicable examples or exclusions for the classification currently associated with this business or organization.

Description and examples

  • This is the current main activity
  • This is not the current main activity

Provide a brief but precise description of this business or organization's main activity:
e.g., breakfast cereal manufacturing, shoe store, software development

Main activity

5. You indicated that (activity) is not the current main activity. Was this business or organization's main activity ever classified as: (activity)?

  • Yes
    When did the main activity change?
    Date:
  • No

6. Search and select the industry classification code that best corresponds to this business or organization's main activity.

How to search:

  • if desired, you can filter the search results by first selecting the business or organization’s activity sector
  • enter keywords or a brief description that best describe the business or organization’s main activity
  • press the Search button to search the database for an activity that best matches the keywords or description you provided
  • select an activity from the list.

Select this business or organization's activity sector (optional)

  • Farming or logging operation
  • Construction company or general contractor
  • Manufacturer
  • Wholesaler
  • Retailer
  • Provider of passenger or freight transportation
  • Provider of investment, savings or insurance products
  • Real estate agency, real estate brokerage or leasing company
  • Provider of professional, scientific or technical services
  • Provider of health care or social services
  • Restaurant, bar, hotel, motel or other lodging establishment
  • Other sector

Enter keywords or a brief description, then press the Search button

Reporting period information

1. What are the start and end dates of this business's or organization's most recently completed fiscal year?

For this survey, the end date should fall between April 1, 2021 and March 31, 2022.

Here are twelve common fiscal periods that fall within the targeted dates:

  • May 1, 2020 to April 30, 2021
  • June 1, 2020 to May 31, 2021
  • July 1, 2020 to June 30, 2021
  • August 1, 2020 to July 31, 2021
  • September 1, 2020 to August 31, 2021
  • October 1, 2020 to September 30, 2021
  • November 1, 2020 to October 31, 2021
  • December 1, 2020 to November 30, 2021
  • January 1, 2021 to December 31, 2021
  • February 1, 2021 to January 31, 2022
  • March 1, 2021 to February 28, 2022
  • April 1, 2021 to March 31, 2022.

Here are other examples of fiscal periods that fall within the required dates:

  • September 18, 2020 to September 15, 2021 (e.g., floating year-end)
  • June 1, 2021 to December 31, 2021 (e.g., a newly opened business).
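
The date rule above can be sketched in Python as follows. This is a minimal illustration only: the constant and function names are hypothetical, and the questionnaire does not state how many days count as a full year, so the sketch reports the length of the period without classifying it.

```python
from datetime import date

# Window for the fiscal year-end date, as stated in the question above.
WINDOW_START = date(2021, 4, 1)
WINDOW_END = date(2022, 3, 31)


def end_date_in_window(fiscal_end: date) -> bool:
    """True if the fiscal year-end date falls within the targeted window."""
    return WINDOW_START <= fiscal_end <= WINDOW_END


def period_length_days(fiscal_start: date, fiscal_end: date) -> int:
    """Number of days in the reporting period, counting both end points."""
    if fiscal_end < fiscal_start:
        raise ValueError("fiscal year-end date precedes the start date")
    return (fiscal_end - fiscal_start).days + 1


# Examples taken from the lists above.
print(end_date_in_window(date(2021, 4, 30)))                      # True
print(end_date_in_window(date(2021, 3, 31)))                      # False
print(period_length_days(date(2020, 5, 1), date(2021, 4, 30)))    # 365
print(period_length_days(date(2021, 6, 1), date(2021, 12, 31)))   # 214
```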

Fiscal year start date:

Fiscal year-end date:

2. What is the reason the reporting period does not cover a full year?

Select all that apply.

  • Seasonal operations
  • New business
  • Change of ownership
  • Temporarily inactive
  • Change of fiscal year
  • Ceased operations
  • Other
    Specify reason the reporting period does not cover a full year:

Statement of Revenues, Annual - Statement 21 (IV)

1. For the reporting period ending YYYY-MM-DD, what was the operating revenue earned by this business?

Report all amounts in thousands of Canadian dollars.

Scheduled services

Transportation of passengers or goods, or both, by an aircraft provided by an air carrier that operates the air service and that, directly or indirectly, sells some or all of its seats or part or all of its cargo space to the public on a price per seat, price per unit of mass or price per volume of cargo basis.

Charter services

Transportation of passengers or goods, or both, by aircraft pursuant to a contract under which a person, other than the air carrier that operates the air service, or its agent, reserves a block of seats or part of the cargo space of an aircraft for the person's use or for resale to the public.

Include air ambulance service and the movement of people and goods to logging or heli-logging sites.

Exclude firefighting and heli-logging activities and the movement of people and goods to a firefighting site. (A complete list of activities which are specialty and therefore not subject to filing requirements as charter can be found in the Transport Canada document entitled "Starting a Commercial Air Service", TP 8880.)

Fixed wing

Means a power-driven, heavier-than-air aircraft, deriving its lift in flight chiefly from aerodynamic reactions on surfaces which remain fixed. An aircraft having wings fixed to the airplane fuselage and outspread in flight - that is non-rotating wings.

Helicopter

Means a rotary wing, heavier-than-air aircraft, supported in flight chiefly by the reactions of the air on one or more power-driven rotors on substantially vertical axes. A helicopter does not have conventional fixed wings, nor is it provided with a conventional propeller for forward thrust.

Total operating revenue

Include revenue from air transportation services (for example, transportation of passengers, transportation of goods and other flight-related revenue [such as flying training, recreational flying and other specialty flying]) and all other sources.

For the reporting period ending YYYY-MM-DD, what was the operating revenue earned by this business?

Operating revenue (include scheduled and charter services) | CAN$ '000
a. Fixed wing services |
b. Helicopter services |
Total operating revenue |
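
To illustrate reporting in thousands of Canadian dollars, the following Python sketch converts dollar amounts to thousands and adds the two rows of the table. It rests on two assumptions that the questionnaire does not confirm: that amounts are rounded half up to the nearest thousand, and that total operating revenue is the sum of rows a and b. The function names and the dollar amounts are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_thousands(amount_dollars: int) -> int:
    """Convert a dollar amount to thousands of dollars.

    Assumption: half-up rounding to the nearest thousand.
    """
    thousands = Decimal(amount_dollars) / Decimal(1000)
    return int(thousands.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def total_operating_revenue(fixed_wing_k: int, helicopter_k: int) -> int:
    """Assumption: the total is the sum of rows a and b, in CAN$ '000."""
    return fixed_wing_k + helicopter_k


# Hypothetical amounts in dollars.
fixed_wing_k = to_thousands(1_234_567)   # 1235
helicopter_k = to_thousands(820_400)     # 820

print(total_operating_revenue(fixed_wing_k, helicopter_k))  # 2055
```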

Changes or events

1. Indicate any changes or events that affected the reported values for this business or organization, compared with the last reporting period.

Select all that apply.

  • Strike or lock-out
  • Exchange rate impact
  • Price changes in goods or services sold
  • Contracting out
  • Organizational change
  • Price changes in labour or raw materials
  • Natural disaster
  • Recession
  • Change in product line
  • Sold business or business units
  • Expansion
  • New or lost contract
  • Plant closures
  • Acquisition of business or business units
  • Other
    Specify the other changes or events:
  • No change or event

Contact person

1. Statistics Canada may need to contact the person who completed this questionnaire for further information. Is ([Provided Given Names]), ([Provided Family Name]) the best person to contact?

  • Yes
  • No

Who is the best person to contact about this questionnaire?

  • First name:
  • Last name:
  • Title:
  • Email address:
  • Telephone number (including area code):
  • Extension number (if applicable):
  • Fax number (including area code):

Feedback

1. How long did it take to complete this questionnaire?

Include the time spent gathering the necessary information.

  • Hours:
  • Minutes:

2. Do you have any comments about this questionnaire?

Enter your comments