The Rationale Behind Deep Neural Network Decisions

By: Oladayo Ogunnoiki, Statistics Canada

Introduction

In March 2016, Microsoft introduced Tay to the Twittersphere. Tay was an experimental artificial intelligence (AI) chatbot in "conversational understanding". The more you chatted with Tay, the smarter it would become. However, it didn't take long for the experiment to go awry. Tay was supposed to be engaging people in playful conversation, but this playful banter quickly turned into misogynistic and racist commentary.

Of course, the public was perplexed by this turn of events. If this bot was inherently rude, why wouldn't other AI models also go off course? Most Twitter users felt that this bleak event was only a glimmer of what was to come if our future was indeed rich in AI models. However, most data scientists understood the real reason for Tay's negative commentary – the bot was simply repeating what it had learned from the users themselves (Vincent, 2016).

The world of AI continues to grow rapidly, and with stories like this surfacing regularly, there's a strong need to increase the public's trust in AI products. To gain that trust, transparency and explainability are of the utmost importance.

One of the primary questions for anyone interacting with an AI model like Tay is: "Why did the model make that decision?" Multiple tools have been designed to explain the rationale behind these models and answer that question. Unsurprisingly, visual explanations are an efficient way of doing so. In their work, Ramprasaath et al. (2017) outline the requirements of a good visual explanation: it must be class discriminative and have a high resolution. These criteria define the challenge to be addressed: creating a solution that provides a high-resolution, class-discriminative visual explanation for the decisions of a neural network.

Some of the techniques that provide visual explanations include deconvolution, guided backpropagation, class activation mapping (CAM), Gradient-weighted CAM (Grad-CAM), Grad-CAM++, Hi-Res-CAM, Score-CAM, Ablation-CAM, X-Grad-CAM, Eigen-CAM, Full-Grad, and deep feature factorization. For this article, we'll focus on Grad-CAM.

Grad-CAM is an open-source tool that produces visual explanations for decisions from a large class of convolutional neural networks. It works by highlighting the regions of the image that have the highest influence on the final prediction of the deep neural network, thereby providing insight into the decision-making process of the model.

Grad-CAM is based on CAM, which uses the activations of the feature maps with respect to the target class. CAM is specific to certain types of neural networks, such as the Visual Geometry Group (VGG) network and the residual network (ResNet). Grad-CAM instead uses the gradient of the target class score with respect to the feature maps in the final convolutional layer, which makes it a generic method that can be applied to different types of neural networks. Combining these features makes Grad-CAM a reliable and accurate tool for understanding the decision-making process of deep neural networks. Guided Grad-CAM enhances it further by incorporating the gradients from guided backpropagation to produce a more refined heatmap. One limitation of Grad-CAM is that it only visualizes the regions of the image that are most important for the final prediction, rather than the entire decision-making process of the deep neural network. This means that it may not provide a complete understanding of how the model is making its predictions.

The advantages of Grad-CAM include:

  • Model complexity and performance don't need to be traded off for more model transparency.
  • It's applicable to a broad range of convolutional neural networks (CNNs).
  • It's highly class discriminative.
  • It's useful for diagnosing failure modes by uncovering biases in datasets.
  • It helps untrained users distinguish a stronger network from a weaker one, even when their predictions are identical.

Methodology

Grad-CAM can be used in multiple computer vision tasks, such as image classification, semantic segmentation, object detection, image captioning, and visual question answering. It can be applied to CNNs and has recently been made available for transformer architectures.

Highlighted below is how Grad-CAM works in image classification, where the objective is to discriminate between different classes:

Figure 1: The process flow of Gradient-weighted class activation mapping (Grad-CAM)

An image is passed through a CNN and a task specific network to obtain a raw score for the image's class. Next, the gradients are set to zero for all classes except for the desired class, which is set to one. This signal is then backpropagated to the rectified convolutional feature maps of interest, which are combined to compute a blue heatmap that represents where the model needs to look to decide on the class. Finally, the heatmap is pointwise multiplied with guided backpropagation, resulting in guided Grad-CAM visualizations that are high-resolution and concept-specific.

In the case of an image classification task, the Grad-CAM class-discriminative localization map, $L^c_{\text{Grad-CAM}}$, for a model on a specific class is obtained by following the steps below:

  • For a specific class, $c$, the partial derivative of the class score, $y^c$, with respect to the feature maps, $A^k$, of a convolutional layer is calculated using backpropagation.
    $$\frac{\partial y^c}{\partial A^k_{ij}}$$
  • The gradients flowing back due to backpropagation are pooled via global average pooling over the spatial positions $(i, j)$, where $Z$ is the number of positions in a feature map. This produces a set of scalar weights. These are the neuron importance weights.
    $$\alpha^c_k = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A^k_{ij}}$$
  • The derived scalar weights are applied (linear combination) to the feature maps. The result is passed through a Rectified Linear Unit (ReLU) activation function.
    $$L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_k \alpha^c_k A^k\right)$$
  • The result is scaled and applied to the image, highlighting the focus of the neural network. A ReLU activation function is applied to the linear combination of maps because only the pixels or features that have a positive influence on the class score, $y^c$, are of interest.
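The last three steps above can be sketched in a few lines of framework-free Python. This is only an illustrative sketch, not the PyTorch-GradCAM implementation: it assumes the feature maps and their gradients have already been obtained from backpropagation, and the function name `grad_cam_map` and the toy values are hypothetical.

```python
def grad_cam_map(activations, gradients):
    """Compute a Grad-CAM heatmap from precomputed tensors.

    activations, gradients: lists of K feature maps, each an H x W nested
    list. gradients[k][i][j] is the derivative of the class score with
    respect to activations[k][i][j] (assumed to come from backpropagation).
    """
    height, width = len(activations[0]), len(activations[0][0])
    z = height * width

    # Global average pooling of the gradients gives one weight per map.
    alphas = [sum(sum(row) for row in grad_k) / z for grad_k in gradients]

    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [max(0.0, sum(a * act[i][j] for a, act in zip(alphas, activations)))
         for j in range(width)]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return alphas, cam


# Toy example (hypothetical values): two 2 x 2 feature maps.
toy_activations = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
toy_gradients = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
toy_alphas, toy_cam = grad_cam_map(toy_activations, toy_gradients)
print(toy_alphas)  # [0.5, -1.0]
print(toy_cam)     # [[0.0, 0.0], [0.5, 1.0]]
```

In the toy example, the second feature map has a negative weight, so positions where it outweighs the first map are clipped to zero by the ReLU. In practice the same arithmetic would be done on tensors, and the low-resolution map would be upsampled to the input image size before being overlaid.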

Demonstration of Grad-CAM

Figure 2: A pair of cats and a pair of remote controls

Image consisting of two Egyptian cats lying down on a pink sofa with remote controls on the left-hand side of each cat.

Figure 2 is an image of two Egyptian cats and two remote controls. The image was taken from Hugging Face's cat image dataset, using their Python library. The objective is to identify the items within the image using different pretrained deep learning models. A PyTorch package called PyTorch-GradCAM is used. Grad-CAM identifies the aspects of the image that activate the feature map for the Egyptian cat class and the remote-control class. Following the PyTorch-GradCAM tutorial, the Grad-CAM results are replicated for different deep neural networks.

Figure 3: Grad-CAM results of a pretrained ResNet-50 architecture classifying the image in Figure 2. This image was generated by applying Grad-CAM to Figure 2 in a Jupyter Notebook.

Heatmap images generated from a Resnet-50 architecture using Grad-CAM for the Egyptian cat class (left) and Remote-control class (right). The intensity of the red colour shows the regions that contribute the most to the model decision. There are few intense regions for the cat, while the remotes are almost fully captured, but not highly intense.

Figure 2 is passed through a pretrained residual neural network (ResNet-50) as per the PyTorch-GradCAM tutorial. Figure 3 is the image generated using Grad-CAM. For the Egyptian cat class, the legs, stripes, and faces of the cats activated the feature map. For the remote controls, the buttons and profile are what activated the feature map. The top five predicted classes, in order of logit, are remote control, tiger cat, Egyptian cat, tabby cat, and pillow. This model seems to be more confident that the image contains remote controls and cats. Though the model is less confident about it, the pillow category made the top five. This could be because the model was trained with cat-printed pillows.

Figure 4: Grad-CAM results of a pretrained shifted window transformer classifying the image in Figure 2. This image was generated by applying Grad-CAM to Figure 2 in a Jupyter Notebook.

Heatmap images generated from a shifted window transformer using Grad-CAM for the Egyptian cat class (left) and remote-control class (right). The intensity of the red colour shows the regions that contribute the most to the model's decision. The cats show more intense regions, while the remote controls are almost fully captured with high-intensity.

As with the ResNet-50 architecture, the same image is passed through a pretrained shifted window transformer. Figure 4 shows the cats' fur, stripes, faces, and legs as activated regions in the feature map with respect to the Egyptian cat category. The same occurs for the feature map with respect to the remote controls. The top predicted classes, in order of logit, are tabby cat, tiger cat, domestic cat, and Egyptian cat. This model is more confident that cats are in this image than remote controls.

Figure 5: Grad-CAM results of a pretrained vision transformer architecture classifying the image in Figure 2. This image was generated by applying Grad-CAM to Figure 2 in a Jupyter Notebook.

Heatmap images generated from a vision transformer using Grad-CAM for the Egyptian cat class (left) and remote-control class (right). The intensity of the red colour shows the regions that contribute the most to the model's decision. The cats are fully captured in high intensity. The remotes are also captured, but not with equivalent intensity. In addition, other regions of the image are highlighted despite not being part of either class.

As seen above, more regions of the feature map are activated, including sections of the image that don't include cat features. The same occurs for regions of the feature map with respect to the remote-control class. The top five predicted classes, in order of logit, are Egyptian cat, tiger cat, tabby cat, remote control, and lynx.

The Grad-CAM results, together with the top five categories for the different architectures, can be used to favour the selection of the vision transformer (ViT) architecture for tasks related to identifying Egyptian cats and remote controls.

Conclusion

Some of the challenges in the field of AI include increasing people's trust in the developed models and understanding the rationale behind the decision making of these models during development. Visualization tools like Grad-CAM provide insight into these rationales and help highlight different failure modes of AI models for specific tasks. They can be used to identify errors in the models and improve their performance. Beyond Grad-CAM, other visualization tools have been developed, such as Score-CAM, which performs even better in interpreting the decision-making process of deep neural networks. Grad-CAM will nonetheless be selected over Score-CAM due to its simplicity and agnosticism to model architectures. The use of tools such as Grad-CAM should be encouraged for visually explaining the reasons behind the decisions of AI models.

Meet the Data Scientist

Register for the Data Science Network's Meet the Data Scientist Presentation

If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.

Thursday, June 15
1:00 to 4:00 p.m. ET
MS Teams – link will be provided to the registrants by email

Register for the Data Science Network's Meet the Data Scientist Presentation. We hope to see you there!

Subscribe to the Data Science Network for the Federal Public Service newsletter to keep up with the latest data science news.

References

  • S. R. Ramprasaath, C. Michael, D. Abhishek, V. Ramakrishna, P. Devi and B. Dhruv, "Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization," in ICCV, IEEE Computer Society, 2017, pp. 618-626.
  • Z. Bolei, K. Aditya, L. Agata, O. Aude and T. Antonio, "Learning Deep Features for Discriminative Localization," CoRR, 2015.
  • J. Vincent, "Twitter taught Microsoft's AI chatbot to be racist in less than a day", in The Verge, 2016.

National Travel Survey: C.V.s for Person-Trips by Duration of Trip, Main Trip Purpose and Country or Region of Trip Destination – Q4 2022

Table summary
This table displays the results of C.V.s for Person-Trips by Duration of Trip, Main Trip Purpose and Country or Region of Trip Destination. The information is grouped by Duration of trip (appearing as row headers), Main Trip Purpose, Country or Region of Trip Destination (Total, Canada, United States, Overseas) calculated using Person-Trips in Thousands (× 1,000) and C.V. as a units of measure (appearing as column headers).
Duration of Trip | Main Trip Purpose | Total: Person-Trips (x 1,000) | Total: C.V. | Canada: Person-Trips (x 1,000) | Canada: C.V. | United States: Person-Trips (x 1,000) | United States: C.V. | Overseas: Person-Trips (x 1,000) | Overseas: C.V.
Total Duration | Total Main Trip Purpose | 67,564 | A | 60,934 | A | 5,217 | A | 1,413 | A
Total Duration | Holiday, leisure or recreation | 21,112 | A | 17,744 | A | 2,494 | B | 874 | A
Total Duration | Visit friends or relatives | 28,448 | A | 26,908 | A | 1,137 | B | 403 | B
Total Duration | Personal conference, convention or trade show | 1,174 | C | 1,047 | C | 124 | D | 3 | E
Total Duration | Shopping, non-routine | 4,872 | B | 4,140 | B | 731 | B | 2 | E
Total Duration | Other personal reasons | 5,519 | B | 5,326 | B | 157 | C | 36 | D
Total Duration | Business conference, convention or trade show | 1,711 | B | 1,363 | B | 295 | C | 53 | C
Total Duration | Other business | 4,727 | B | 4,405 | B | 278 | C | 44 | C
Same-Day | Total Main Trip Purpose | 43,435 | A | 41,626 | A | 1,809 | B | .. |
Same-Day | Holiday, leisure or recreation | 11,991 | A | 11,400 | B | 591 | C | .. |
Same-Day | Visit friends or relatives | 17,946 | A | 17,632 | A | 314 | C | .. |
Same-Day | Personal conference, convention or trade show | 817 | C | 781 | C | 36 | E | .. |
Same-Day | Shopping, non-routine | 4,512 | B | 3,869 | B | 643 | B | .. |
Same-Day | Other personal reasons | 4,326 | B | 4,264 | B | 62 | D | .. |
Same-Day | Business conference, convention or trade show | 456 | C | 436 | C | 21 | E | .. |
Same-Day | Other business | 3,387 | B | 3,244 | B | 143 | E | .. |
Overnight | Total Main Trip Purpose | 24,129 | A | 19,308 | A | 3,408 | A | 1,413 | A
Overnight | Holiday, leisure or recreation | 9,121 | A | 6,344 | A | 1,904 | A | 874 | A
Overnight | Visit friends or relatives | 10,502 | A | 9,276 | A | 823 | B | 403 | B
Overnight | Personal conference, convention or trade show | 357 | C | 266 | C | 88 | D | 3 | E
Overnight | Shopping, non-routine | 360 | C | 271 | C | 88 | C | 2 | E
Overnight | Other personal reasons | 1,193 | B | 1,062 | B | 95 | C | 36 | D
Overnight | Business conference, convention or trade show | 1,255 | B | 928 | B | 275 | C | 53 | C
Overnight | Other business | 1,340 | B | 1,161 | B | 135 | C | 44 | C
.. : data not available

Estimates contained in this table have been assigned a letter to indicate their coefficient of variation (c.v.) (expressed as a percentage). The letter grades represent the following coefficients of variation:

A: c.v. between or equal to 0.00% and 5.00% and means Excellent.
B: c.v. between or equal to 5.01% and 15.00% and means Very good.
C: c.v. between or equal to 15.01% and 25.00% and means Good.
D: c.v. between or equal to 25.01% and 35.00% and means Acceptable.
E: c.v. greater than 35.00% and means Use with caution.
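
For readers who want to apply the letter grades above to their own estimates, the mapping can be sketched as a small lookup. This is an illustrative sketch only, not Statistics Canada code: the function names are hypothetical. It assumes the c.v. is the standard error expressed as a percentage of the estimate, and that c.v. values are rounded to two decimal places before grading, so nothing falls between 5.00% and 5.01%.

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the c.v. as a percentage (assumed definition: 100 * SE / estimate)."""
    if estimate == 0:
        raise ValueError("c.v. is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)


def cv_letter_grade(cv_percent):
    """Map a c.v. (in percent, rounded to two decimals) to its letter grade."""
    if cv_percent < 0:
        raise ValueError("c.v. cannot be negative")
    # Upper bound of each grade, taken from the legend above.
    for upper_bound, grade in ((5.00, "A"), (15.00, "B"), (25.00, "C"), (35.00, "D")):
        if cv_percent <= upper_bound:
            return grade
    return "E"


print(cv_letter_grade(coefficient_of_variation(200.0, 10.0)))  # A
```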

Monthly Survey of Food Services and Drinking Places: CVs for Total Sales by Geography – March 2023

Table summary
This table displays the results of CVs for Total sales by Geography. The information is grouped by Geography (appearing as row headers). Month and percentage (appearing as column headers).
CVs in percentage, by month (YYYYMM)
Geography 202203 202204 202205 202206 202207 202208 202209 202210 202211 202212 202301 202302 202303
Canada 0.87 0.45 0.51 0.66 0.49 0.14 0.13 0.17 0.24 0.88 0.32 0.40 0.29
Newfoundland and Labrador 1.20 1.52 1.66 0.53 0.50 0.47 0.49 0.73 0.49 0.93 2.43 0.89 1.19
Prince Edward Island 9.73 15.01 6.85 15.97 9.23 5.27 3.04 8.45 8.22 3.45 10.49 14.28 2.20
Nova Scotia 0.50 0.98 1.16 1.79 3.37 0.43 0.40 0.37 0.43 16.87 0.83 0.97 0.84
New Brunswick 0.55 1.41 1.26 0.67 0.53 0.52 0.50 0.56 0.73 12.18 1.21 1.95 1.18
Quebec 1.95 0.53 1.73 1.55 0.97 0.18 0.28 0.26 0.19 1.73 0.67 0.96 0.83
Ontario 1.19 0.80 0.74 1.30 0.95 0.25 0.25 0.21 0.53 0.73 0.67 0.85 0.51
Manitoba 0.54 0.80 0.97 0.68 3.49 0.48 0.40 0.37 0.58 9.72 0.78 0.91 1.48
Saskatchewan 1.18 1.84 5.77 6.45 4.85 1.30 0.73 1.31 1.44 7.51 0.62 1.47 1.28
Alberta 2.01 0.68 0.57 1.45 0.91 0.39 0.30 0.33 0.38 1.56 0.40 0.49 0.47
British Columbia 3.25 1.55 0.97 0.64 0.91 0.28 0.21 0.66 0.33 2.77 0.44 0.47 0.50
Yukon Territory 2.20 2.07 23.00 3.32 2.54 2.09 2.07 2.34 2.20 2.50 41.12 3.45 33.49
Northwest Territories 1.77 3.19 29.08 3.20 2.74 2.38 2.05 2.00 2.09 2.56 6.03 2.73 40.91
Nunavut 0.76 0.69 73.56 1.55 1.52 1.30 2.35 2.85 101.77 43.21 2.83 2.40 117.22

Wholesale Trade Survey (monthly): CVs for total sales by geography - March 2023

CVs in percentage, by month (YYYYMM)
Geography 202203 202204 202205 202206 202207 202208 202209 202210 202211 202212 202301 202302 202303
Canada 0.6 0.8 0.8 0.6 0.7 0.6 0.6 0.6 0.6 0.7 0.7 0.6 0.5
Newfoundland and Labrador 1.5 1.9 0.5 0.3 0.3 0.6 0.5 0.5 0.6 0.5 0.6 0.3 0.3
Prince Edward Island 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nova Scotia 2.5 2.7 3.5 1.6 4.7 2.5 1.9 2.9 1.8 4.9 4.4 2.0 3.8
New Brunswick 1.4 2.9 1.3 1.2 2.1 3.0 1.7 1.3 2.6 2.4 1.8 1.9 1.4
Quebec 1.4 2.5 1.9 1.4 1.5 1.4 1.7 1.4 1.5 2.1 1.6 1.4 1.4
Ontario 1.1 1.2 1.3 1.1 1.1 0.9 1.0 0.9 0.9 1.1 1.1 1.0 1.1
Manitoba 0.6 0.8 1.8 1.7 1.2 1.0 1.5 2.1 1.4 1.8 0.8 0.7 0.5
Saskatchewan 0.4 0.6 0.7 0.7 0.6 1.1 1.2 0.5 0.7 0.4 0.4 0.4 0.6
Alberta 0.8 1.8 1.2 1.2 1.4 1.4 0.8 1.4 1.3 1.1 1.4 0.9 0.4
British Columbia 1.6 1.4 1.6 2.1 1.9 1.6 1.8 2.6 1.5 1.4 1.5 1.8 1.7
Yukon Territory 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Northwest Territories 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Nunavut 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

National Travel Survey: C.V.s for Visit-Expenditures by Duration of Visit, Main Trip Purpose and Country or Region of Expenditures – Q4 2022

National Travel Survey: C.V.s for Visit-Expenditures by Duration of Visit, Main Trip Purpose and Country or Region of Expenditures, including expenditures at origin and those for air commercial transportation in Canada, in Thousands of Dollars (x 1,000)
Table summary
This table displays the results of C.V.s for Visit-Expenditures by Duration of Visit, Main Trip Purpose and Country or Region of Expenditures. The information is grouped by Duration of trip (appearing as row headers), Main Trip Purpose, Country or Region of Expenditures (Total, Canada, United States, Overseas) calculated using Visit-Expenditures in Thousands of Dollars (x 1,000) and c.v. as units of measure (appearing as column headers).
Duration of Visit | Main Trip Purpose | Total: $ '000 | Total: C.V. | Canada: $ '000 | Canada: C.V. | United States: $ '000 | United States: C.V. | Overseas: $ '000 | Overseas: C.V.
Total Duration | Total Main Trip Purpose | 23,128,455 | A | 14,493,645 | A | 5,755,989 | A | 2,878,821 | A
Total Duration | Holiday, leisure or recreation | 10,772,424 | A | 5,007,498 | A | 3,846,923 | B | 1,918,003 | B
Total Duration | Visit friends or relatives | 5,909,523 | A | 4,567,851 | A | 727,233 | B | 614,438 | B
Total Duration | Personal conference, convention or trade show | 351,279 | B | 242,466 | B | 104,174 | D | 4,638 | E
Total Duration | Shopping, non-routine | 1,104,183 | B | 903,613 | B | 198,218 | C | 2,352 | E
Total Duration | Other personal reasons | 1,262,310 | B | 1,051,208 | B | 100,216 | D | 110,887 | D
Total Duration | Business conference, convention or trade show | 1,685,083 | B | 1,047,344 | B | 520,784 | C | 116,955 | C
Total Duration | Other business | 2,043,654 | B | 1,673,666 | B | 258,441 | C | 111,548 | C
Same-Day | Total Main Trip Purpose | 5,339,135 | A | 4,974,751 | A | 352,707 | B | 11,677 | E
Same-Day | Holiday, leisure or recreation | 1,710,365 | B | 1,548,456 | B | 150,746 | C | 11,163 | E
Same-Day | Visit friends or relatives | 1,507,953 | B | 1,439,415 | B | 68,538 | E | .. |
Same-Day | Personal conference, convention or trade show | 96,966 | C | 89,167 | C | 7,284 | E | 515 | E
Same-Day | Shopping, non-routine | 866,286 | B | 756,192 | B | 110,094 | C | .. |
Same-Day | Other personal reasons | 604,099 | B | 599,079 | B | 5,020 | E | .. |
Same-Day | Business conference, convention or trade show | 73,678 | D | 67,843 | E | 5,836 | E | .. |
Same-Day | Other business | 479,788 | C | 474,599 | C | 5,189 | E | .. |
Overnight | Total Main Trip Purpose | 17,789,320 | A | 9,518,894 | A | 5,403,282 | A | 2,867,144 | A
Overnight | Holiday, leisure or recreation | 9,062,058 | A | 3,459,042 | B | 3,696,176 | B | 1,906,840 | B
Overnight | Visit friends or relatives | 4,401,570 | A | 3,128,436 | A | 658,695 | B | 614,438 | B
Overnight | Personal conference, convention or trade show | 254,313 | C | 153,299 | C | 96,890 | D | 4,123 | E
Overnight | Shopping, non-routine | 237,897 | C | 147,421 | C | 88,124 | D | 2,352 | E
Overnight | Other personal reasons | 658,211 | B | 452,129 | B | 95,196 | D | 110,887 | D
Overnight | Business conference, convention or trade show | 1,611,405 | B | 979,501 | B | 514,948 | C | 116,955 | C
Overnight | Other business | 1,563,866 | B | 1,199,067 | B | 253,252 | D | 111,548 | C
.. : data not available

Estimates contained in this table have been assigned a letter to indicate their coefficient of variation (c.v.) (expressed as a percentage). The letter grades represent the following coefficients of variation:

A: c.v. between or equal to 0.00% and 5.00% and means Excellent.
B: c.v. between or equal to 5.01% and 15.00% and means Very good.
C: c.v. between or equal to 15.01% and 25.00% and means Good.
D: c.v. between or equal to 25.01% and 35.00% and means Acceptable.
E: c.v. greater than 35.00% and means Use with caution.

National Travel Survey Q4 2022: Response Rates

National Travel Survey: Response Rate – Q4 2022
Table summary
This table displays the results of Response Rate. The information is grouped by Province of residence (appearing as row headers), Unweighted and Weighted (appearing as column headers), calculated using percentage unit of measure (appearing as column headers).
Province of residence, Unweighted (percentage), Weighted (percentage)
Newfoundland and Labrador 20.3 17.7
Prince Edward Island 15.2 14.3
Nova Scotia 24.6 21.9
New Brunswick 23.0 19.7
Quebec 28.4 25.0
Ontario 26.1 24.0
Manitoba 27.1 23.8
Saskatchewan 26.6 23.1
Alberta 24.8 23.1
British Columbia 26.7 24.9
Canada 25.6 24.1

Introduction to Privacy-Enhancing Cryptographic Techniques

Zero knowledge proof – Proving something without exchanging evidence

By: Betty Ann Bryanton, Canada Revenue Agency

Introduction

Enormous amounts of data are collected by government agencies, search engines, social networking systems, hospitals, financial institutions, and other organizations. This data, centrally stored, is at risk of security breaches. Additionally, individuals browse the internet, accept cookies, and share personally identifiable information (PII) in exchange for services, benefits, recommendations, etc. To facilitate e-commerce and access services, individuals need to authenticate, which means providing 'evidence' to prove they are who they say they are. This may mean providing a password, a driver's license, a passport number, or another personal identifier. These could potentially be stolen, and sharing this data may compromise related PII, such as age and home address. Zero knowledge proofs can assist in these scenarios.

What is Zero Knowledge Proof?

A Zero-Knowledge Proof (ZKP) is one of the cryptographic privacy-enhancing computation (PEC) techniques and may be used to implement granular, least-access privacy controls and privacy-by-design principles.

Typically, a proof that some assertion X is true also reveals some information about why X is true. ZKPs, however, prove that a statement is true without revealing any additional knowledge. It's important to note that ZKPs do not guarantee 100% proof, but they do provide a very high degree of probability.

ZKPs use algorithms that take data as input and return either 'true' or 'false' as output. This allows two parties to verify the truth of information without exposing the information or how the truth was determined. For example, an individual can prove the statement "I am an adult at least 21 years old" without providing data for verification to a central server.

ZKP was introduced by researchers at MIT in 1985 and is now being used in many real-world applications.

ZKP vs other concepts

ZKP is distinct from the following concepts:

Further, ZKP should not be confused with Advanced Encryption Standard (AES), where the parties share a secret number. In ZKP, the prover demonstrates their possession of a secret number without divulging that number. In both scenarios the parties arrive at a shared secret, but with ZKP, the goal is to make claims without revealing extraneous information.

How does ZKP work?

To understand how ZKP works, consider the scenario of a prover (Peggy) and a verifier (Victor). The goal of the ZKP is to prove a statement with very high probability without revealing any additional information.

Peggy (the prover) wants to prove to Victor (the verifier, who is colour-blind and does not trust her) that two balls are of different colours (e.g., green and red). Peggy asks Victor to show her one of the balls, then put both balls behind his back. Victor then either switches the balls or not, and shows one to her again. She says whether it is the same colour as the previous one or a different colour. Of course, she could be guessing or lying, or even be colour-blind herself. Thus, to convince him that she's telling the truth, this process must be repeated many times. By doing so, Peggy can eventually convince Victor of her ability to correctly identify the different colours.

This scenario satisfies the three criteria of a ZKP:

  1. Soundness (the quality of being based on valid reason): If Peggy was not telling the truth, or was colour-blind, she could only guess correctly 50% of the time in any single round. After repeating this process ('the proof') many, many times, the probability of her guessing correctly every time would be very low.
  2. Completeness: If the balls really are of different colours, Peggy answers correctly in every round, which convinces Victor that the balls are of different colours.
  3. Zero-knowledge: Victor does not learn anything additional; he never even learns which ball is green and which is red.
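
The repeated rounds in the scenario above can be simulated in a few lines. This is a minimal sketch of the thought experiment, not a cryptographic protocol. The function names are hypothetical, and it assumes a prover who cannot tell the balls apart answers with a fair coin flip. It shows why repetition matters: an honest Peggy is always accepted, while a guessing Peggy survives n rounds with probability (1/2)^n.

```python
import random


def victor_accepts(balls_differ, rounds, rng):
    """Simulate the ball game; return True if Peggy answers every round correctly."""
    for _ in range(rounds):
        switched = rng.random() < 0.5  # Victor secretly switches or not
        if balls_differ:
            answer = switched  # Peggy sees whether the colour changed
        else:
            answer = rng.random() < 0.5  # Peggy can only guess
        if answer != switched:
            return False  # a single wrong answer exposes her
    return True


def guessing_success_probability(rounds):
    """Chance that pure guessing passes all rounds."""
    return 0.5 ** rounds


rng = random.Random(2023)
print(victor_accepts(True, 20, rng))     # True: an honest prover always passes
print(guessing_success_probability(20))  # about 9.5e-07
```

After 20 rounds a guessing prover has less than a one-in-a-million chance of passing, which is why Victor can become convinced without ever learning which ball is which.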

What is explained above is interactive proving, requiring a back-and-forth communication between two parties. Today's ZKPs employ non-interactive proving, where two parties have a shared key to transmit and receive information. For example, a government-issued key as part of a passport could be used to demonstrate citizenship without revealing the passport number or the citizen's name.
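
To give a feel for what a non-interactive proof can look like, the sketch below uses a Schnorr proof of knowledge made non-interactive with the Fiat-Shamir heuristic. This is a well-known textbook construction that the article does not describe, and it is not necessarily what the passport example would use. The prover shows that they know the secret exponent behind a public key in one message, with the verifier's random challenge replaced by a hash. The group parameters are toy values chosen for readability and are far too small to be secure. All names are hypothetical.

```python
import hashlib
import secrets

# Toy parameters (assumption, insecure): G = 2 generates a subgroup of
# prime order Q = 11 in the integers modulo P = 23.
P, Q, G = 23, 11, 2


def challenge(public_key, commitment):
    """Derive the challenge by hashing the public values (Fiat-Shamir)."""
    data = f"{G}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret):
    """Return (public_key, proof) showing knowledge of `secret`."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(Q)  # fresh randomness for every proof
    commitment = pow(G, nonce, P)
    c = challenge(public_key, commitment)
    response = (nonce + c * secret) % Q
    return public_key, (commitment, response)


def verify(public_key, proof):
    """Check G^response == commitment * public_key^challenge (mod P)."""
    commitment, response = proof
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P


public_key, proof = prove(secret=7)
print(verify(public_key, proof))  # True
```

The verifier only ever sees the public key and the proof, never the secret itself. A real deployment would use a standardized group or elliptic curve and a vetted library rather than hand-written arithmetic.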

Why is it important?

ZKPs assure a secure and invisible flow of data, protecting user information from potential leaks and identity theft. This enhances e-commerce, by allowing more private and secure transactions.

The use of ZKPs not only helps combat data security risks; as a minimum viable verification technique, it also helps prevent the disclosure of more PII than necessary. This benefits both individuals and organizations. Individuals do not have to share their PII, and organizations, which are facing an increase in security breaches and the resulting costs, reputational harm, and loss of trust, never receive PII that could be breached.

Another benefit for both individuals and organizations is more efficient verification, reducing bottle-necked processes that rely on manual or inefficient burden of proof.

Having positive and efficient verification between parties (even untrusted ones) opens up a variety of avenues for collaboration and enquiry.

Applications and Use Cases

ZKPs can protect data privacy in a diverse set of applications and use cases, including:

  • Finance: A mortgage or leasing applicant can prove their income falls within a certain range without revealing their salary. (Financial institution ING is already using this technology, according to Dilmegani, 2022.)
  • Online voting: ZKP can enable anonymous and verifiable voting and help prevent voting fraud or manipulation.
  • Machine Learning: A machine learning algorithm owner can convince others about the model's results without revealing any information about the model.
  • Blockchain Security: Transactions can be verified without sharing information such as wallet addresses and amounts with third party systems.
  • Identity and credential management: Identity-free verification could apply to authentication, end-to-end encrypted messaging, digital signatures, or any application requiring passwords, passports, birth certificates, driving licences, or other forms of identity verification. Fraud prevention systems could validate user credentials and PII could be anonymized to comply with regulations or for decentralized identity.
  • International security: ZKPs enable the verification of the origin of a piece of information without revealing its source. This means cyber-attacks can be attributed to a specific entity or nation without revealing how the information was obtained. This is already being used by the United States' Department of Defense (Zero-knowledge proof: how it works and why it's important, n.d.).
  • Nuclear disarmament: Countries could securely exchange proof of disarmament without requiring physical inspection of classified nuclear facilities.
  • COVID-19 vaccine passports and travel: As currently done in Denmark, individuals could prove their vaccination status without revealing their PII (Shilo, 2022).
  • Auditing or compliance applications: Any process that requires verification of compliance could use ZKP. This could include verifying that taxes are filed, an airplane was maintained, or data is retained by a record keeper.
  • Anonymous payments: Credit card payments could be made without being visible to multiple parties such as payments providers, banks, and government authorities.

Challenges

While there are many benefits, there are also challenges that need to be taken into consideration if an organization wants to use ZKPs.

  • Computation intensity: ZKP algorithms are computationally intense. For interactive ZKPs, many interactions between the verifier and the prover are required, and for non-interactive ZKPs, significant computational capabilities are required. This makes ZKPs unsuitable for slow or mobile devices and may cause scalability issues for large enterprises.
  • Hardware costs: Applications that want to use ZKPs must factor in hardware costs which may increase costs for end-users.
  • Trust assumptions: While some ZKP public parameters are available for reuse and participants in the trusted setup are assumed to be honest, recipients must still rely on the honesty of the developers (What are zero-knowledge proofs?, 2023).
  • Quantum computing threats: While ZKP cryptographic algorithms are currently secure, the development of quantum computers could eventually break the security model.
  • Costs of using the technology: The costs of ZKPs can vary based on setup requirements, efficiency, interactive requirements, proof succinctness and the hardness assumptions required (Big Data UN Global Working Group, 2019).
  • Lack of standards: Despite ongoing initiatives to standardize zero knowledge techniques and constructions, there is still an absence of standards, systems, and homogeneous languages.
  • No 100% guarantee: Though the probability of verification succeeding while the prover is lying can be very low, ZKPs do not guarantee the claim is 100% valid.
  • Skills: ZKP developers should have expertise in ZKP cryptography and be aware of the subtleties and differences between the guarantees provided by ZKP algorithms.

What's next?

In recent years there has been a strong push for adopting zero knowledge in software applications. Several organizations have built applications using ZK capabilities, and ZKPs are widely used to safeguard blockchains. For example, the city of Zug in Switzerland has registered all its citizen IDs on a blockchain (Anwar, 2018).

Though improvements in ZK education, standardization, and privacy certifications are needed to build trust in ZK products and services, ZKPs have great potential to reduce the costs that organizations incur from security breaches, preserve users' privacy, and reduce the sale of PII as a product. ZKPs help an organization move from reacting to security breaches to preventing them.

Meet the Data Scientist

Register for the Data Science Network's Meet the Data Scientist Presentation

If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.

Thursday, June 15
1:00 to 4:00 p.m. ET
MS Teams – link will be provided to the registrants by email

We hope to see you there!

Subscribe to the Data Science Network for the Federal Public Service newsletter to keep up with the latest data science news.

Related Topics

Authentication, Blockchain, Web 3.0, Privacy-Enhancing Computation (PEC) techniques: Differential Privacy, Homomorphic Encryption, Secure Multiparty Computation, Trusted Execution Environment

References

Why do we conduct this survey?

The purpose of this survey is to collect information for producing national and provincial level estimates of potato production and value. These estimates will be used to assess the economic health of the industry. Agricultural producers and industry analysts will work with this information to make production and marketing decisions, and government analysts will use it to develop agricultural policies in Canada.

Your information may also be used by Statistics Canada for other statistical and research purposes.

Your participation in this survey is required under the authority of the Statistics Act.

Other important information

Authorization to collect this information

Data are collected under the authority of the Statistics Act, Revised Statutes of Canada, 1985, Chapter S-19.

Confidentiality

By law, Statistics Canada is prohibited from releasing any information it collects that could identify any person, business, or organization, unless consent has been given by the respondent, or as permitted by the Statistics Act. Statistics Canada will use the information from this survey for statistical purposes only.

Record linkages

To enhance the data from this survey and to reduce the reporting burden, Statistics Canada may combine the acquired data with information from other surveys or from administrative sources.

Data-sharing agreements

To reduce respondent burden, Statistics Canada has entered into data-sharing agreements with provincial and territorial statistical agencies and other government organizations, which have agreed to keep the data confidential and use them only for statistical purposes. Statistics Canada will only share data from this survey with those organizations that have demonstrated a requirement to use the data.

Section 11 of the Statistics Act provides for the sharing of information with provincial and territorial statistical agencies that meet certain conditions. These agencies must have the legislative authority to collect the same information, on a mandatory basis, and the legislation must provide substantially the same provisions for confidentiality and penalties for disclosure of confidential information as the Statistics Act. Because these agencies have the legal authority to compel businesses to provide the same information, consent is not requested and businesses may not object to the sharing of the data.

For this survey, there are Section 11 agreements with the provincial statistical agencies of Newfoundland and Labrador, Nova Scotia, New Brunswick, Quebec, Ontario, Manitoba, Saskatchewan, Alberta and British Columbia. The shared data will be limited to information pertaining to business establishments located within the jurisdiction of the respective province.

Section 12 of the Statistics Act provides for the sharing of information with federal, provincial or territorial government organizations. Under Section 12, you may refuse to share your information with any of these organizations by writing a letter of objection to the Chief Statistician, specifying the organizations with which you do not want Statistics Canada to share your data and mailing it to the following address:

Chief Statistician of Canada
Statistics Canada
Attention of Director, Enterprise Statistics Division
150 Tunney's Pasture Driveway
Ottawa, Ontario
K1A 0T6

You may also contact us by email at statcan.esd-helpdesk-dse-bureaudedepannage.statcan@canada.ca or by fax at 613-951-6583.

For this survey, there is a Section 12 agreement with the Prince Edward Island statistical agency.

For agreements with provincial and territorial government organizations, the shared data will be limited to information pertaining to business establishments located within the jurisdiction of the respective province or territory.

Business or organization and contact information

1. Verify or provide the business or organization's legal and operating name and correct where needed.

Note: Legal name modifications should only be done to correct a spelling error or typo.

Legal Name

The legal name is one recognized by law, thus it is the name liable for pursuit or for debts incurred by the business or organization. In the case of a corporation, it is the legal name as fixed by its charter or the statute by which the corporation was created.

Modifications to the legal name should only be done to correct a spelling error or typo.

To report the legal name of a different legal entity, use question 3 instead: select 'Not currently operational', choose the applicable reason, and provide the legal name of the other entity along with any other requested information.

Operating Name

The operating name is the name by which the business or organization is commonly known, if different from its legal name. The operating name is synonymous with trade name.

  • Legal name:
  • Operating name (if applicable):

2. Verify or provide the contact information of the designated business or organization contact person for this questionnaire and correct where needed.

Note: The designated contact person is the person who should receive this questionnaire. The designated contact person may not always be the one who actually completes the questionnaire.

  • First name:
  • Last name:
  • Title:
  • Preferred language of communication:
    • English
    • French
  • Mailing address (number and street):
  • City:
  • Province, territory or state:
  • Postal code or ZIP code:
  • Country:
    • Canada
    • United States
  • Email address:
  • Telephone number (including area code):
  • Extension number (if applicable):
    The maximum number of characters is 10.
  • Fax number (including area code):

3. Verify or provide the current operational status of the business or organization identified by the legal and operating name above.

  • Operational
  • Not currently operational
    Why is this business or organization not currently operational?
    • Seasonal operations
      • When did this business or organization close for the season?
        • Date
      • When does this business or organization expect to resume operations?
        • Date
    • Ceased operations
      • When did this business or organization cease operations?
        • Date
      • Why did this business or organization cease operations?
        • Bankruptcy
        • Liquidation
        • Dissolution
        • Other - Specify the other reasons for ceased operations
    • Sold operations
      • When was this business or organization sold?
        • Date
      • What is the legal name of the buyer?
    • Amalgamated with other businesses or organizations
      • When did this business or organization amalgamate?
        • Date
      • What is the legal name of the resulting or continuing business or organization?
      • What are the legal names of the other amalgamated businesses or organizations?
    • Temporarily inactive but will re-open
      • When did this business or organization become temporarily inactive?
        • Date
      • When does this business or organization expect to resume operations?
        • Date
      • Why is this business or organization temporarily inactive?
    • No longer operating due to other reasons
      • When did this business or organization cease operations?
        • Date
      • Why did this business or organization cease operations?

4. Verify or provide the current main activity of the business or organization identified by the legal and operating name above.

Note: The described activity was assigned using the North American Industry Classification System (NAICS).

This question verifies the business or organization's current main activity as classified by the North American Industry Classification System (NAICS). NAICS is an industry classification system developed by the statistical agencies of Canada, Mexico and the United States. Created against the background of the North American Free Trade Agreement, it is designed to provide common definitions of the industrial structure of the three countries and a common statistical framework to facilitate the analysis of the three economies. NAICS is based on supply-side or production-oriented principles, to ensure that industrial data, classified to NAICS, are suitable for the analysis of production-related issues such as industrial performance.

The target entities for which NAICS is designed are businesses and other organizations engaged in the production of goods and services. They include farms, incorporated and unincorporated businesses and government business enterprises. They also include government institutions and agencies engaged in the production of marketed and non-marketed services, as well as organizations such as professional associations and unions and charitable or non-profit organizations and the employees of households.

The associated NAICS should reflect those activities conducted by the business or organizational units targeted by this questionnaire only, as identified in the 'Answering this questionnaire' section and which can be identified by the specified legal and operating name. The main activity is the activity which most defines the targeted business or organization's main purpose or reason for existence. For a business or organization that is for-profit, it is normally the activity that generates the majority of the revenue for the entity.

The NAICS classification contains a limited number of activity classifications; the associated classification might be applicable for this business or organization even if it is not exactly how you would describe this business or organization's main activity.

Please note that any modifications to the main activity through your response to this question might not necessarily be reflected prior to the transmitting of subsequent questionnaires and as a result they may not contain this updated information.

The following is the detailed description including any applicable examples or exclusions for the classification currently associated with this business or organization.

Description and examples

  • This is the current main activity
  • This is not the current main activity
    Provide a brief but precise description of this business or organization's main activity:
    • e.g., breakfast cereal manufacturing, shoe store, software development

Main activity

5. You indicated that is not the current main activity.

Was this business or organization's main activity ever classified as: ?

  • Yes
    When did the main activity change?
    Date:
  • No

6. Search and select the industry classification code that best corresponds to this business or organization's main activity.

Select this business or organization's activity sector (optional)

  • Farming or logging operation
  • Construction company or general contractor
  • Manufacturer
  • Wholesaler
  • Retailer
  • Provider of passenger or freight transportation
  • Provider of investment, savings or insurance products
  • Real estate agency, real estate brokerage or leasing company
  • Provider of professional, scientific or technical services
  • Provider of health care or social services
  • Restaurant, bar, hotel, motel or other lodging establishment
  • Other sector

7. You have indicated that the current main activity of this business or organization is: Main activity. Are there any other activities that contribute significantly (at least 10%) to this business or organization's revenue?

  • Yes, there are other activities
    Provide a brief but precise description of this business or organization's secondary activity:
    e.g., breakfast cereal manufacturing, shoe store, software development
  • No, that is the only significant activity

8. Approximately what percentage of this business or organization's revenue is generated by each of the following activities?

When precise figures are not available, provide your best estimates.

Percentage of revenue

  • Main activity:
  • Secondary activity:
  • All other activities:
  • Total percentage:

Potatoes grown for sale this year

1. Are you growing any potatoes for sale this year?

If you have not completed your planting activities when completing this survey, please report all planting intentions.

  • Yes
  • No

Area planted

2. What is the total area of potatoes planted in the 2023 crop year?

Please report for the entire operation. Report the area of potatoes planted on land owned or rented by all partners in the operation.

If you have not completed your planting activities when completing this survey, please report all planting intentions.

Total area:

Unit of measure:

  • Acres
  • Hectares

Agricultural production

3. Which of the following agricultural products are currently being produced on this operation?

Select all that apply.

  • Field crops
  • Hay
  • Summerfallow
  • Potatoes
  • Fruit, berries and nuts
  • Vegetables
  • Sod
  • Nursery products
  • Greenhouse products
  • Cattle and calves
    Include beef or dairy.
  • Pigs
  • Sheep and lambs
  • Mink
  • Fox
  • Hens and chickens
  • Turkeys
  • Maple taps
  • Honey bees
  • Mushrooms
  • Other
    Specify agricultural products:
  • Not producing agricultural products

Area in crops

4. What area of this operation is used for the following crops?

Report the areas only once, even if used for more than one crop type.

Exclude land used by others.

Area and unit of measure:

a. Field crops:
b. Hay:
c. Summerfallow:
d. Potatoes:
e. Fruit, berries and nuts:
f. Vegetables:
g. Sod:
h. Nursery products:

Unit of measure:

  • acres
  • hectares
  • arpents

Greenhouse area

5. What is the total area under glass, plastic or other protection used for growing plants?

Total area:

Unit of measure:

  • square feet
  • square metres

Livestock (excluding birds)

6. How many of the following animals are on this operation?

Report all animals on this operation, regardless of ownership, including those that are boarded, custom-fed or fed under contract.

Include all animals kept by this operation, regardless of ownership, that are pastured on a community pasture, grazing co-op or public land.

Exclude animals owned but kept on a farm, ranch or feedlot operated by someone else.

Number:

a. Cattle and calves:
b. Pigs:
c. Sheep and lambs:
d. Mink:
e. Fox:

Birds

7. How many of the following birds are on this operation?

Report all poultry on this operation, regardless of ownership, including those grown under contract.

Include poultry for sale and poultry for personal use.

Exclude poultry owned but kept on an operation operated by someone else.

Number:

a. Hens and chickens:
b. Turkeys:
Maple taps

8. What was the total number of taps made on maple trees last spring?

Total number of taps:

Honey bees

9. How many live colonies of honey bees (used for honey production or pollination) are owned by this operation?

Include bees owned, regardless of location.

Number of colonies:

Mushrooms

10. What is the total mushroom growing area (standing footage) on this operation?

Include mushrooms grown using beds, trays, tunnels or logs.

Total area:

Unit of measure:

  • square feet
  • square metres

Changes or events

1. Indicate any changes or events that affected the reported values for this business or organization, compared with the last reporting period.

Select all that apply.

  • Strike or lock-out
  • Exchange rate impact
  • Price changes in goods or services sold
  • Contracting out
  • Organizational change
  • Price changes in labour or raw materials
  • Natural disaster
  • Recession
  • Change in product line
  • Sold business or business units
  • Expansion
  • New or lost contract
  • Plant closures
  • Acquisition of business or business units
  • Other
    Specify the other changes or events:
  • No changes or events

Contact person

1. Statistics Canada may need to contact the person who completed this questionnaire for further information. Is [Provided Given Names], [Provided Family Name] the best person to contact?

  • Yes
  • No

Who is the best person to contact about this questionnaire?

  • First name:
  • Last name:
  • Title:
  • Email address:
  • Telephone number (including area code):
  • Extension number (if applicable):
    The maximum number of characters is 5.
  • Fax number (including area code):

Feedback

1. How long did it take to complete this questionnaire?

Include the time spent gathering the necessary information.

  • Hours:
  • Minutes:

2. Do you have any comments about this questionnaire?

Ottawa to hold World Statistics Congress in July 2023

By: Bridget Duquette, Statistics Canada

This summer, Ottawa will be the backdrop for the 64th World Statistics Congress (WSC), hosted by the International Statistical Institute (ISI) from July 16 to 20 at the Shaw Centre. This event will feature a variety of panels, presentations and social events, as well as networking and recruitment opportunities. It will offer a great opportunity for knowledge sharing and collaboration between data scientists, statisticians and methodologists at the international level.

The WSC has been held every two years since 1887 and is attended by statisticians, academics and business leaders. This event helps to shape the landscape of statistics and data science worldwide. Canada has hosted this prestigious event only once before, in 1963, in Ottawa.

It’s traditional for the host country of the WSC to plan social events for attendees. This year, international guests will be offered a tour of local sites in Ottawa’s downtown core, guided by Statistics Canada’s Eric Rancourt, Assistant Chief Statistician, and Claude Girard, Senior Methodologist.

A sneak peek of the event’s congress programme is available and includes information on presentations covering dozens of topics of interest to data scientists. This year, the illustrious keynote speaker will be former Director of the United States Census Bureau, Professor Robert M. Groves.

Figure 1: Ottawa’s Shaw Centre.

Kenza Sallier, Senior Methodologist at StatCan and co-author of the recently published article entitled Unlocking the power of data synthesis with the starter guide on synthetic data for official statistics, is looking forward to participating once again—though this will be her first time attending in person.

“I attended the 2021 WSC, in the middle of the pandemic (and census collection),” Kenza says. “I had the great opportunity to present Statistics Canada’s achievements related to data synthesis and to also be invited to take part in a panel session to share my experience as a young female statistician in the world of official statistics. Even though it was virtual, the event supported meeting and networking with many interesting people. I am looking forward to attending the 2023 WSC as it is taking place in person. My colleague Craig Hilborn and I will be presenting our work and I hope to get feedback from our peers.”

Shirin Roshanafshar, Chief of Text Analytics and Digitalization at Statistics Canada, will also be attending the conference and speaking at the session about the challenges of Natural Language Processing techniques in official statistics.

For all participants—whether they are attending for the first time or the fifth time—WSC 2023 is sure to be an exciting experience. In the words of ISI President Stephen Penneck: “The congress encourages collaboration, growth, discovery, and advancement in the field of data science. I am excited to have the 64th World Statistics Congress visit Canada and look forward to the impact it will have on the industry.”

Check back for a review of the conference and the exciting developments from this global event.

Supplement to Statistics Canada's Generic Privacy Impact Assessment related to the Survey of Early Learning and Child Care Arrangements – Children with Long-term Conditions or Disabilities

Date: March 2023

Program manager: Director, Diversity and Sociocultural Statistics
Director General, Justice, Diversity and Population Statistics

Reference to Personal Information Bank (PIB)

Personal information collected through the Survey of Early Learning and Child Care Arrangements – Children with Long-term Conditions or Disabilities (SELCCA – CLCD) is described in Statistics Canada's "Special Surveys" Personal Information Bank. The Personal Information Bank refers to information collected through Statistics Canada's ad hoc surveys, which are not part of the regular survey taking activities of the Agency. They cover a variety of socio-economic topics including health, housing, labour market, education and literacy, as well as demographic data.

The "Special Surveys" Personal Information Bank (Bank number: StatCan PPU 016) is published on the Statistics Canada website under the latest Information about Programs and Information Holdings chapter.

Description of statistical activity

Statistics Canada is conducting the Survey of Early Learning and Child Care Arrangements – Children with Long-term Conditions or Disabilities (SELCCA – CLCD) under the authority of the Statistics Act [1] on a cost-recovery basis for Employment and Social Development Canada (ESDC) to address the data gaps for children with long-term conditions and disabilities. This new cross-sectional survey aims to gather information from parents and guardians of children with one or more long-term conditions or disabilities, aged 0 to 5, living in the provinces, as a next step to Statistics Canada's other recent collections on Early Learning and Child Care (the voluntary 2022 Survey of Early Learning and Child Care Arrangements (SELCCA) and 2023 Canadian Survey on Early Learning and Child Care (CSELCC)). The SELCCA – CLCD includes the same core content as each of these surveys to support indicators related to early learning and child care, as well as content that can provide information on the specific needs of these children and their families or barriers that they may experience. CSELCC, which replaces SELCCA for 2023, includes additional content on parents' and guardians' labour market participation to better understand the interaction between work and the use of early learning and child care arrangements. As there is no standardized measure of disability among children, neither of these surveys included measures to identify whether children have long-term conditions or disabilities.

The survey asks parents and guardians about their child care preferences, arrangements, associated costs and other relevant information such as difficulties they may have faced when looking for or accessing care (accessibility and availability).

In order to identify long-term conditions or disabilities in children, the survey also collects information on their activity limitations, physical and mental health conditions and health status. In addition, age, gender, Indigenous identity, visible minority status, and education will be collected.

Results from this survey will be used to help to support the Multilateral Early Learning and Child Care Framework, which seeks to improve Canada's Early Learning and Child Care (ELCC) system.

To reduce respondent burden and supplement/verify relevant information (described in more detail below), participants will be notified that their responses will be linked to their data from Statistics Canada's 2021 Census of Population [2] and Longitudinal Immigration Database [3], as well as their Canada Child Benefit, T1 Universe File (Personal Master) [4] and the T1 Family File (T1FF) data from the Canada Revenue Agency (CRA) [5]. Statistics Canada's microdata linkage and related statistical activities were assessed in Statistics Canada's Generic Privacy Impact Assessment [6]. All data linkage activities are subject to established governance [7], and are assessed against the privacy principles of necessity and proportionality [8]. All approved linkages are published on Statistics Canada's website [9].

Reason for supplement

While the Generic Privacy Impact Assessment (PIA) addresses most of the privacy and security risks related to statistical activities conducted by Statistics Canada, this supplement was developed to address the collection and use of potentially sensitive information regarding disabilities and long-term conditions of children [10]. As is the case with all PIAs, Statistics Canada's privacy framework ensures that elements of privacy protection and privacy controls are documented and applied.

Necessity and Proportionality

The collection and use of personal information for SELCCA – CLCD can be justified against Statistics Canada's Necessity and Proportionality Framework:

1. Necessity

The collection and use of information on children with long-term conditions and disabilities is required by Employment and Social Development Canada (ESDC) to ensure that Canada's Early Learning and Child Care (ELCC) system meets the needs of all families, including those who may face barriers to child care as a result of long-term conditions or disabilities as per the Government of Canada's Multilateral Early Learning and Child Care Framework. Other surveys on early learning and child care do not collect information related to long-term conditions or disabilities in children nor the barriers they face. As a result, there is a data gap related to children with long-term conditions and disabilities and child care, which results in the inability to assess whether ELCC in Canada adequately supports these children and their families. The information produced through this survey will be used to direct policies and programs aiming to improve the quality, accessibility, affordability, inclusivity and flexibility of ELCC programs and services.

The 2021 Census and the CRA's Canada Child Benefit file will be used by Statistics Canada methodologists to identify which households are in-scope for collection when creating the survey frame, based on statistical sampling parameters (geographic representativeness, socio-demographic representativeness, etc.) and the indicated presence of young children.

Respondents will be informed that their survey responses will be linked to 2021 Census of Population data, immigration data (from Statistics Canada's Longitudinal Immigration Database), and select CRA data to provide additional contextual information, and to help reduce respondent burden. Specifically:

  • The linkage with the 2021 Census of Population will be used to evaluate the relationship between the Activities of Daily Living questions on the Census and the reporting of long-term conditions and disabilities in children on the SELCCA – CLCD, to determine how accurately the Census questions identify children with long-term conditions and disabilities. This analysis will support the future development of a social model disability screening tool for children, similar to the Disability Screening Questions. A standardized disability screening tool for children would facilitate the collection of data for this population, resulting in data which reflects the lived experiences of these children and their parents. Additionally, a standardized disability screening tool would support Canada in meeting obligations under the United Nations Convention on the Rights of Persons with Disabilities (UNCRPD).
  • A linkage with the CRA's Canada Child Benefit will provide accurate geographic information, such as place of residence.
  • Data from CRA, such as the T1 Universe File (Personal Master) and the T1FF, will provide data related to personal and household income to reduce the number of questions respondents need to answer.
  • Finally, a linkage to the Longitudinal Immigration Database provides data related to immigration status for the child and their family to reduce the number of questions respondents need to answer.

2. Effectiveness - Working assumptions:

SELCCA – CLCD is designed to provide accurate estimates of the experiences and needs of children with long-term conditions and disabilities and their families at the national level, excluding territories [11]. To effectively collect this data, Statistics Canada will:

3. Proportionality

SELCCA – CLCD was developed to address data gaps related to child care for children with disabilities or long-term conditions, specifically barriers related to the use of child care. Questions related to long-term conditions and disability, particularly in children, can be viewed as sensitive; however, this information is required to ensure that the early learning and child care arrangement needs of all children are being met.

In order to ensure the survey was reflective of the needs of children with long-term conditions and disabilities, content for SELCCA – CLCD was developed in consultation with experts in early learning and child care for children with disabilities, as well as ESDC, to ensure that the content aligned with the needs of researchers and ESDC to make informed policy decisions that ultimately benefit the Canadian society and economy. Only content needed to support ELCC policies and programs is included within the survey.

A sample size of 20,000 children aged 0-5 has been assessed as necessary by methodologists to produce statistics of sufficient quality to yield insightful information at the national level. The Census of Population and the Canada Child Benefit are used to create a representative sample frame of children for the survey, with a person knowledgeable about the child, such as a parent or legal guardian, responding to the survey.

In order to limit collection to only children aged 0-5 with a long-term condition or disability, a screening at the beginning of the survey ensures that respondent children without a long-term condition or disability are screened out and not required to provide any additional information as they are considered out of scope.

Ultimately, this collection of data and information is considered proportional to the potential benefits to policymakers, who will be able to use it to help address the child care needs of children with a long-term condition or disability.

4. Alternatives

Statistics Canada conducts several other surveys related to early learning and child care¹³; however, these surveys do not collect data on long-term conditions or disabilities. Specifically, these surveys do not contain:

  • measures to identify long-term conditions or disabilities in children, or
  • content regarding the specific needs of these children and their families, including information on barriers related to child care.

Adding content on long-term conditions and disabilities in children to existing surveys was considered, but was projected to increase response burden for participants selected for these other surveys to statistically and operationally unacceptable levels. Although the data from these other surveys support Canada's ELCC system, they do not provide sufficient insights related to the long-term conditions or disabilities of the associated children.

Mitigation factors

Some questions contained in SELCCA – CLCD are considered sensitive as they relate to health conditions or disabilities of young children. However, the overall risk of harm to the survey respondents has been deemed manageable with existing Statistics Canada safeguards that are described in Statistics Canada's Generic Privacy Impact Assessment, as well as with the following measures:

  • Qualitative testing of survey content was conducted with parents and guardians of children with long-term conditions and disabilities to ensure that the content would be well understood and to evaluate whether the questions being asked were overly intrusive or insensitive. As a result of feedback related to the sensitivity of the content received during qualitative testing, several changes were made to the final survey content, including the removal of content that was deemed highly sensitive.
  • As with all Statistics Canada surveys, respondents will be informed of the survey purpose before the survey begins, allowing them to decide if they want to participate. This information will be provided through invitation and reminder letters and reiterated at the beginning of the online questionnaire. Respondents will also be informed that their participation is voluntary before being asked any questions. Information about the survey, a brochure, and the survey questionnaire will be made available on Statistics Canada's website on the day collection starts.
  • Individual responses will be grouped with those of others when reporting results. Individual responses and results for very small groups will never be published or shared with government departments or agencies. The data will be carefully analyzed and reviewed before any aggregate data are released to ensure that vulnerable individuals are not disproportionately impacted.

Conclusion

This assessment concludes that, with the existing Statistics Canada safeguards, including those listed above, any remaining risks are such that Statistics Canada is prepared to accept and manage them.