Welcome to the Accessibility Statistics Hub, a collaborative initiative to share data on accessibility with the Canadian public. Data are organized according to the 7 priority areas as set out in the Accessible Canada Act.
Performance Indicator Framework for Accessibility Data
The Performance Indicator Framework (PIF) for Accessibility Data represents the Government of Canada's approach to measuring progress in the identification, removal, and prevention of barriers to accessibility over time. The PIF for Accessibility Data is guided by the Federal Data and Measurement Strategy. To date, performance indicators to measure the identification and removal of barriers to accessible employment, information and communication technologies, and transportation have been developed.
Employment
For measurement purposes, accessible employment means a barrier-free experience for people with disabilities in all phases of the employment journey from hiring, to progressing in their career, all the way to retirement.
Labour market opportunities
Indicator | Percent | Data source
Employment rate for persons with disabilities aged 25 to 64 years, 2022
Information and communication technologies (ICT)
For measurement purposes, accessible ICT means barrier-free access to information and communication technologies, such as smartphones, computers, televisions and assistive devices, for people with disabilities.
ICT assistive aids, devices, and technologies
Indicator | Number of persons | Percent | Data source
Persons with disabilities who have required ICT assistive aids, devices, and technologies, 2022
Explore a data visualization tool on disability rates by age and sex for provinces and territories based on data from the 2017 Canadian Survey on Disability.
Annual Non-store Retail Survey: CVs for operating revenue - 2019
Table summary: CVs for operating revenue for 2019, by Geography (row headers), in percent.
The Tuition and Living Accommodation Costs (TLAC) survey collects data for full-time students in degree programs at Canadian public postsecondary institutions. The survey was developed to provide an overview of tuition and additional compulsory fees, and living accommodation costs that students can expect to pay for an academic year.
TLAC survey data:
provides stakeholders, the public and students with annual tuition costs and changes in tuition fees from the previous year
contributes to a better understanding of the costs to obtain a degree
contributes to education policy development
contributes to the Consumer Price Index
facilitates interprovincial comparisons
facilitates comparisons between institutions
B. Reference period
2021/2022 academic year (September to April)
C. Population
The target population is all publicly funded degree-granting institutions (universities and colleges) in Canada.
The survey target population includes institutions that have degree-granting status for the academic year 2021/2022. Institutions that do not have degree-granting status are excluded even if they provide portions of programs that lead to a degree granted by another institution. The survey is limited to institutions whose operations are primarily funded by provincial governments. Institutions that do not receive grants from Education ministries or departments, and institutions that receive grants only from Health ministries and departments are excluded.
D. Fields of study
The field of study classification for both undergraduate and graduate programs is adapted from the 2016 Classification of Instructional Programs (CIP), Statistics Canada's standard for field of study classification. The CIP's structure comprises several groupings developed jointly by Statistics Canada and the National Center for Education Statistics (NCES) in the United States. It is based on work undertaken as part of the creation of the North American Product Classification System (NAPCS) by Canada, the United States and Mexico.
TLAC CIP groupings for Undergraduate programs:
Education
Visual and Performing Arts, and Communications Technologies
Humanities
Social and Behavioural Sciences, and Legal Studies
Law
Business, Management and Public Administration
Physical and Life Sciences and Technologies
Mathematics, Computer and Information Sciences
Engineering
Architecture
Agriculture, Natural Resources and Conservation
Dentistry
Medicine
Nursing
Pharmacy
Veterinary Medicine
Optometry
Other Health, Parks, Recreation and Fitness
Personal, Protective and Transportation Services
Other
TLAC CIP groupings for Graduate programs:
Includes all of the undergraduate program groupings with the exception of Medicine and the addition of:
Executive MBA
Regular MBA
Refer to Appendix A: CIP
Note: Dental, Medical and Veterinary Residency Programs offered in teaching hospitals and similar locations that may lead to advanced professional certification are excluded.
E. Submission Date
The completed questionnaire must be returned by June 11, 2021, by uploading the file to the Secure Internet Site (e-File Transfer Service).
Tuition fee tables disseminated by Statistics Canada are based on an academic year for full-time students with a full course load in degree programs, regardless of the number of credits.
Tuition should be reported based on the academic year (8 months, September to April) or semester (4 months) regardless of the number of credits. If it is not possible to provide tuition data for a semester or academic year, tuition should be reported per credit.
Final fees should be reported. If they have not yet been determined, report an estimate and check the box on the questionnaire to state that these are estimated fees for 2021/2022.
Part A: Tuition fees for full-time students
How to Report Tuition Fees:
Report tuition fees for full-time students in degree programs only. The degree must be conferred by your institution, which means that students start and complete their degree at your institution. DO NOT include associate degrees, diplomas and certificates.
Verify and update the previous year data (2020/2021) on each page if required.
Report fees with decimals and no commas (for example, 2415.45).
Quebec, Nova Scotia and Newfoundland and Labrador: Lower fees represent Canadian students who have a permanent address in the province (in-province students), and Upper fees represent Canadian students with an out-of-province permanent address.
Academic year (8 months, September to April): When tuition is reported based on the academic year, report the full cost of the program regardless of the number of credits.
Semester (4 months): When tuition is reported based on semester, report the full cost of the semester regardless of the number of credits. Semester fees will be multiplied by two to calculate tuition for the academic year (8 months).
Per Credit: Only report per credit if you cannot report based on semester or academic year regardless of the number of credits. We assume 30 credits as the minimum number of credits to calculate academic year fees. Therefore, when reporting based on per credit, tuition will be multiplied by 30 credits.
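The three reporting bases above imply a simple annualization rule (semester fees are multiplied by two, per-credit fees by the assumed 30-credit minimum). A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def annual_tuition(amount, basis):
    """Normalize reported tuition to an academic-year (8-month) figure.

    basis: "year"     - already covers the academic year (factor 1)
           "semester" - multiplied by the two semesters (factor 2)
           "credit"   - multiplied by the assumed 30-credit minimum
    """
    factors = {"year": 1, "semester": 2, "credit": 30}
    if basis not in factors:
        raise ValueError(f"unknown reporting basis: {basis}")
    return round(amount * factors[basis], 2)

# A semester fee of 2415.45 becomes 4830.90 for the academic year.
```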
Report additional compulsory fees for materials or equipment on pages 4 (undergraduate) and 5 (graduate).
NEW degree programs must be specified in the Comments section at the bottom of page 2 (undergraduate) and page 3 (graduate).
Undergraduate Law (page 2): only professional Law designations (LLB, JD, BCL) from a Faculty of Law should be reported in this grouping.
Graduate Law (page 3): only professional Law degrees from a Faculty of Law (post-LLB/JD) should be reported in this grouping.
Tuition for legal studies degree programs (non-professional Law degrees) on pages 2 and 3 should be reported under "Social and Behavioural Sciences, and Legal Studies". See Appendix A.
Only the Medicine (MD, Doctor of Medicine) program should be reported under undergraduate Medicine, on page 2 of the questionnaire. See Appendix A.
Personal, Protective and Transportation Services includes:
43.0103 Criminal justice / law enforcement administration
43.0104 Criminal justice / safety studies
43.0106 Forensic science and technology
43.0107 Criminal justice / police science
Part B: Additional Compulsory fees for full-time Canadian Students
How to Report Additional Compulsory Fees:
In Part B of the questionnaire, report additional compulsory fees for full-time Canadian students in the first row of the table when the fees do not vary by field of study, for all full-time undergraduate students (page 4) and graduate students (page 5).
Important note: Health Plan and Dental Plan fees that students can opt out of with proof of comparable coverage should not be included. However, this information should be noted in the comments section of the questionnaire.
Part C: Living Accommodation costs at residences/housing
Accommodation costs should be reported wherever possible for full-time students living in residence. If it is not possible to separate room and meal plan costs for single students, only a total should be reported.
III. Definitions
Tuition Fees
Tuition that is charged to a full-time student with a full course load, regardless of the number of credits.
Additional Compulsory fees
Additional compulsory fees collected by the TLAC survey are those that all students must pay regardless of the field of study (TLAC grouping).
These fees cover services that vary from institution to institution, year to year, faculty to faculty, or school to school within the same institution.
Additional compulsory fees may include: general fees (admission, registration, examination, internship, etc.), technology fees, student services fees, student association fees, contributions to student activities, copyright fees, premiums for compulsory insurance plans, fees for athletics and recreational facilities/activities, and other fees such as transcript, degree, laboratory, uniform, u-pass, etc.
TLAC Additional Compulsory Fee Breakdown
Athletics fees
Mandatory fees that support intercollegiate athletics; they cover athletics facilities and campus recreational activities (intramurals, fitness and recreation courses, etc.).
Health Services fees
Mandatory fees that support on-campus clinic facilities providing the services of doctors and nurses. Health and dental plan fees: if students can opt out of these plans with proof of comparable coverage, these fees should be excluded from the survey.
Student Association fee
Mandatory fees that support the general operating expenses of the association.
Other fees
If compulsory fees are reported under "Other, please specify", you must provide further details on the types of fees reported (for example, u-pass, transcript, laboratory or technology fees).
IV. Suggestions
Statistics Canada welcomes any suggestions for changes to the survey that you may wish to propose.
Accommodation Services : CVs for operating revenue - 2019
Table summary: CVs for operating revenue for 2019, by Geography (row headers), in percent.
Management, scientific and technical consulting services: CVs for operating revenue - 2019
Table summary: CVs for operating revenue for 2019, by Geography (row headers), in percent.

Geography | CVs for operating revenue (percent)
Canada | 0.01
Newfoundland and Labrador | 0.02
Prince Edward Island | 0.02
Nova Scotia | 0.02
New Brunswick | 0.03
Quebec | 0.02
Ontario | 0.02
Manitoba | 0.05
Saskatchewan | 0.03
Alberta | 0.02
British Columbia | 0.03
Yukon | 0.01
Northwest Territories | 0.00
Nunavut | 0.00
2021 Census Comment Classification
By: Joanne Yoon, Statistics Canada
Once every five years, the Census of Population provides a detailed and comprehensive statistical portrait of Canada and its population. The census is the only data source that provides consistent statistics for both small geographic areas and small population groups across Canada. Census information is central to planning at all levels. Whether starting a business, monitoring a government program, planning transportation needs or choosing the location for a school, Canadians use census data every day to inform their decisions.
Preparation for each cycle of the census requires several stages of engagement, as well as testing and evaluating data to recommend questionnaire content for the next census, as is the case for the upcoming 2021 Census. These steps include content consultations and discussions with stakeholders and census data users, as well as the execution of the 2019 Census Test (which validates respondent behaviours and ensures that questions and census materials are understood by all participants).
At the end of the Census of Population questionnaires, respondents are provided with a text box in which they can share concerns and suggestions, or make comments about the steps to follow, the content or the characteristics of the questionnaire. The information entered in this space is further analyzed by the Census Subject Matter Secretariat (CSMS) during and after the census collection period. Comments pertaining to specific questionnaire content are classified by subject matter area (SMA)—such as education, labour or demography—and shared with the corresponding expert analysts. The information is used to support decision making regarding content determination for the next census and to monitor factors such as respondent burden.
Using machine learning to classify comments
In an effort to improve the analysis of the 2021 Census of Population comments, Statistics Canada's Data Science Division (DScD) worked in collaboration with CSMS to create a proof of concept on the use of machine learning (ML) techniques to quickly and objectively classify census comments. As part of the project, CSMS identified fifteen possible comment classes and provided previous census comments labelled with one or more of these classes. These fifteen classes included the census SMAs as well as other general census themes by which to classify comments from respondents such as "experience with the electronic questionnaire," "burden of response," as well as "positive census experience" and comments "unrelated to the census." Using ML techniques along with the labelled data, a bilingual semi-supervised text classifier was trained wherein comments can be in either French or English and the machine can use labelled data to learn each class, while leveraging unlabelled data to understand its data space. DScD data scientists experimented with two ML models—the strengths of each model, along with the final model are detailed in this article.
The data scientists trained the 2021 Census comment classifier using comments from the 2019 Census Test. The CSMS team manually labelled these comments using the fifteen identified comment classes and reviewed each other's coding in an effort to reduce coding biases. The classifier is multi-class since a comment can be classified into any of fifteen different classes. It is also multi-label since a respondent can address multiple topics within a single comment, so the comment can be coded to one or more classes.
Deterministic question and page number mapping
When a comment contains a question or page number, that number is deterministically mapped to the SMA class associated with the question and then combined with the ML class prediction to output the final class prediction. For example, say that a respondent completes a questionnaire where question 22 asks about the respondent's education. In the comment box, the respondent comments on question 22 by explicitly stating the question number, and also mentions the sex and gender questions without stating any question numbers. The mapping outputs the education class, and the ML model predicts the sex and gender class based on the words used to mention those questions. The program outputs the final prediction, which is the union of the two outputs: the education class and the sex and gender class. When no question or page number is explicitly mentioned, the program outputs only the ML prediction. The ML model is not trained to learn the page number mapping of each question since the location of a question can change depending on the questionnaire format. For example, questions fall on different pages in the regular font and large print questionnaires because fewer questions fit per page in large print, and the electronic questionnaire does not show page numbers at all.
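The union of the deterministic mapping and the ML prediction can be sketched as follows; the question-to-SMA lookup and the function name here are hypothetical illustrations, not the production code:

```python
import re

# Hypothetical mapping from question number to subject matter area (SMA) class.
QUESTION_TO_SMA = {22: "education", 23: "labour"}

def final_classes(comment, ml_prediction):
    """Union the deterministic question-number mapping with the ML prediction."""
    mapped = {
        QUESTION_TO_SMA[int(n)]
        for n in re.findall(r"question\s+(\d+)", comment.lower())
        if int(n) in QUESTION_TO_SMA
    }
    return mapped | set(ml_prediction)
```

When a comment names no question number, `mapped` is empty and only the ML prediction passes through, matching the behaviour described above.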
Text cleaning
Before training the classifier, the program first cleans the comments. It identifies the language of the comment (English or French) and then corrects the spelling of each unidentifiable word, replacing it with the valid word that requires the fewest edits and is found most frequently in the training data. For example, the word toqn could be corrected to either torn or town, but is corrected to town because town appeared more frequently in the training data. The words are also lemmatized into their root representation, so the machine understands the words walk and walked to have the same root meaning. Stop words are not removed since helper words carry meaning and imply sentiment. For example, this should be better has a different meaning from this is better, but if the program dropped all stop words (including this, should, be and is), the two sentences would become identical, with only one word left: better. Removing stop words can alter the meaning and the sentiment of a comment.
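The spell-correction rule described above (fewest edits, ties broken by training-data frequency) can be sketched as follows; the vocabulary is a toy example, and the lemmatization step (typically done with an NLP library) is omitted:

```python
from collections import Counter

def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(word, vocab):
    """Replace an out-of-vocabulary word with the nearest valid word,
    breaking edit-distance ties by training-data frequency."""
    if word in vocab:
        return word
    return min(vocab, key=lambda w: (edit_distance(word, w), -vocab[w]))

# "toqn" is one edit from both "torn" and "town"; frequency breaks the tie.
vocab = Counter({"town": 10, "torn": 3})
```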
Bilingual semi-supervised text classifier
The bilingual semi-supervised text classifier learns from the labelled comments and is used to classify new comments. It is not a single technique but rather individual pieces combined to best classify census comments.
The data scientists trained a bilingual model where the proportion of French to English labelled comments, as detected by a language-detecting Python program, was 29% and 71%, respectively (16,062 English labelled comments and 6,597 French labelled comments). By training the model on both languages, it leveraged identical words (such as consultation, journal and restaurant) that have the same meaning in both languages to improve the accuracy on French comments, which have fewer labels than English comments.
The model is semi-supervised. Labelled data define the knowledge that the machine needs to replicate. When given the labelled training data, the model uses maximum likelihood to learn the model's parameters and adversarial training to be robust to small perturbations. Unlabelled data are also used to expand the data space that the machine should handle with low confusion but does not teach the model about the meaning of classes. The unlabelled data are only used to lower the model's confusion using entropy minimization to minimize the conditional entropy of estimated class probabilities and virtual adversarial training to maximize the local smoothness of a conditional label distribution against local perturbation.
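A minimal sketch of the entropy-minimization term on the unlabelled data (adversarial and virtual adversarial training are omitted): confident class distributions yield low entropy, uncertain ones high.

```python
import numpy as np

def entropy_minimization_loss(probs, eps=1e-12):
    """Average conditional entropy of predicted class distributions
    over a batch of unlabelled comments (rows sum to 1)."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(-(probs * np.log(probs)).sum(axis=1).mean())
```

Minimizing this quantity during training pushes the model toward confident predictions on unlabelled comments, lowering its confusion without teaching it what the classes mean.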
The text classifier starts with an embedding layer to accept words as input. A lookup table will map each word to a dense vector since the machine learns from numbers and not characters. The embedding layer will represent a sequence of words into a sequence of vectors. With this sequence, the model looks for a pattern that is more generalizable and robust than learning individual words. Also, to prevent the machine from memorizing certain expressions rather than semantic meaning, a dropout layer directly follows the embedding layer. When training, the dropout layer drops random words from the training sentence. The proportion of words dropped is fixed but the dropped words are selected at random. The model is forced to learn without some words so that it generalizes better. When using the model to classify comments, no words are dropped and the model can use all identified knowledge and patterns to make a prediction.
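The embedding lookup followed by word-level dropout can be sketched as follows; the dimensions are toy values, and a real model learns the lookup table during training rather than fixing it at random:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy lookup table: each of 1000 word ids maps to a dense 4-dimensional vector.
embedding = rng.normal(size=(1000, 4))

def embed(word_ids, drop_rate=0.2, training=True):
    """Map a sequence of word ids to a sequence of dense vectors.

    During training only, each word is zeroed out independently with
    probability drop_rate, forcing the model to generalize without it.
    """
    vectors = embedding[word_ids]
    if training:
        keep = rng.random(len(word_ids)) >= drop_rate
        vectors = vectors * keep[:, None]  # zero out the dropped words
    return vectors
```

At prediction time (`training=False`) no words are dropped, matching the behaviour described above.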
Comparing CNN to Bi-LSTM
The data scientists compared a convolutional neural network (CNN) to a Bi-directional-Long Short Term Memory (Bi-LSTM) network. Both networks can classify text by automatically learning complex patterns, but learn differently because of their different structures. In this proof of concept, the data scientists experimented with three different models to learn all fifteen classes: a single-headed LSTM model, a multi-headed LSTM model and a multi-headed CNN model. Overall, the single-headed LSTM model consistently predicted all the classes the most accurately and will thus be used in production.
LSTM can capture long-term dependencies between word sequences using input, forget and output gates, as it can learn to retain or forget the previous state's information. The previous state's information is the context made by the group of words that preceded the current word that the network is looking at. If the current word is an adjective, the network knows what the adjective refers to because it retained that information from earlier in the sentence. If the sentence moves to a different topic, the network should forget the previous state's information. Since Bi-LSTM is bi-directional, the model gathers past and future information relative to each word.
The CNN model applies a convolution filter to a sliding window of group of words and max pooling to select the most prominent information from a phrase of words rather than looking at each word independently. CNN defines the semantic context of a word using neighbouring words, whereas LSTM learns from a sequential pattern of words. Individual features are concatenated to form a single feature vector that summarizes the key characteristics of the input sentence.
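The convolution-plus-max-pooling idea can be sketched for a single filter; this toy version omits the nonlinearity and the many filters a real CNN learns:

```python
import numpy as np

def cnn_features(word_vectors, filt):
    """Slide one convolution filter over windows of consecutive word
    vectors, then max-pool to keep the most prominent response."""
    n_words, _ = word_vectors.shape
    window = filt.shape[0]
    responses = [
        float((word_vectors[i:i + window] * filt).sum())
        for i in range(n_words - window + 1)
    ]
    return max(responses)  # max pooling over all window positions
```

Each response summarizes one phrase of neighbouring words, so the pooled feature reflects local context rather than individual words, as described above.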
A multi-headed classifier was tested with a final sigmoid layer giving a confidence distribution over the classes. The sigmoid layer represents each class prediction confidence score as a decimal between 0 and 1 (i.e., 0% to 100%), where the scores are independent of each other. This is ideal for the multi-label problem of comments that address multiple topics.
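Because the sigmoid scores are independent, a multi-label prediction falls out of thresholding each class separately; a sketch with hypothetical class names and logits:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_labels(logits, classes, threshold=0.5):
    """Independent sigmoid score per class; every class whose
    confidence clears the threshold is predicted."""
    scores = sigmoid(np.asarray(logits, dtype=float))
    return [c for c, s in zip(classes, scores) if s >= threshold]
```

With logits [2.0, -1.5, 0.3] over three classes, two scores clear the 0.5 threshold, so the comment receives two labels at once.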
The data scientists also tested single-headed classifiers, where each model learns only to identify whether a single class is present in the text using a softmax activation function. The number of single-headed classifiers is equal to the number of classes. An input comment can receive multiple labels if multiple classifiers predict that their topic is mentioned in the comment. For example, if a comment talks about language and education, the language classifier and the education classifier will each predict 1 to signal the presence of the relevant SMA classes, and the other classifiers will predict 0 to signal their absence.
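Running one binary classifier per class and unioning the positives can be sketched as follows; the classifiers here are stand-in callables, not trained models:

```python
def multi_label_predict(features, classifiers):
    """Run every single-headed (binary) classifier and return the
    set of classes whose classifier signals presence (predicts 1)."""
    return {name for name, clf in classifiers.items() if clf(features) == 1}

# Stand-in classifiers: keyword presence stands in for a trained model.
toy_classifiers = {
    "language": lambda f: int("language" in f),
    "education": lambda f: int("education" in f),
    "labour": lambda f: int("labour" in f),
}
```

A comment whose features mention both language and education is labelled with both classes, illustrating the multi-label behaviour of the classifier bank.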
A single-headed classifier learns each class better than a multi-headed classifier, which needs to learn fifteen different classes, but there is an added burden for programmers to maintain fifteen different classifiers. The burden of running the multiple classifiers is minimal, since the program can easily run all classifiers in a loop and output the relevant classes. As shown below, the single-head Bi-LSTM model performs the best across the different classes and also in the weighted average.
Table 1: Test weighted average F1-score of different models
Model | F1-score
Single-head Bi-LSTM | 90.2%
Multi-headed CNN | 76%
Multi-headed Bi-LSTM | 73%
Amongst the multi-headed classifiers, CNN had a 4.6% higher average test F1-score than Bi-LSTM when classifying comments into SMA classes such as language and education. On the other hand, the Bi-LSTM model's average test F1-score on general census-themed classes (i.e., "unrelated to the census," "positive census experience," "burden of response," "experience with the electronic questionnaire") was 9.0% higher than the CNN model's. Bi-LSTM was better at predicting whether a comment was relevant to the Census program because it knew the overall context of where the comment was directed. For example, a respondent's positive opinion on a Canadian sports team is not relevant to the census, so this type of comment would be classified under the class "unrelated to the census." In this case, the CNN model predicted the comment to be positive in nature and thus assigned it to the "positive census experience" class, whereas Bi-LSTM tied the positive sentiment to its context (sports teams) and, since that context was unrelated to the census, correctly labelled the comment as being of no value for further analysis by CSMS. CNN, on the other hand, looks at a smaller range of words, so it excels at extracting features from the parts of a sentence that are relevant to certain classes.
Next steps
This proof of concept showed that an ML model can accurately classify bilingual census comments. The classifier is multi-class, meaning that there are multiple classes into which a comment can be classified. It is also multi-label, meaning that more than one class may be relevant to the input comment. The second phase of this project will be to transition the model into production. In production, French and English comments will be spell checked and lemmatized to their root words according to each comment's language. A bilingual semi-supervised text classifier will then classify the cleaned French and English comments. The labelled 2019 data will train the ML model to predict and label incoming comments from the new 2021 Census of Population and ensure that respondent comments are categorized and shared with the appropriate expert analysts. In the production phase, as 2021 Census comments come in, the CSMS team and data scientists will continue to validate the ML predictions and feed them back to the machine to further improve the model.
If you are interested in text analytics or want to find out more about this particular project, the Applied machine learning for text analysis community of practice (GC employees only) recently featured a presentation on this project. Join the community to ask questions or discuss other text analytics projects.