Business or organization information

1. Which of the following categories best describes this business or organization?

  • Government agency
  • Private sector business
  • Non-profit organization
    • Who does this organization primarily serve?
      • Households or individuals
        e.g., child and youth services, community food services, food bank, women's shelter, community housing services, emergency relief services, religious organization, grant and giving services, social advocacy group, arts and recreation group
      • Businesses
        e.g., business association, chamber of commerce, condominium association, environment support or protection services, group benefit carriers (pensions, health, medical)
  • Don't know

2. In what year was this business or organization first established?

Year business or organization was first established:

OR

Don't know

3. In the last 12 months, did this business or organization conduct any of the following international activities?

Select all that apply.

  • Export goods outside of Canada
    Include both intermediate and final goods.
  • Export services outside of Canada
    Include services delivered virtually and in person.
    e.g., cloud services, legal services, environmental services, architectural services, digital advertising
  • Make investments outside of Canada
  • Sell goods to businesses or organizations in Canada who then resold them outside of Canada
  • Import goods from outside of Canada
    Include both intermediate and final goods.
  • Import services from outside of Canada
    Include services received virtually and in person.
    e.g., cloud services, legal services, environmental services, architectural services, digital advertising
  • Relocate any business or organizational activities or employees from another country into Canada
    Exclude temporary foreign workers.
  • Engage in other international business or organizational activities
    OR
  • None of the above
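
The "OR" separator before "None of the above" suggests that this option cannot be combined with any other. A minimal sketch of that check in Python is shown here for illustration only; the function name `is_valid_selection` and the rule that at least one option must be chosen are assumptions, not part of the questionnaire.

```python
# Hypothetical validation helper; not part of the questionnaire itself.
NONE_OF_THE_ABOVE = "None of the above"


def is_valid_selection(selected):
    """Check a "Select all that apply" answer in which "None of the above" is exclusive."""
    chosen = set(selected)
    if not chosen:
        # Assumption: the respondent must choose at least one option.
        return False
    if NONE_OF_THE_ABOVE in chosen:
        # "None of the above" may not be combined with any other option.
        return len(chosen) == 1
    return True


print(is_valid_selection(["Export goods outside of Canada", "Import goods from outside of Canada"]))
print(is_valid_selection(["Export goods outside of Canada", NONE_OF_THE_ABOVE]))
```

The same check would apply to the other "Select all that apply" questions that end with "OR" and "None of the above".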

4. Over the next three months, how is each of the following expected to change for this business or organization?

Exclude seasonal factors or conditions.

  • Number of employees
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Vacant positions
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Sales of goods and services offered by this business or organization
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Selling price of goods and services offered by this business or organization
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Demand for goods and services offered by this business or organization
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Imports
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Exports
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Operating income
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Operating expenses
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Profitability
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Capital expenditures
    e.g., machinery, equipment
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable
  • Training expenditures
    • Increase
    • Stay about the same
    • Decrease
    • Not applicable

Business or organization obstacles

5. Over the next three months, which of the following are expected to be obstacles for this business or organization?

Select all that apply.

  • Shortage of labour force
  • Recruiting skilled employees
  • Retaining skilled employees
  • Shortage of space or equipment
  • Rising cost of inputs
    An input is an economic resource used in a firm's production process.
    e.g., labour, capital, energy and raw materials
  • Cost of personal protective equipment (PPE), additional cleaning or implementing distancing requirements
  • Difficulty acquiring inputs, products or supplies from within Canada
  • Difficulty acquiring inputs, products or supplies from abroad
  • Maintaining inventory levels
  • Insufficient demand for goods or services offered
  • Fluctuations in consumer demand
  • Attracting new or returning customers
  • Cost of insurance
  • Transportation costs
  • Obtaining financing
  • Government regulations
  • Travel restrictions and travel bans
  • Increasing competition
  • Challenges related to exporting goods and services
  • Maintaining sufficient cash flow or managing debt
  • Speed of internet connection
  • Intellectual property protection
  • Other
    • Specify other:
    OR
  • None of the above

Flow condition: If the business or organization is a private sector business, go to Q6. Otherwise, go to Q8.

Expectations for the next year

6. In the next 12 months, are there any plans to expand or restructure this business, or acquire or invest in other businesses?

Restructuring involves changing the financial, operational, legal or other structures of a business to make it more efficient or more profitable.

  • Yes
    • Does this business plan to:
      Select all that apply.
      • Expand current location of this business
      • Expand this business to other locations
      • Restructure this business
      • Acquire other businesses or franchises
      • Invest in other businesses
  • No
  • Don't know

7. In the next 12 months, are there any plans to transfer, sell or close this business?

  • Yes
    • Does this business plan to:
      • Transfer to family members without money changing hands
      • Sell to family members
      • Sell to employees
      • Sell to external parties
      • Close the business
      • Don't know
  • No
  • Don't know

Flow condition: If "Export goods outside of Canada" or "Export services outside of Canada" is selected in Q3, go to Q8. Otherwise, go to Q14.

Digital ordering

Extranet:
A closed network that uses internet protocols to securely share an enterprise's information with suppliers, vendors, customers or other business partners. It can take the form of a secure extension of an Intranet that allows external users to access some parts of the enterprise's Intranet. It can also be a private part of the enterprise's website, where business partners can navigate after being authenticated in a login page.

Electronic Data Interchange (EDI):
The electronic transmission of data suitable for automated processing between businesses or organizations. Generally, EDI allows for the sending or receiving of messages (e.g., payment transactions, tax declarations, orders) in an agreed or standard format suitable for automated processing, and does not require an individual to type a message manually.

8. It was reported in a previous question that in the last 12 months, this business or organization exported goods or services outside of Canada. What percentage of these export sales were digitally ordered?

Include all sales of this business's or organization's goods or services where the order was received, and the commitment to purchase was made, over the Internet, including through web pages, applications, platforms, extranet or Electronic Data Interchange (EDI).

Exclude orders made by telephone, facsimile and email.

  • 100%
  • 50% to less than 100%
  • 1% to less than 50%
  • None

Flow condition: If "Export goods outside of Canada" is selected in Q3 and "100%", "50% to less than 100%" or "1% to less than 50%" is selected in Q8, go to Q9. Otherwise, go to the next flow.
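
The flow condition above can be read as a small routing rule. The following Python sketch illustrates it for the goods branch only; the function name `route_goods_branch`, the constant `DIGITAL_SHARES` and the use of plain strings for answers are assumptions made for illustration.

```python
# Hypothetical routing helper for the goods branch; names are illustrative only.
DIGITAL_SHARES = {"100%", "50% to less than 100%", "1% to less than 50%"}


def route_goods_branch(q3_selected, q8_answer):
    """Return "Q9" when goods were exported and some export sales were digitally ordered."""
    exported_goods = "Export goods outside of Canada" in q3_selected
    if exported_goods and q8_answer in DIGITAL_SHARES:
        return "Q9"
    return "next flow condition"


print(route_goods_branch({"Export goods outside of Canada"}, "1% to less than 50%"))
print(route_goods_branch({"Export services outside of Canada"}, "100%"))
```

A respondent who exported only services, or who answered "None" in Q8, skips Q9 and moves on to the next flow condition.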

Digital ordering of goods

9. In the last 12 months, which digital methods were used to order goods by customers or clients in other countries?

Select all that apply.

  • Electronic data interchange (EDI)
    The electronic transmission of data suitable for automated processing between businesses or organizations. Generally, EDI allows for the sending or receiving of messages (e.g., payment transactions, tax declarations, orders) in an agreed or standard format suitable for automated processing, and does not require an individual to type a message manually.
  • E-commerce, through this business's or organization's own website, application or platform
  • E-commerce, through a third-party website, application or platform
  • Extranet
    A closed network that uses internet protocols to securely share an enterprise's information with suppliers, vendors, customers or other business partners. It can take the form of a secure extension of an Intranet that allows external users to access some parts of the enterprise's Intranet. It can also be a private part of the enterprise's website, where business partners can navigate after being authenticated in a login page.
    OR
  • Goods were not digitally ordered by customers or clients in other countries from this business or organization

Flow condition: If "Export services outside of Canada" is selected in Q3 and "100%", "50% to less than 100%" or "1% to less than 50%" is selected in Q8, go to Q10. Otherwise, go to the next flow.

Digital ordering of services

10. In the last 12 months, which digital methods were used to order services by customers or clients in other countries?

Select all that apply.

  • Electronic data interchange (EDI)
    The electronic transmission of data suitable for automated processing between businesses or organizations. Generally, EDI allows for the sending or receiving of messages (e.g., payment transactions, tax declarations, orders) in an agreed or standard format suitable for automated processing, and does not require an individual to type a message manually.
  • E-commerce, through this business's or organization's own website, application or platform
  • E-commerce, through a third-party website, application or platform
  • Extranet
    A closed network that uses internet protocols to securely share an enterprise's information with suppliers, vendors, customers or other business partners. It can take the form of a secure extension of an Intranet that allows external users to access some parts of the enterprise's Intranet. It can also be a private part of the enterprise's website, where business partners can navigate after being authenticated in a login page.
    OR
  • Services were not digitally ordered by customers or clients in other countries from this business or organization

Flow condition: If any digital method to order services was selected in Q10, go to Q11. Otherwise, go to Q14.

11. In the last 12 months, for services digitally ordered by customers or clients in other countries, how were the services delivered to them by this business or organization?

Select all that apply.

  • Services were delivered digitally
    e.g., service provider and client remain in their respective countries with services delivered outside of Canada via electronic data interchange (EDI), video conferencing with clients, file sharing, websites, applications or platforms, or extranet
  • Services were delivered in person
    Include services delivered through a subsidiary or sub-contractor in country of client, or travel of service provider or client to have service delivered in-person.
    e.g., services related to the installation of goods, on-site environmental assessments

Flow condition: If "Services were delivered digitally" is selected in Q11, go to Q12. Otherwise, go to Q14.

12. In the last 12 months, how have sales of digitally delivered services to customers or clients in other countries changed?

e.g., Service provider and client remain in their respective countries with services delivered outside of Canada via electronic data interchange (EDI), video conferencing with clients, file sharing, websites, applications or platforms, or extranet

  • Increased
  • Remained stable
  • Decreased
  • Don't know

13. Which of the following is this business's or organization's preferred means for digitally delivering services to customers or clients in other countries?

  • Electronic data interchange (EDI)
  • Over the internet, through online websites, file sharing, video conferencing, applications or platforms
  • Extranet
  • Other
    • Specify other:

Environmental activities

14. Which of the following environmental practices does this business or organization have currently in place or plan to implement in the next 12 months?

Select all that apply.

  • Reducing waste
  • Reducing energy or water consumption
    e.g., sensor lights, LED lights, automated faucets
  • Encouraging employees to adopt environmentally friendly practices
    e.g., teleworking, using public transit, recycling
  • Using recycled or waste materials as inputs
  • Using one or more clean energy sources
    e.g., hydroelectricity, solar, wind
  • Choosing suppliers based on their environmentally responsible practices or products
  • Designing products or services to have a minimal impact on the environment
    e.g., eco-design that considers the product's lifecycle
  • Performing carbon sequestration activities
    e.g., planting trees, purchasing carbon credits
  • Measuring the business's or organization's environmental footprint
  • Obtaining or maintaining one or more eco-responsible certifications
  • Being zero waste
  • Having a written environmental policy
  • Hiring an external auditor to evaluate the business's or organization's environmental practices
  • Other environmental practices
    OR
  • None of the above

15. In the next 12 months, what is this business's or organization's main barrier for adopting more green practices?

  • COVID-19 has delayed the business's or organization's plans for green projects
  • The business or organization doesn't have the financial resources
  • The business's or organization's clients aren't willing to pay a higher price
  • Other
    • Specify other:
    OR
  • None of the above
    i.e., The business or organization has no barriers or no plans to adopt green practices

Personal protective equipment (PPE)

16. Where does this business or organization get or plan to get its personal protective equipment or supplies from?

e.g., masks, eye protection, face shields, gloves, gowns, cleaning products, disinfecting wipes, hand sanitizer, plexiglass or sneeze guards, thermometers

Select all that apply.

  • Domestic producer
  • International producer
  • Domestic wholesaler
  • International wholesaler
  • Domestic retailer
  • International retailer
  • Other
    OR
  • None of the above

17. Since March 2020, which of the following products has this business or organization manufactured?

Select all that apply.

  • Respirators
  • Surgical masks
  • Medical gowns
  • Hand sanitizer
  • Face shields
  • Nitrile gloves
    OR
  • None of the above

18. 12 months from now, which of the following products does this business or organization plan to manufacture?

Select all that apply.

  • Respirators
  • Surgical masks
  • Medical gowns
  • Hand sanitizer
  • Face shields
  • Nitrile gloves
    OR
  • None of the above

COVID-19 Rapid Testing

COVID-19 Rapid Test kits are self-testing kits that are used to assess and monitor the infection status of individuals with or without symptoms. Typically such kits provide a result within 15 minutes and can be used by employers to screen for COVID-19 among employees in settings where in-person work is required. Positive test results typically require confirmation by more accurate laboratory-based tests administered by public health authorities.

19. Over the last month, has this business or organization used COVID-19 Rapid Test kits to test on-site employees for COVID-19 infection?

e.g., periodic testing of employees with or without symptoms

  • Yes
    • Over the last month, what percentage of this business's or organization's on-site employees was tested using a COVID-19 Rapid Test kit at least once?
      • Percentage:
      • Don't know
    • Over the last month, on average, how frequently were employees at this business or organization tested using COVID-19 Rapid Test kits?
      • Less than once a week
      • Once a week
      • Twice a week
      • More than twice a week
  • No
    • In the next three months, does this business or organization plan to use COVID-19 Rapid Test kits to test on-site employees for COVID-19 infection?
      • Yes
      • No
      • Don't know
  • Not applicable to this business or organization
    e.g., all employees work remotely

Flow condition: If "No" is selected in Q19 and "No" or "Don't know" is then selected in the follow-up question, go to Q20. Otherwise, go to Q21.
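
One reading of this flow condition, sketched in Python for illustration: Q20 is asked only when rapid tests were not used in the last month and there is no firm plan to use them in the next three months. The function name `route_after_q19` and its parameter names are assumptions.

```python
# Hypothetical routing helper; parameter names are illustrative only.
def route_after_q19(q19_answer, plans_next_three_months=None):
    """Return the next question after Q19 and its follow-up question."""
    no_plans = plans_next_three_months in ("No", "Don't know")
    if q19_answer == "No" and no_plans:
        return "Q20"
    return "Q21"


print(route_after_q19("No", "Don't know"))
print(route_after_q19("No", "Yes"))
print(route_after_q19("Yes"))
```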

20. For which of the following reasons does this business or organization not have plans to use COVID-19 Rapid Test kits in the next three months?

Select all that apply.

  • Lack of awareness
  • Difficulty procuring
  • Cost of tests
  • Cost of administering tests
  • Not needed
  • Other
    • Specify other:

Funding or credit

21. Due to COVID-19, was funding or credit for this business or organization approved or received from any of the following sources?

Select all that apply.

  • Canada Emergency Business Account (CEBA)
    e.g., loan of up to $60,000 for eligible small businesses and non-profits
  • Temporary 10% Wage Subsidy
  • Canada Emergency Wage Subsidy (CEWS)
  • Canada Emergency Rent Subsidy (CERS)
  • Canada Emergency Commercial Rent Assistance (CECRA)
  • Export Development Canada (EDC) Small and Medium-sized Enterprise Loan and Guarantee program
  • Business Development Bank of Canada (BDC) Co-Lending Program for Small and Medium-sized Enterprises
  • Innovation Assistance Program
  • Regional Relief and Recovery Fund (RRRF)
  • Provincial, Territorial or Municipal government programs
  • Funding from philanthropic or mutual-aid sources
  • Financial institution
    e.g., term loan or line of credit
  • Loan from family or friends
  • Other
    • Specify other approved source of funding or credit:
    OR
  • None of the above

Flow condition: If "None of the above" is selected in Q21, go to Q22. Otherwise, go to Q23.

22. For which of the following reasons has this business or organization not accessed any funding or credit due to COVID-19?

Select all that apply.

  • Funding or credit not needed
  • Waiting for approval or in process of applying
  • Eligibility requirements
  • Application requirements or complexity
  • Lack of awareness
  • Terms and conditions
    e.g., interest rate, payment period
  • Public perception
  • Other
    • Specify other:

Liquidity and debt

23. Does this business or organization have the cash or liquid assets required to operate for the next three months?

  • Yes
  • No
    • Will this business or organization be able to acquire the cash or liquid assets required?
      • Yes
      • No
      • Don't know
  • Don't know

24. Does this business or organization have the ability to take on debt?

  • Yes
  • No
    • For which of the following reasons is this business or organization unable to take on debt?
      Select all that apply.
      • Cash flow
      • Lack of confidence or uncertainty in future sales
      • Request would be turned down
      • Too difficult or time consuming to apply
      • Terms and conditions are unfavourable
        e.g., interest rate, payment period
      • Credit rating
      • Other
        • Specify other:
  • Don't know

Outsourcing

25. In the last 12 months, has this business or organization outsourced any tasks, projects or short contracts to freelancers, "gig" workers or other businesses or organizations?

Examples of tasks, projects or short contracts might include delivery driving, cleaning, translation, and web or graphic design.

  • Yes
  • No
  • Don't know

Flow condition: If "Yes" is selected in Q25, go to Q26. Otherwise, go to Q27.

26. In the last 12 months, has this business or organization used third-party digital platforms, applications or websites to outsource tasks, projects, or short contracts?

e.g., UberEats, Fiverr, TaskRabbit, Upwork, Amazon Turk
Exclude online job boards.

  • Yes
    • In the last 12 months, what kind of business or organizational activities did this business or organization subcontract through third-party digital platforms, applications or websites?
      Select all that apply.
      • Data entry, tagging photos or videos, and other clerical tasks
      • Copywriting, editing, translation, transcription
      • Graphic design, audio-visual production
      • Website or software development, computer programming
      • Sales and marketing support
      • Delivery driving, errands
      • General labour, repairs, cleaning services
      • Accounting, law or other professional services
      • Other
        • Specify other:
  • No

27. In the last 12 months, has this business or organization bid on tasks, projects, or short contracts using third-party digital platforms, applications or websites?

e.g., Fiverr, TaskRabbit, Upwork, Amazon Turk

  • Yes
  • No

Teleworking

28. Once the COVID-19 pandemic is over, what percentage of the workforce is anticipated to continue to primarily telework?

Provide your best estimate rounded to the nearest percentage.

Percentage:

OR

Don't know

Flow condition: If 1% or more of this business's or organization's workforce is anticipated to continue to primarily telework in Q28, go to Q29. Otherwise, go to Q30.
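
As an illustration, this flow condition can be sketched in Python as follows. The sketch assumes the Q28 answer is stored as a whole-number percentage, with `None` standing for "Don't know"; the function name `route_after_q28` is hypothetical.

```python
# Hypothetical routing helper; None represents a "Don't know" answer (an assumption).
def route_after_q28(telework_percentage):
    """Return "Q29" when at least 1% of the workforce is expected to keep teleworking."""
    if telework_percentage is not None and telework_percentage >= 1:
        return "Q29"
    return "Q30"


print(route_after_q28(25))
print(route_after_q28(0))
print(route_after_q28(None))
```

Under this reading, both a 0% answer and "Don't know" fall under "Otherwise" and lead to Q30.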

29. Does this business or organization foresee shrinking office locations because more of the workforce is teleworking?

  • Yes
  • No
  • Don't know

Future outlook

30. Over the next 12 months, what is the future outlook for this business or organization?

  • Very optimistic
  • Somewhat optimistic
  • Somewhat pessimistic
  • Very pessimistic
  • Don't know

31. How long can this business or organization continue to operate at its current level of revenue and expenditures before having to consider the following options?

Select "12 months or more" if this business or organization can operate indefinitely.

  • Laying off staff:
    • Less than 1 month
    • 1 month to less than 3 months
    • 3 months to less than 6 months
    • 6 months to less than 12 months
    • 12 months or more
    • Don't know
  • Closure or bankruptcy:
    • Less than 1 month
    • 1 month to less than 3 months
    • 3 months to less than 6 months
    • 6 months to less than 12 months
    • 12 months or more
    • Don't know
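
The response categories in Q31 are half-open ranges of months. The Python sketch below shows one way to map an estimated number of months to a category; the function name `runway_band` and the use of `None` for "Don't know" are assumptions made for illustration.

```python
# Hypothetical helper that maps an estimate in months to a Q31 response category.
def runway_band(months):
    """Return the Q31 category for a number of months, or "Don't know" for None."""
    if months is None:
        return "Don't know"
    if months < 0:
        raise ValueError("months must be non-negative")
    if months < 1:
        return "Less than 1 month"
    if months < 3:
        return "1 month to less than 3 months"
    if months < 6:
        return "3 months to less than 6 months"
    if months < 12:
        return "6 months to less than 12 months"
    # Also covers a business or organization that can operate indefinitely.
    return "12 months or more"


print(runway_band(2.5))
print(runway_band(float("inf")))
```

Passing `float("inf")` mirrors the instruction to select "12 months or more" when the business or organization can operate indefinitely.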

Flow condition: If the business or organization is a private sector business, go to Q32. Otherwise, go to "Contact person".

Ownership

The groups identified within the following questions are included in order to gain a better understanding of businesses owned by members of various communities across Canada.

32. What percentage of this business or organization is owned by women?

Provide your best estimate rounded to the nearest percentage.

Percentage:

OR

Don't know

33. What percentage of this business or organization is owned by First Nations, Métis or Inuit peoples?

Provide your best estimate rounded to the nearest percentage.

Percentage:

OR

Don't know

34. What percentage of this business or organization is owned by immigrants to Canada?

Provide your best estimate rounded to the nearest percentage.

Percentage:

OR

Don't know

35. What percentage of this business or organization is owned by persons with a disability?

Include visible and non-visible disabilities.

Provide your best estimate rounded to the nearest percentage.

Percentage:

OR

Don't know

36. What percentage of this business or organization is owned by LGBTQ2 individuals?

The term LGBTQ2 refers to persons who identify as lesbian, gay, bisexual, transgender, queer and/or two-spirited.

Provide your best estimate rounded to the nearest percentage.

Percentage:

OR

Don't know

37. What percentage of this business or organization is owned by members of visible minorities?

A member of a visible minority in Canada may be defined as someone (other than an Indigenous person) who is non-white in colour or race, regardless of place of birth.

Provide your best estimate rounded to the nearest percentage.

Percentage:

OR

Don't know

Flow condition: If more than 50% of this business or organization is owned by members of visible minorities, go to Q38. Otherwise, go to "Contact person".

38. It was indicated that over 50% of this business or organization is owned by members of visible minorities. Please select the categories that describe the owner or owners.

Select all that apply.

  • South Asian
    e.g., East Indian, Pakistani, Sri Lankan
  • Chinese
  • Black
  • Filipino
  • Latin American
  • Arab
  • Southeast Asian
    e.g., Vietnamese, Cambodian, Laotian, Thai
  • West Asian
    e.g., Afghan, Iranian
  • Korean
  • Japanese
  • Other group
    • Specify other group:
    OR
  • Prefer not to say

Canadian Health Measures Survey - Cycle 6 (2018-2019) Non response bias – Fasted subsample

Canadian Health Measures Survey - Cycle 6 (2018-2019) - Fasted subsample
Table summary
This table displays the results of Canadian Health Measures Survey - Cycle 6 (2018-2019) - Fasted subsample. The information is grouped by Age group and sex (appearing as row headers), Combined response rate (%) (appearing as column headers).
Age group       Sex       Combined response rate (%)
Ages 6 to 11    Males     33.7
Ages 6 to 11    Females   27.2
Ages 12 to 19   Males     35.3
Ages 12 to 19   Females   35.2
Ages 20 to 39   Males     35.0
Ages 20 to 39   Females   37.2
Ages 40 to 59   Males     39.3
Ages 40 to 59   Females   36.6
Ages 60 to 79   Males     41.6
Ages 60 to 79   Females   38.9

Canadian Health Measures Survey - Cycle 6 (2018-2019) Data accuracy – Fasted subsample

Canadian Health Measures Survey - Cycle 6 (2018-2019) - fasted subsample
Table summary
This table displays the results of Canadian Health Measures Survey - Cycle 6 (2018-2019) - fasted subsample. The information is grouped by age and sex (appearing as row headers), Average Glucose (mmol/L) calculated using Average (mmol/L) and c.v. percentage (appearing as column headers).
                          Average Glucose (mmol/L)
Age group       Sex       Average (mmol/L)   c.v. (%)
Ages 6 to 11    Males     4.99               1.17
Ages 6 to 11    Females   4.9                1.22
Ages 12 to 19   Males     5.1                0.56
Ages 12 to 19   Females   4.96               1.20
Ages 20 to 39   Males     5.22               0.88
Ages 20 to 39   Females   4.94               0.79
Ages 40 to 59   Males     5.78               1.88
Ages 40 to 59   Females   5.48               2.27
Ages 60 to 79   Males     6.28               4.18
Ages 60 to 79   Females   5.77               2.70
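
The c.v. (coefficient of variation) expresses the standard error of an estimate as a percentage of the estimate. As a rough illustration, the Python sketch below recovers the standard error for males aged 6 to 11 (average 4.99 mmol/L, c.v. 1.17%). It assumes the usual definition, c.v. = standard error / estimate × 100. The interval uses a normal approximation with z = 1.96, which is an assumption; the survey's own documentation may prescribe a different method. The function names are hypothetical.

```python
# Hypothetical helpers; assume c.v. (%) = standard error / estimate * 100.
def standard_error(estimate, cv_percent):
    """Recover the standard error from an estimate and its c.v. in percent."""
    return estimate * cv_percent / 100.0


def approximate_interval(estimate, cv_percent, z=1.96):
    """Normal-approximation interval (an assumption, not the survey's stated method)."""
    se = standard_error(estimate, cv_percent)
    return estimate - z * se, estimate + z * se


se = standard_error(4.99, 1.17)
low, high = approximate_interval(4.99, 1.17)
print(round(se, 4))  # 0.0584
print(low, high)
```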

Monthly Survey of Food Services and Drinking Places: CVs for Total Sales by Geography - April 2021

Monthly Survey of Food Services and Drinking Places: CVs for Total sales by Geography - April 2021
Table summary
This table displays the results of CVs for Total sales by Geography. The information is grouped by Geography (appearing as row headers), Month and percentage (appearing as column headers).
                           Month (percentage)
Geography                  202004  202005  202006  202007  202008  202009  202010  202011  202012  202101  202102  202103  202104
Canada                       1.21    0.75    0.34    0.35    0.19    0.21    0.21    0.20    0.25    0.20    0.19    0.44    1.00
Newfoundland and Labrador    2.03    1.30    1.05    0.82    0.36    0.62    1.53    0.30    0.48    1.08    0.48    2.23    2.41
Prince Edward Island        52.43   11.92    9.11    8.73    0.95    0.63    0.84    1.08    1.81    1.63    1.04    1.07   18.65
Nova Scotia                  4.09    3.94    0.88    1.50    1.39    0.37    0.77    0.36    1.03    0.91    0.40    0.88    3.06
New Brunswick                2.39    2.08    0.82    0.60    2.28    0.50    0.33    0.39    0.49    0.98    0.50    0.44    1.26
Quebec                       1.93    1.66    0.70    0.77    0.48    0.56    0.65    0.55    0.79    0.68    0.67    0.42    1.04
Ontario                      2.24    1.33    0.63    0.70    0.26    0.31    0.25    0.28    0.45    0.34    0.24    0.99    2.16
Manitoba                     5.60    2.47    0.81    0.70    0.34    0.34    0.72    0.93    0.78    0.89    0.46    0.48    1.53
Saskatchewan                 5.72    3.08    0.58    1.55    0.67    0.99    0.91    1.04    0.75    0.91    0.52    0.52    1.60
Alberta                      2.62    1.76    0.63    0.53    0.23    0.55    0.33    0.36    0.54    0.52    0.33    0.79    1.87
British Columbia             3.21    2.19    1.03    0.83    0.67    0.58    0.72    0.68    0.39    0.33    0.56    0.97    2.75
Yukon Territory             10.07    3.77    3.06    1.41    1.57    1.64    1.72    1.71    4.34    5.07    1.96    3.13   73.01
Northwest Territories        6.95    3.24    2.48    1.43    1.94    2.14    2.10    2.04    1.97    6.05    1.83    3.05   79.73
Nunavut                    315.64    5.07    3.93    1.82    0.56    2.60    2.45   67.48    2.75    2.54    2.39    2.53    4.52

Retail Trade Survey (Monthly): CVs for Total sales by geography - April 2021

CVs for Total sales by geography - April 2021
This table displays the results of Retail Trade Survey (Monthly): CVs for Total sales by geography - April 2021. The information is grouped by Geography (appearing as row headers), Month and Percent (appearing as column headers).
Geography                  Month 202104 (%)
Canada                     0.7
Newfoundland and Labrador  1.3
Prince Edward Island       2.2
Nova Scotia                1.9
New Brunswick              2.1
Quebec                     1.6
Ontario                    1.3
Manitoba                   1.0
Saskatchewan               2.2
Alberta                    1.5
British Columbia           1.2
Yukon Territory            1.7
Northwest Territories      0.6
Nunavut                    1.1

Data Access Division newsletter - Spring 2021 edition

A message to our staff and clients

With the arrival of spring and warmer weather comes a sense of hope, as new vaccines are developed across the globe and we move beyond the one-year mark since the first COVID-19 lockdown. The Data Access Division (DAD) would like to take a moment to thank its beloved staff. The success of our program comes from the hard work and dedication that each member has continued to show throughout these changing times. We could not have achieved our advancements without your great efforts and continued collaboration. We would also like to thank our clients and friends for their continued patience and support; we are constantly reminded of how fortunate we are to be part of such a strong community. We remain devoted to continuing our work to ensure that you are provided with the real-time data and services that you need.

Celebrating accomplishments and focus for the upcoming year

DAD would like to highlight and celebrate some of its greatest accomplishments within the last few months. The Self-Serve Access (SSA) section provided virtual access to clients and successfully onboarded new users to the Public Use Microdata File (PUMF) Collection, provided free access and created accounts for research data centre (RDC) researchers for Real Time Remote Access (RTRA), and counted 82 Data Liberation Initiative (DLI) member institutions. The Virtual Data Lab (VDL) team successfully onboarded its first set of users in the new cloud environment for its first pilot project back in February, bringing the team one step closer to a production environment. DAD, in partnership with the Canadian Research Data Centre Network (CRDCN), recently opened two new RDCs! One centre is located at Carleton University in Ottawa, and a new satellite centre opened its doors to researchers at the University of Calgary. In addition, the first Business Research Microdata (BRM) file that uses real, not synthetic, data was released to the RDCs. To support researchers working in RDCs and in the VDL, we have produced a series of short training videos on producing statistical output for release.

For the upcoming year, DAD will continue to focus on collaboration efforts with various teams and partners. We will focus on leveraging new technologies to help drive Statistics Canada's modernization efforts by developing new and innovative ways to access microdata, such as developing the Virtual Research Data Centre (vRDC) and the VDL, increasing granularity while meeting researcher needs, and continuing to provide the research community increased and faster access to data to support better decision making for policies and programs across the country. In the RDC Program, we will be opening another centre to researchers, as well as increasing our business data holdings.

Self-serve access

Data Liberation Initiative Team Updates

Welcome to another DLI program membership year. The SSA section has added the following new services to the program:

  • one free RTRA account per institution starting April 1, 2021
  • limited number of free custom tabulations
  • access to training offered by Regional Services.

External Advisory Committee

The External Advisory Committee (EAC) sent a call-out in February to the Listserv for two volunteers to represent the Atlantic and Ontario regions. The SSA section and the EAC would like to welcome Jane Fry as an Ontario Region representative and to thank two members who have stepped down from the committee: Peter Webster, Co-Chair of the EAC and Atlantic Region representative, and Claire Wollen, Ontario Region representative.

Professional Development Committee

The chair of the committee, Alex Cooper, sent an email to the Listserv confirming to the DLI community that face-to-face training has been cancelled again this year, but we will be doing national training again with regional sessions.

The Professional Development Committee (PDC) sent a call-out to the Listserv in March for a volunteer to represent the Quebec Region.

The SSA section and the PDC would like to take this opportunity to welcome Vivek Jadon from McMaster University as the new Ontario Regional Training Coordinator.

The PDC is working on several initiatives:

  • Contacts and Alternates Survey – a working group is in place to revise the survey
  • DLI Training Repository – the committee is looking at options
  • colleges – a sub-committee met with college representatives from each region to discuss their needs
  • training – a working group is in place to discuss training needs and coordinate with other data-centric organizations, such as the Canadian Research Data Centre Network and Portage.

StatCan web redesign project

The newly designed DLI website is now live! The website includes the updated DLI Contact and Alternate's Survival Guide, information on the program, and resources. You can also easily navigate through the different data access programs.

Public Use Microdata Files online project

We are working on putting PUMFs online in a downloadable format. Newly released PUMFs are being added to the website as they become available, and older PUMFs are being added in phases. As part of this project, digital object identifiers are being assigned to PUMFs.

Data releases to DLI since January 2021

  • National Travel Survey (NTS) 2019 PUMF
  • General Social Survey (GSS) Cycle 33, 2018 PUMF
  • Canadian Perspectives Survey Series (CPSS) 5 PUMF
  • Labour Force Survey (LFS) January 2021 PUMF
  • Provincial Symmetric Input-Output Tables (2016 and 2017)
  • December 2020 Business Counts
  • Input-Output Multipliers Link 1961
  • Hate Crimes (Province) Table E and Table F
  • Human Trafficking Data Table
  • Postal Code Conversion File Plus (PCCF+) Version 7D, November 2020
  • Labour Force Survey (LFS) February 2021 PUMF
  • Canadian Housing Survey (CHS) 2018 and 2019 PUMF

A list of all DLI products is available on the website: Data Liberation Initiative.

Real Time Remote Access updates

RDC researchers have had their access extended to March 31, 2022.

SAS Assistant

The graphical user interface has been launched! The number of surveys available is currently limited. However, more surveys will be added throughout the year.

The SAS Assistant will help users with little SAS experience to generate successful tables. You will be able to use buttons and dropdown menus to build your SAS code, and your code is created as you select the variables.

Data releases to RTRA since January 2021

  • Labour Force Survey (LFS) – monthly
  • Registered Apprenticeship Information System (RAIS) 2019 (January 2021)
  • Crowdsourcing 1: Impacts of COVID-19 on Canadians – All weeks
  • Crowdsourcing 2: Impacts of COVID-19 on Canadians – Your Mental Health
  • Crowdsourcing 3: Impacts of COVID-19 on Canadians – Perceptions of Safety
  • Crowdsourcing 4: Impacts of COVID-19 on Canadians – Trust in Others
  • Crowdsourcing 5: Impacts of COVID-19 on Canadians – Parenting During the Pandemic
  • Crowdsourcing 6: Impacts of COVID-19 on Canadians – Living with Long-term Conditions and Disabilities
  • Crowdsourcing 7: Impacts of COVID-19 on Canadians – Experiences of Discrimination
  • Survey of Household Spending (SHS) 2005
  • Survey of Household Spending (SHS) 2007
  • Survey of Household Spending (SHS) 2009

A list of all RTRA products is available on the website: Real Time Remote Access.

Research Data Centres

Research Data Centre updates

While RDCs are still operating under reduced capacity because of COVID-19 restrictions, we are excited to announce the opening of a new centre at Carleton University and a new satellite centre at the University of Calgary. Researchers now have access to two sites in each of Ottawa and Calgary to help meet demand for access across the cities.

The high-level Joint Task Force, co-chaired by Martin Taylor (Executive Director, CRDCN) and Jacques Fauteux (Assistant Chief Statistician, StatCan), focused on developing and aligning data access strategies for academic researchers and provided a report to the CRDCN executive board and the Chief Statistician in February. The report gave an overview of the technical infrastructure of the vRDC, identified possible intersections with the VDL platform, and highlighted critical business questions that will be explored in partnership between the CRDCN and StatCan in the coming months. Presentations on the report and next steps will be provided to StatCan staff and CRDCN academic directors in April.

We are pleased to announce that our pilot to test the VDL access platform with academic researchers was launched in March! Starting with selected projects in four universities, a StatCan cloud infrastructure platform will make StatCan microdata securely accessible outside RDC facilities. This pilot project involves the University of Toronto, Université de Montréal, McMaster University, and University of Calgary to start, but will be implemented on a broader scale after pilot testing is complete and the vRDC infrastructure is available. For more information on the pilot testing, please see the Modernization of Access section below.

New Research Data Centre holdings

On February 23, the CRDCN hosted a very well-attended webinar on the BRM. This will be the first business dataset released to the RDCs using real, not synthetic, data. It is expected that it will take extra time for researchers to become familiar with the data. As well, vetting will take longer than for a typical social data project because we will be using the BRM to create and test business data vetting rules. For these reasons, it is not recommended that students undertake their thesis or dissertation work with the BRM. Researchers can start applying for access in April.

A total of 25 products were added to our data holdings in the fourth quarter of the 2020/2021 fiscal year. These include two new surveys (Survey of Digital Technology and Internet Use and Impacts of COVID-19 on Health Care Workers: Infection Prevention and Control), one new linked data file (Survey of Approaches to Educational Planning linked to Postsecondary Student Information System–Registered Apprenticeship Information System–T1 Family File), as well as updated survey cycles and administrative files.

Partial list of data files updated from January to March 2021

  • Canadian Health Survey on Children and Youth (CHSCY) 2019
  • Vital Statistics - Death Database (VSDD) 2019
  • Survey of Household Spending (SHS) 2017
  • General Social Survey (GSS) – Caregiving and Care Receiving, Cycle 33
  • General Social Survey (GSS) – Victimization, Cycle 34
  • Longitudinal Immigration Database (IMDB) 2019
  • Canada Education Savings Programs (CESP) linked to 2016 Census
  • Registered Apprenticeship Information System (RAIS)

For a complete list of data available in RDCs and government access centres, visit Data available at the Research Data Centres.

New Data Access Training Video Series

We will soon be releasing our Data Access Training Video Series! These short videos will provide practical instruction to StatCan microdata users on a variety of topics related to access and data analysis. Our initial series of videos will cover how to prepare your output for confidentiality vetting, such as applying rounding techniques and testing for homogeneity and dominance using a variety of statistical software packages.

Government Data Access

Federal Research Data Centres

The Government Data Access team has started plans to merge the Federal Research Data Centre (FRDC) and the Social Data Access Centre and the Business Data Access Centre (formerly known as CDER), both located at Tunney's Pasture, into one location by the end of spring 2021. This new centre will provide access to both social and business data for federal government users. The integration of business and social data access into one physical location is a significant step in completing the full integration of the Business Data Access program under the FRDC umbrella.

DAD is also working with Employment and Social Development Canada (ESDC) to facilitate access and address research needs for the emergency response data linked to StatCan datasets. ESDC will undergo an accreditation process and training, and data access will be available from home for researchers.

Provincial secure access points

Two provincial secure access points continue to operate in British Columbia and Alberta. Two more sites will open in spring 2021 in Ontario at the Ministry of Finance and the Ministry of Children, Community and Social Services. For more information about these initiatives, please contact statcan.maddlidamidd.statcan@statcan.gc.ca.

Welcome Shelley Jeglic!

We would like to welcome Shelley Jeglic, who has joined DAD as the new chief of the Government Data Access Program. Shelley previously worked in the Centre for Population Health Data and is excited to make the switch from client to service provider.

Modernization of access

Pilot projects and testing

The VDL team is excited to announce that eight researchers from the University of Toronto, University of Calgary, McMaster University and Université de Montréal, and four researchers from the Public Health Agency of Canada are now accessing the VDL environment as part of the academic pilot! They are accessing microdata files with low to medium levels of sensitivity. This is the first set of users to be granted access to anonymized microdata in the cloud environment using the VDL. The VDL team has been working hard to establish and expand governance and system capabilities to support this virtual access. This is a big milestone for the team, and a step forward in enabling secure virtual access to data for more accessible research, and, in turn, better decision making. We could not have reached this level of success on this project and our pilots without the collaborative effort of many teams and individuals.

Going forward, the VDL project will continue to onboard users identified for our pilots. The established pilots will help evaluate the nuances of onboarding different types of researchers, as well as help inform how the user experience of the environment can be improved leading up to production. Once the VDL team has successfully conducted the pilots using data with low to medium levels of sensitivity, the team will conduct additional pilots using data with medium to high levels of sensitivity, as this is more representative of the data that are typically used by researchers. StatCan will select pilots based on several criteria to learn about and improve the VDL process to meet the project's objectives.

Overall, with the VDL, StatCan will be better positioned to advance its user-centricity by introducing this new mode of access and contribute to the agency's modernization efforts.

Virtual Data Lab project updates

The VDL will vastly improve access to statistical information for researchers by providing users 24/7 remote access to data housed at StatCan using a secure IT connection and a protected cloud environment. Progress is ongoing on a number of key initiatives to increase virtual data access and promote collaboration. These include the development of analytics platforms and monitoring capabilities, and continued assessment and development of the Client Relationship Management System (CRMS) and the Microdata Search Tool.

A number of monitoring mechanisms have been approved and are available to StatCan for deemed employees to access protected microdata in the cloud environment. Staff from DAD will use a variety of mechanisms to monitor for potential security incidents and follow the established incident protocol when required, which supplements the Information and Privacy Breach Protocol at StatCan.

Development on the CRMS corporate project continues under the Dissemination Division. The Dissemination team is currently working with IT to establish development priorities and with DAD on pilot assessments. Stay tuned for more information!

Questions or comments? Visit Access to microdata.

Check out the StatCan Blog.

Don't forget to follow us on social media!


Wholesale Trade Survey (monthly): CVs for total sales by geography - April 2021

CVs for total sales, in percentage, by geography and month (YYYYMM)

| Geography | 202004 | 202005 | 202006 | 202007 | 202008 | 202009 | 202010 | 202011 | 202012 | 202101 | 202102 | 202103 | 202104 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Canada | 0.8 | 0.8 | 0.7 | 0.7 | 0.7 | 0.7 | 0.5 | 0.6 | 0.8 | 0.8 | 0.7 | 0.6 | 0.7 |
| Newfoundland and Labrador | 0.6 | 0.4 | 0.1 | 0.2 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.6 | 0.5 | 0.2 | 1.4 |
| Prince Edward Island | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Nova Scotia | 3.5 | 5.5 | 2.5 | 2.1 | 1.9 | 1.7 | 2.7 | 3.4 | 6.3 | 1.8 | 1.7 | 2.6 | 4.5 |
| New Brunswick | 2.2 | 3.2 | 2.7 | 2.0 | 3.6 | 3.5 | 2.9 | 5.0 | 3.5 | 3.4 | 2.6 | 1.1 | 1.3 |
| Quebec | 2.4 | 1.9 | 2.0 | 1.7 | 2.3 | 1.9 | 1.5 | 1.4 | 1.7 | 1.8 | 1.8 | 1.9 | 1.8 |
| Ontario | 1.2 | 1.2 | 1.1 | 1.0 | 0.9 | 1.0 | 0.8 | 0.9 | 1.3 | 1.2 | 1.1 | 0.9 | 1.1 |
| Manitoba | 2.5 | 2.6 | 1.1 | 1.2 | 1.8 | 2.8 | 1.7 | 1.4 | 2.5 | 1.7 | 2.4 | 1.8 | 3.0 |
| Saskatchewan | 1.2 | 0.6 | 0.7 | 1.2 | 1.4 | 0.7 | 0.9 | 0.9 | 1.0 | 1.0 | 1.6 | 1.2 | 0.8 |
| Alberta | 2.9 | 2.9 | 2.5 | 2.3 | 1.9 | 3.4 | 1.3 | 1.3 | 1.7 | 1.0 | 1.2 | 1.1 | 1.2 |
| British Columbia | 1.5 | 1.8 | 1.6 | 1.3 | 1.9 | 1.8 | 1.4 | 1.5 | 1.4 | 1.5 | 1.4 | 1.5 | 1.2 |
| Yukon Territory | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Northwest Territories | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Nunavut | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

2021 Census: Enumerators now following up with dwellings

They are carrying out an important task on behalf of all Canadians

A Statistics Canada enumerator stands on the sidewalk holding a clipboard and pen, and wearing a PPE mask as well as a Census 2021 red vest and an employee identification badge.
A census enumerator. Census enumerators safely collect the data that is vital to improving the lives of Canadians.

June 10, 2021 – Ottawa, ON – Statistics Canada

Statistics Canada thanks all Canadians who have completed their 2021 Census to date. Millions of households have responded to the questionnaires safely online, on paper or over the phone. Where needed, some census enumerators, who adhered to strict health and safety protocols, dropped off invitation letters to households that did not receive the invitation in the mail.

Statistics Canada enumerators are now following up with dwellings from which completed questionnaires have not yet been received. Every attempt is made by Statistics Canada employees to reach households by phone before enumerators conduct in-person visits to remind residents to complete the census and offer assistance.

Thousands of census enumerators have been hired across the country to collect the data that is vital to improving the lives of Canadians.

Enumerators visiting dwellings are following a new no-contact protocol. Under this protocol, no interviews are conducted inside the respondent's dwelling and no census employee from Statistics Canada is permitted to visit or enter institutional collective dwellings, especially the dwellings housing residents who are most vulnerable to COVID-19, such as seniors' residences. In accordance with guidelines from public health authorities, interviews take place outdoors with physical distancing, census employees are required to wear masks, and hand sanitizer is provided to employees so they may frequently disinfect their hands.

It's not too late for households to make their census contact-free by completing it online, on paper or over the phone. Households can still contact the Census Help Line at 1-855-340-2021 to request a secure access code or at 1-877-885-2021 to receive a paper questionnaire. Answers to many questions are also available on the census website.

Information from the census ensures that communities have the information they need to plan services that support employment, schools, public transit and hospitals. Millions of Canadians have counted themselves in already—have you?

Contacts

For more information, contact Media Relations at 613-951-4636, or at statcan.mediahotline-ligneinfomedias.statcan@statcan.gc.ca.

National Weighted Rates by Source and Characteristic, April 2021

| Characteristic | Data source: response or edited (%) | Data source: imputed (%) |
| --- | --- | --- |
| Sales of goods manufactured | 76.0 | 24.0 |
| Raw materials and components | 65.4 | 34.6 |
| Goods / work in process | 74.4 | 25.6 |
| Finished goods manufactured | 65.1 | 34.9 |
| Unfilled Orders | 83.9 | 16.1 |
| Capacity utilization rates | 61.2 | 38.8 |

From Exploring to Building Accurate Interpretable Machine Learning Models for Decision-Making: Think Simple, not Complex

By: Yadvinder Bhuller, Health Canada; Keith O’Rourke, Health Canada

Despite a growing number of examples in which both simple and complex prediction models have been used for decision-making, accurate prediction remains pertinent for both kinds of model. The added element is that the more complex a model is, the less uptake it is likely to have among novice users who may not be familiar with machine learning (ML). Complex prediction models can arise from attempts to maximize predictive accuracy without regard to how difficult it would be for an individual to anticipate the predictions from the input data. However, even with a method considered as simple as linear regression, the complexity increases as more variables and their interactions are added. At the other extreme, when numerous non-linear functions are used for prediction, as with neural nets, the results can be too complex to understand. Such models are usually called black box prediction models. Accurate interpretable models also vary: from accurate decision trees and rule lists that are so concise they can be fully described in a sentence or two for tabular data, through modern generalized additive models (e.g., for more challenging medical records), to methods for disentangling neural nets for unstructured data such as pixels. A recent notable addition is the use of Bayesian soft complexity constrained unsupervised learning of deep layers of latent structure that is then used to construct a concise rule list with high accuracy (Gu and Dunson, 2021).

An early example, from over 20 years ago, of a simple method predicting as accurately as more complex models is the 1998 study by Ennis et al., which applied various ML methods to the GUSTO-I database and found that none of them could outperform a relatively simple logistic regression model. More recent accounts of complex methods being used, even when simple ones could suffice, are noted in the 2019 article by Rudin and Radin. The often suggested simple remedy for this unmanageable complexity is just finding ways to explain these black box models; however, those explanations can sometimes miss key information. Rather than being directly connected with what is going on in the black box model, they end up being "stories" for getting concordant predictions. Given that concordance is not perfect, they can be very misleading in many situations.

Perhaps what is needed is a wider awareness of the increasing number of techniques to build simple interpretable models from scratch that achieve high accuracy. The techniques are not simple refinements of linear or logistic regression (such as rounding their coefficients to integers, which loses accuracy), but involve discernment of appropriate domain-based constraints and newer methods of constrained optimization. This results in a spectrum of ease of interpretability of prediction across different applications.

Understanding where and when to be simple!

While we need to accept what we cannot understand, we should never overlook the advantages of what we can understand. For example, we may never fully understand the physical world, nor how people think, interact, create or decide. In ML, Geoffrey Hinton's 2018 YouTube video drew attention to the fact that people are unable to explain exactly how they decide, in general, whether something is the digit 2 or not. This fact was originally pointed out some time ago by Herbert Simon and has not been seriously disputed (Ericsson and Simon, 1980). However, prediction models are just abstractions, and we can understand the abstractions created to represent a reality that is complex and often beyond our direct access. So not being able to understand people is not a valid reason to dismiss the desire to understand prediction models.

In essence, abstractions are diagrams or symbols that can be manipulated, in error-free ways, to discern their implications. Usually referred to as models or assumptions, they are deductive and hence can be understood in and of themselves for simply what they imply. That is, until they become too complex. For instance, triangles on the plane are understood by most, while triangles on the sphere are understood by fewer. Reality may always be too complex, but models that adequately represent reality for some purpose need not be. Triangles on the plane serve for navigation over short distances, while triangles on the sphere serve for long distances. Emphatically, it is the abstract model that is understood, not necessarily the reality it attempts to represent.

However, for some reason, a persistent misconception has arisen in ML that models for accurate prediction usually need to be complex. To build upon the previous examples, there remain some application areas where simple models have yet to achieve accuracy comparable to black box models. On the other hand, in many other areas simple models continue to predict as accurately as any state-of-the-art black box model, and thus the question, as noted in the 2019 article by Rudin and Radin, is: "Why Are We Using Black Box Models in AI When We Don't Need To?"

In application areas where simple models can be as accurate, not using such models has unnecessarily led to recommendations that can affect society, health, freedom, and safety. An often-discussed hypothetical choice between the accurate machine-learning-based robotic surgeon and the less-accurate human surgeon is moot once someone builds an interpretable robotic surgeon that is as accurate as any other robot. Again, it is the prediction model that is understandable, not necessarily the prediction task itself.

Simple and interpretable models?

The number of application areas where accurate simple prediction models can be built to be understood has been increasing over time. Arguably, these models should be labeled as "interpretable" ML, as they are designed from scratch to be interpretable. They are purposely constrained so that their reasoning processes are more understandable to most if not all human users. This not only makes the connection between input data and predictions almost obvious, but also makes the models easier to troubleshoot and modify as needed. Interpretability is in the eye of the domain and interpretability constraints can include the following:

  • Sparsity of the model
  • Monotonicity with respect to a variable
  • Decomposability into sub-models
  • An ability to perform case-based reasoning
  • Disentanglement of certain types of information within the model's reasoning process
  • Generative constraints (e.g. biological processes)
  • Preferences among the choice of variables
  • Any other type of constraint that is relevant to the domain.
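As an illustration of one constraint from the list above, monotonicity with respect to a variable, the following minimal Python sketch checks whether a model's predictions ever decrease as one input increases. The two toy models, the variable names ("dose", "age") and the grid are hypothetical and serve only as an example; the check covers one base input and a finite grid, so it is a spot check, not a proof of monotonicity.

```python
def is_monotone_increasing(model, base_input, variable, grid):
    """Check that predictions never decrease as `variable` increases over `grid`,
    holding the other inputs fixed at the values in `base_input`."""
    previous = None
    for value in sorted(grid):
        prediction = model({**base_input, variable: value})
        if previous is not None and prediction < previous:
            return False
        previous = prediction
    return True


# Hypothetical risk model that is increasing in "dose" by construction.
def risk_model(inputs):
    return min(1.0, 0.1 + 0.05 * inputs["dose"] + 0.02 * inputs["age"] / 10)


# Hypothetical model that rises and then falls in "dose" (violates the constraint).
def unconstrained_model(inputs):
    return 0.5 - 0.01 * (inputs["dose"] - 5) ** 2


base = {"dose": 0, "age": 40}
constrained_ok = is_monotone_increasing(risk_model, base, "dose", range(0, 11))
unconstrained_ok = is_monotone_increasing(unconstrained_model, base, "dose", range(0, 11))
print(constrained_ok, unconstrained_ok)
```

In practice such a constraint would be built into the model when it is fitted; a check like this only helps confirm that a fitted model respects it.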

Some notable examples of interpretable models include sparse logical models (such as decision trees, decision lists, and decision sets) and scoring systems, which are linear classification models that require users to add, subtract, and multiply only a few small numbers to make a prediction. These models can be much easier to understand than multiple regression and logistic regression, which can be difficult to interpret. However, the intuitive simplification of these regression models, by restricting the number of predictors and rounding the coefficients, does not provide optimal accuracy; it is just a post hoc adjustment. It is better to build in interpretability from the very start.
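To make the idea of a scoring system concrete, here is a minimal Python sketch. The conditions, point values and threshold are hypothetical and chosen purely for illustration; they do not come from any published or validated model. The point is only that a prediction can be reproduced by adding a few small integers.

```python
# Hypothetical scoring system: every name, point value and the threshold
# below are illustrative assumptions, not a validated model.
POINTS = {
    "age_over_60": 2,
    "prior_event": 3,
    "regular_exercise": -1,
}
THRESHOLD = 3  # predict "high risk" when the total score reaches this value


def score(record):
    """Add up the points for every condition that is true for this record."""
    return sum(points for name, points in POINTS.items() if record.get(name, False))


def predict(record):
    """Return True (high risk) when the total score meets the threshold."""
    return score(record) >= THRESHOLD


example = {"age_over_60": True, "prior_event": True, "regular_exercise": True}
print(score(example), predict(example))
```

A user can verify the prediction by hand (2 + 3 - 1 = 4, which meets the threshold of 3), which is exactly the kind of transparency the text describes. In an interpretable ML workflow the points would be learned under integer and sparsity constraints instead of being obtained by rounding regression coefficients afterwards.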

There is increasing understanding based on considering numerous possible prediction models in a given prediction task. The not-too-unusual observation of simple models performing well for tabular data (a collection of variables, each of which has meaning on its own) was noted over 20 years ago and was labeled the "Rashomon effect" (Breiman, 2001). Breiman posited the possibility of a large Rashomon set in many applications; that is, a multitude of models with approximately the same minimum error rate. A simple check for this is to fit a number of different ML models to the same data set. If many of these are as accurate as the most accurate (within the margin of error), then many other untried models might also be. A recent study (Semenova et al., 2019) now supports running a set of different (mostly black box) ML models to determine their relative accuracy on a given data set in order to predict the existence of a simple accurate interpretable model; that is, it offers a way to quickly identify applications where it is a good bet that an accurate interpretable prediction model can be developed.
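The simple check described above can be sketched in a few lines of Python. The accuracies below are hypothetical held-out results, and the margin of error uses a normal approximation for a proportion at roughly the 95% level; both are assumptions made for illustration, not part of the cited studies.

```python
import math


def margin_of_error(accuracy, n_test, z=1.96):
    """Approximate margin of error for an accuracy measured on n_test cases
    (normal approximation to a proportion; an assumption of this sketch)."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_test)


def near_best_models(accuracies, n_test):
    """Return the names of models whose accuracy is within the margin of error of the best one."""
    best = max(accuracies.values())
    margin = margin_of_error(best, n_test)
    return sorted(name for name, acc in accuracies.items() if best - acc <= margin)


# Hypothetical held-out accuracies for several model families on one data set.
accuracies = {
    "logistic_regression": 0.84,
    "decision_tree": 0.83,
    "random_forest": 0.85,
    "neural_net": 0.85,
    "naive_bayes": 0.70,
}
close = near_best_models(accuracies, n_test=1000)
print(close)
```

If most of the fitted models land in the near-best group, as four of the five do in this made-up example, that suggests a large Rashomon set and makes it a reasonable bet that an accurate interpretable model exists for the task.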

What's the impact on ML from the full data science life cycle?

The trade-off between accuracy and interpretability with the first fixed data set in an application area may not hold over time. In fact, it is expected to change as more data accumulate, the application area becomes better understood, data collection is refined, new variables are added or defined, or the application area itself changes. In a full data science process, even with the first data set, one should critically assess and interpret the results and tune the processing of the data, the loss function, the evaluation metric, or anything else that is relevant. Doing so more effectively turns data into increasing knowledge about the prediction task, which can then be leveraged to increase both accuracy and likely generalization. Any possible trade-off between accuracy and interpretability should therefore be evaluated in the full data science process and life cycle of ML.

The full data science and life-cycle process is likely different when using interpretable models. More input is needed from domain experts to produce an interpretable model that makes sense to them. This should be seen as an advantage. For instance, it is not too unusual at a given stage to find numerous equally interpretable and accurate models. To the data scientist, there may seem little to guide the choice between these. But when these models are shown to domain experts, the experts may easily discern opportunities to improve constraints as well as indications of which models are less likely to generalize well. All equally interpretable and accurate models are not equal in the eyes of domain experts.

Interpretable models are far more trustworthy in that one can more readily discern where and when they should be trusted, and in what ways. How can one do this without understanding how the model works, especially for a model that is patently not trustworthy? This is especially important in cases where the underlying distribution of data changes, where it is critical to troubleshoot and modify without delays, as noted in the 2020 article by Hamamoto et al. It is arguably much more difficult to remain successful in the ML full life cycle with black box models than with interpretable models. Even for applications where interpretable models are not currently accurate enough, interpretable models can be used as a tool to help debug black box models.

Misunderstanding explanations

There is now a vast and confusing literature that conflates interpretability and explainability. In this brief blog, the degree of interpretability is taken simply as how easily the user can grasp the connection between input data and what the ML model would predict. Erasmus et al. (2020) provide a more general and philosophical view. Rudin et al. (2021) avoid trying to provide an exhaustive definition by instead providing general guiding principles to help readers avoid common, but problematic ways of thinking about interpretability. On the other hand, the term "explainability" often refers to post hoc attempts to explain a black box by using simpler 'understudy' models that predict the black box predictions. However, as noted in the Government of Canada's (GoC's) Guideline on Service and Digital, prediction is not explanation, and when such predictions are proffered as explanations they can seriously mislead (GoC, 2021). Often this literature assumes that one would just explain a black box without consideration of whether there is an interpretable model of the same accuracy, perhaps having uncritically bought into the misconception that only models that are too complex to understand can achieve acceptable accuracy.
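The gap between predicting a black box and explaining it can be seen in a small Python sketch. Both functions below are toy stand-ins invented for this illustration: `black_box` plays the role of an opaque model, and `understudy` is a simpler model offered as its "explanation". Concordance is measured as the share of inputs on which the two agree.

```python
def black_box(x1, x2):
    """Toy stand-in for an opaque model: its prediction depends on both inputs."""
    return 1 if (x1 + 2 * x2) > 10 else 0


def understudy(x1, x2):
    """Simpler post hoc 'explanation' model that only looks at x2."""
    return 1 if x2 > 4 else 0


def concordance(model_a, model_b, inputs):
    """Fraction of inputs on which the two models give the same prediction."""
    matches = sum(model_a(*x) == model_b(*x) for x in inputs)
    return matches / len(inputs)


# Evaluate both models on a small grid of hypothetical inputs.
grid = [(x1, x2) for x1 in range(10) for x2 in range(10)]
agreement = concordance(black_box, understudy, grid)
disagreements = [x for x in grid if black_box(*x) != understudy(*x)]
print(f"concordance = {agreement:.2f}, disagreements = {len(disagreements)}")
```

Here the understudy agrees with the black box on most of the grid, yet it ignores `x1` entirely, so its "story" about the prediction is wrong precisely for the cases where `x1` matters. High concordance alone does not show that the explanation reflects what the black box is doing.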

The increasing awareness of the dangers of these "explanations" has led one group of researchers to investigate how misunderstanding can actually be purposefully designed in; something regulators may increasingly need to worry about (Lakkaraju and Bastani, 2019). It is also not uncommon for those who routinely do black box modeling to offer explanations of these models as an alternative or even a reason to forego learning about and developing interpretable models.

Keeping it simple

Interpretable ML models are simple and can be relied upon when ML tools are used for decision-making. On the other hand, even interpretability is probably not needed for decisions where humans can verify or modify the decision afterwards (e.g., suggesting options). Notwithstanding the desire for simple and accurate models, it is important to note that interpretable ML models cannot currently match the accuracy of black box models in all application areas. For applications involving raw data (pixels, sound waves, etc.), black box neural networks have a current advantage over other approaches. In addition, black box models allow users to delegate responsibility for grasping the implications of adopting the model. Although a necessary trade-off between accuracy and interpretability does remain in some application areas, its ubiquity remains an exaggeration, and the prevalence of the trade-off may continue to decrease in the future. This has created a situation in ML where opportunities to understand and reap the benefits are often overlooked. Therefore, the advantages of newer interpretable modelling techniques should be fully considered in any ML application, at a minimum to determine if adequate accuracy is achievable. In the end, it may boil down to this: if simple works, why make things more complex?

Team members: Keith O'Rourke (Pest Management Regulatory Agency), Yadvinder Bhuller (Pest Management Regulatory Agency).

Keep on machine learning...

Breiman, L. (2001). Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statist. Sci. 16(3): 199-231. DOI: 10.1214/ss/1009213726

Ennis, M., Hinton, G., Naylor, D., Revow, M., and Tibshirani, R. (1998). A Comparison of Statistical Learning Methods on the GUSTO Database. Statist. Med. 17, 2501-2508.

Erasmus, A., Brunet, T.D.P., and Fisher, E. (2020). What is Interpretability? Philosophy & Technology.

Ericsson, K. A., and Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215–251.

Government of Canada. (2021). Guideline on Service and Digital. [Accessed: May 13, 2021].

Gu, Y., and Dunson, D.B. (2021). Identifying Interpretable Discrete Latent Structures from Discrete Data. arXiv:2101.10373 [stat.ME]

Hinton, G. (2018). Why Is a Two a Two? With Geoffrey Hinton and David Naylor. [Accessed: May 13, 2021].

Hamamoto, R., Suvarna, K., Yamada, M., Kobayashi, K., Shinkai, N., Miyake, M., Takahashi, M., Jinnai, S., Shimoyama, R., Sakai, A., Takasawa, K., Bolatkan, A., Shozu, K., Dozen, A., Machino, H., Takahashi, S., Asada, K., Komatsu, M., Sese, J., and Kaneko, S. (2020). Application of Artificial Intelligence Technology in Oncology: Towards the Establishment of Precision Medicine. Cancers. 12(12), 3532.

Lakkaraju, H., and Bastani, O. (2019). "How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations. arXiv:1911.06473 [cs.AI]

Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., and Zhong, C. (2021). Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges. arXiv:2103.11251 [cs.LG]

Rudin, C., and Radin, J. (2019). Why Are We Using Black Box Models in AI When We Don't Need To? A Lesson From An Explainable AI Competition. Harvard Data Science Review, 1(2).

Semenova, L., Rudin, C., and Parr, R. (2019). A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning. arXiv:1908.01755 [cs.LG]
