National Travel Survey: Response Rate - Q1 2023

National Travel Survey: Response Rate - Q1 2023
Table summary
This table displays the results of Response Rate. The information is grouped by Province of residence (appearing as row headers), Unweighted and Weighted (appearing as column headers), calculated using percentage unit of measure (appearing as column headers).
Province of residence Unweighted Weighted
Percentage
Newfoundland and Labrador 24.4 21.5
Prince Edward Island 23.4 21.0
Nova Scotia 28.9 26.0
New Brunswick 26.4 23.0
Quebec 30.8 27.0
Ontario 28.4 26.3
Manitoba 31.8 28.5
Saskatchewan 29.6 26.3
Alberta 27.7 25.9
British Columbia 31.2 29.4
Canada 29.0 26.8

Invitation to participate in the revision of the North American Industry Classification System (NAICS) Canada

Opened: August 2023

Introduction

Statistics Canada invites data producers and data users, representatives of business associations, government bodies at the federal, provincial and local levels, academics and researchers, and all other interested parties to submit proposals for the revision of the North American Industry Classification System (NAICS) Canada.

Following the decision of Statistics Canada's Economic Standards Steering Committee (ESSC) on April 28, 2023 to institute a permanent consultation process for NAICS Canada, proposals for changes to NAICS Canada may be submitted and reviewed on an ongoing basis. Moving forward, only a cut-off date will be set for proposed changes to be considered for inclusion in a new version of NAICS Canada. For example, for NAICS Canada 2027, the deadline for changes to be included has been set to the end of June 2025. For revisions beyond 2027, the cut-off date will remain about a year and a half before the release date of the new classification version, based on the 5-year revision cycle.

As was done with NAICS Canada 2017 (two updates), in exceptional circumstances, when a consensus is reached among data producers and users at Statistics Canada, the classification may be revised ahead of the regular 5-year revision cycle as a way of "evergreening" the standard.

In the context of statistical classifications, evergreening refers to updating the classification and the related reference (index) file on a continuous basis with the objective of maintaining timeliness and relevance. However, evergreening does not necessarily result in the release of a new version of the classification every year. A decision to release a new version (before the end of the regular 5-year revision cycle) needs to be discussed and assessed by key classification stewards, considering potential impacts on data and statistical programs.

Objectives

We are seeking proposals for changes for two main reasons:

  • collect input from data producers and users as an integral part of the NAICS revision process, and
  • ensure that users' needs continue to be met so that the classification remains relevant.

Background

The North American Industry Classification System (NAICS) was first released in 1997, as NAICS 1997. This classification was developed through the cooperation of Statistics Canada, Mexico's Instituto Nacional de Estadistica y Geografia (INEGI) and the Economic Classification Policy Committee (ECPC) of the United States. Each country maintains its own version of NAICS (NAICS Canada, NAICS U.S., and NAICS Mexico). The three country versions are generally the same, with some differences found primarily in wholesale trade, retail trade and government, and at the 6-digit national industry level.

NAICS replaced the existing industry classification system used in Canada, which was the Standard Industrial Classification (SIC). Since then, NAICS Canada, U.S. and Mexico have been revised on a 5-year cycle in 2002, 2007, 2012, 2017 and 2022. The three NAICS partner agencies meet regularly to discuss possible changes to the common NAICS structure.

Canada has adopted a permanent "evergreen" practice for NAICS, meaning NAICS Canada is updated on an as-needed basis, with version updates between the standard 5-year revision milestones, usually in response to exceptional circumstances when structural changes are approved. These "evergreen" updates are intended to be confined to specific situations or cases: for example, NAICS Canada 2017 Version 2.0 revised Internet publishing activities, and NAICS Canada 2017 Version 3.0 accounted for the new industries created after Canada adopted legislation legalizing cannabis for non-medical use, which affected the whole Canadian economy and society. These changes were approved by consensus following demands from data producers and users at Statistics Canada and externally. The intent remains to minimize revisions to the structure of NAICS between revision cycles, striking a balance between keeping the classification timely and relevant and maintaining historical data series, in particular from a National Accounts perspective.

We will continue to look at the best way to communicate to the public about revisions that do not affect the structure or scope of NAICS categories between revision cycles (e.g., adding new activities to help with coding or with identifying their placement in the classification, clarifying the texts/explanatory notes, etc.).

Nature and content of proposals

Respondents are invited to provide their comments, feedback, and suggestions on how to improve the NAICS content. They must outline their rationale for proposed changes.

No restrictions have been placed on content. Respondents may propose virtual changes (not affecting the meaning of a classification item) and real changes (affecting the meaning of a classification item, whether or not accompanied by changes in naming and/or coding). Examples of real changes, those that affect the scope of the classification items or categories (with or without a change in the codes), are: the creation of new classification items, the combination or decomposition of classification items, and the elimination of classification items. A classification item (sometimes referred to as a "class") represents a category at a certain level within a statistical classification structure. It defines the content and the borders of the category, and generally contains a code, title, definition/description, as well as exclusions where necessary. For NAICS, classification items are: sectors (2-digit), subsectors (3-digit), industry groups (4-digit), industries (5-digit) and Canadian industries (6-digit).

Key dates for NAICS Canada 2027 revision process

Here are key dates for the NAICS Canada 2027 revision process:

  • Official public consultation period for changes proposed for inclusion in NAICS Canada 2027: ongoing until the end of June 2025. Beyond 2027, the cut-off date to incorporate approved changes from proposals into the new classification version will be about a year and a half before the release date of the next version of NAICS Canada, based on the 5-year revision cycle.
  • Completion of trilateral negotiations: September 2025.
  • Public notice containing proposals in consideration for changes in NAICS Canada: November 2025.
  • Public notice containing the final approved proposal for changes in NAICS Canada: February 2026.
  • Public release of NAICS Canada 2027 Version 1.0: January 2027.

The next revised version of NAICS Canada will be called NAICS Canada 2027 Version 1.0.

Individuals and organizations wishing to submit proposals for changes to NAICS Canada may do so at any time, in accordance with the permanent consultation process adopted by Statistics Canada with regard to NAICS Canada.

Submitting Proposals

Proposals for NAICS Canada revisions must contain the contact information of those submitting the change request:

  1. Name
  2. Organization (when an individual is proposing changes on behalf of an organization)
  3. Mailing address
  4. Email address
  5. Phone number

Should additional information or clarification of a proposal be required, participants may be contacted.

Proposals must be submitted by email to statcan.naics-consultation-scian-consultation.statcan@statcan.gc.ca.

Consultation guidelines for submitting proposals for change in NAICS Canada

Individuals or organizations are encouraged to follow the guidelines below when developing their proposals.

Proposals should:

  • clearly identify the proposed addition or change to NAICS; this can include the creation of entirely new classes, or modifications to existing classes;
  • outline the rationale and include supporting information for the proposed change;
  • if possible, describe the empirical significance (i.e., revenue, expenses, value added, employment) of proposed changes, especially changes affecting the scope of existing classification items/categories (new industries may be subject to tests of empirical significance with respect to revenue, value added, employment, and number of establishments);
  • be consistent with classification principles (e.g., mutual exclusivity, exhaustiveness, and homogeneity within categories);
  • be relevant, that is
    • describe the present analytical interest;
    • enhance the usefulness of data;
    • base the proposal on appropriate statistical research or subject matter expertise.

Please consider the questions below when preparing your input for the consultation on the revision of NAICS Canada:

  • Are there socioeconomic activities for which you cannot find a satisfactory NAICS code?
  • Are there classification items that you find difficult to use because their descriptions are vague or unclear?
  • Are there pairs of classification items you find difficult to distinguish from each other? Are there boundaries that could be clarified?
  • Are there socioeconomic activities that you think should have their own NAICS category? Please indicate at which level and why, with supporting documentation about the activities (see the guidelines above for a proposal).
  • Are there activities that you are able to locate in NAICS, but you would like to have them located in a different sector or industry?
  • Is the language or terminology used in NAICS in need of updating to be consistent with current usage?

Note that submissions do not need to cover every topic; you can submit your comments on your particular area(s) of concern only.

The following criteria will be used to review the proposals received:

  • consistency with classification principles such as mutual exclusivity, exhaustiveness, and homogeneity of activities and output (products) within categories;
  • empirical significance as an industry;
  • whether the data can be collected and published;
  • whether the proposal can be linked to a funded program for data collection;
  • relevance, that is, whether the proposal is of analytical interest, would result in data useful to users, and is based on appropriate statistical research and subject-matter expertise;
  • consistency with the Canadian System of National Accounts.

Special attention will be given to specific industries, including:

  • new or emerging activities
  • activities related to new production processes.

NAICS Classification Structure

NAICS has a 6-digit, 5-level classification structure, consisting of 2-digit sectors, 3-digit sub-sectors, 4-digit industry groups, 5-digit industries and 6-digit national industries. Changes may be proposed for any level, but changes to the 2-digit to 5-digit levels will be subject to trilateral negotiation and approval. Changes to the 6-digit national industry level are at the discretion of each trilateral partner (i.e., Statistics Canada makes the final decision about changes to 6-digit industries in NAICS Canada).
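Because each level of the hierarchy is a prefix of the 6-digit code, the structure described above can be sketched programmatically. The function below is an informal illustration, not an official Statistics Canada tool; the example code 311111 is used purely for demonstration.

```python
def naics_levels(code):
    """Return the ancestors of a 6-digit NAICS code at each of the
    5 levels; every level is a prefix of the full 6-digit code."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a 6-digit NAICS code")
    return {
        "sector": code[:2],           # 2-digit sector
        "subsector": code[:3],        # 3-digit subsector
        "industry_group": code[:4],   # 4-digit industry group
        "industry": code[:5],         # 5-digit industry
        "national_industry": code,    # 6-digit national industry
    }

# Illustrative only: code 311111 nests under sector 31, subsector 311, etc.
levels = naics_levels("311111")
```

Note that in the official classification some sectors span a range of 2-digit codes (e.g., 31-33 for manufacturing), so in those cases the 2-digit prefix identifies the sector grouping rather than a single sector code.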

North American Industry Classification System (NAICS) Canada 2022 Version 1.0 is the latest version of the classification on which participants in this consultation should base their input. In the context of a permanent consultation process, persons or organizations proposing a change should always make sure they refer to the latest available version of NAICS Canada.

Costs associated with proposals

Statistics Canada will not reimburse respondents for expenses incurred in developing their proposal.

Treatment of proposals

Statistics Canada will review all proposals received. Statistics Canada reserves the right to use independent consultants or government employees, if deemed necessary, to assess proposals.

If deemed appropriate, a representative of Statistics Canada will contact respondents to ask additional questions or seek clarification on a particular aspect of their proposal.

Please note that a proposal will not necessarily result in changes to NAICS Canada.

Official languages

Proposals may be written in either of Canada's official languages – English or French.

Confidentiality

Statistics Canada is committed to respecting the privacy of consultation participants. All personal information created, held or collected by the Agency is protected by the Privacy Act. For more information on Statistics Canada's privacy policies, please consult the Privacy notice.

Thank You

We thank all participants for their continued interest and participation in the various NAICS engagement activities.

Enquiries

If you have any enquiries about this process, please send them to statcan.naics-consultation-scian-consultation.statcan@statcan.gc.ca.

National Travel Survey: C.V.s for Visit-Expenditures by Duration of Visit, Main Trip Purpose and Country or Region of Expenditures - Q1 2023

National Travel Survey: C.V.s for Visit-Expenditures by Duration of Visit, Main Trip Purpose and Country or Region of Expenditures, including expenditures at origin and those for air commercial transportation in Canada, in Thousands of Dollars (x 1,000)
Table summary
This table displays the results of C.V.s for Visit-Expenditures by Duration of Visit, Main Trip Purpose and Country or Region of Expenditures. The information is grouped by Duration of trip (appearing as row headers), Main Trip Purpose, Country or Region of Expenditures (Total, Canada, United States, Overseas) calculated using Visit-Expenditures in Thousands of Dollars (x 1,000) and c.v. as units of measure (appearing as column headers).
Duration of Visit Main Trip Purpose Country or Region of Expenditures
Total Canada United States Overseas
$ '000 C.V. $ '000 C.V. $ '000 C.V. $ '000 C.V.
Total Duration Total Main Trip Purpose 25,986,818 A 13,651,796 B 7,448,832 A 4,886,190 B
Holiday, leisure or recreation 14,369,721 A 5,582,368 A 5,150,065 A 3,637,288 B
Visit friends or relatives 5,499,091 B 3,902,696 C 800,488 B 795,907 B
Personal conference, convention or trade show 324,357 D 180,423 E 140,837 E 3,097 E
Shopping, non-routine 939,437 B 799,606 B 138,268 C 1,563 E
Other personal reasons 1,341,669 B 989,375 B 198,094 E 154,200 E
Business conference, convention or trade show 1,621,677 B 775,981 B 713,914 C 131,781 C
Other business 1,890,866 D 1,421,347 E 307,164 C 162,354 D
Same-Day Total Main Trip Purpose 5,200,264 B 4,834,771 B 331,748 C 33,745 D
Holiday, leisure or recreation 1,874,279 B 1,691,506 B 151,225 C 31,547 D
Visit friends or relatives 1,473,322 D 1,431,362 E 41,960 E ..  
Personal conference, convention or trade show 52,443 C 50,447 C 1,996 E ..  
Shopping, non-routine 823,829 B 723,496 B 100,333 C ..  
Other personal reasons 476,644 B 457,469 B 16,977 E 2,198 E
Business conference, convention or trade show 73,296 D 68,329 D 4,967 E ..  
Other business 426,451 E 412,162 E 14,290 E ..  
Overnight Total Main Trip Purpose 20,786,554 A 8,817,025 B 7,117,084 A 4,852,445 B
Holiday, leisure or recreation 12,495,442 A 3,890,862 A 4,998,840 B 3,605,741 B
Visit friends or relatives 4,025,769 B 2,471,334 B 758,527 B 795,907 B
Personal conference, convention or trade show 271,914 E 129,976 E 138,841 E 3,097 E
Shopping, non-routine 115,607 C 76,109 D 37,935 E 1,563 E
Other personal reasons 865,025 B 531,906 B 181,117 E 152,002 E
Business conference, convention or trade show 1,548,381 B 707,652 B 708,948 C 131,781 C
Other business 1,464,415 C 1,009,185 C 292,875 C 162,354 D
..
data not available

Estimates contained in this table have been assigned a letter to indicate their coefficient of variation (c.v.) (expressed as a percentage). The letter grades represent the following coefficients of variation:

A
c.v. from 0.00% to 5.00% inclusive: Excellent.
B
c.v. from 5.01% to 15.00% inclusive: Very good.
C
c.v. from 15.01% to 25.00% inclusive: Good.
D
c.v. from 25.01% to 35.00% inclusive: Acceptable.
E
c.v. greater than 35.00%: Use with caution.

National Travel Survey: C.V.s for Person-Trips by Duration of Trip, Main Trip Purpose and Country or Region of Trip Destination - Q1 2023

National Travel Survey: C.V.s for Person-Trips by Duration of Trip, Main Trip Purpose and Country or Region of Trip Destination, Q1 2023
Table summary
This table displays the results of C.V.s for Person-Trips by Duration of Trip, Main Trip Purpose and Country or Region of Trip Destination. The information is grouped by Duration of trip (appearing as row headers), Main Trip Purpose, Country or Region of Trip Destination (Total, Canada, United States, Overseas) calculated using Person-Trips in Thousands (x 1,000) and C.V. as units of measure (appearing as column headers).
Duration of Trip Main Trip Purpose Country or Region of Trip Destination
Total Canada United States Overseas
Person-Trips (x 1,000) C.V. Person-Trips (x 1,000) C.V. Person-Trips (x 1,000) C.V. Person-Trips (x 1,000) C.V.
Total Duration Total Main Trip Purpose 66,482 A 58,207 A 6,040 A 2,234 A
Holiday, leisure or recreation 23,290 A 18,738 A 3,018 A 1,533 A
Visit friends or relatives 26,594 B 24,786 B 1,291 B 517 B
Personal conference, convention or trade show 904 C 824 C 78 D 2 E
Shopping, non-routine 4,269 B 3,641 B 626 B 2 E
Other personal reasons 5,159 B 4,890 B 207 C 61 D
Business conference, convention or trade show 1,547 B 1,076 B 407 C 65 C
Other business 4,719 B 4,251 B 414 C 55 D
Same-Day Total Main Trip Purpose 42,594 A 40,756 A 1,838 B ..  
Holiday, leisure or recreation 13,039 A 12,431 A 609 C ..  
Visit friends or relatives 17,309 B 16,944 B 365 D ..  
Personal conference, convention or trade show 633 C 617 C 16 E ..  
Shopping, non-routine 4,036 B 3,448 B 588 C ..  
Other personal reasons 3,862 B 3,747 B 116 D ..  
Business conference, convention or trade show 446 C 437 C 9 E ..  
Other business 3,267 C 3,133 C 135 E ..  
Overnight Total Main Trip Purpose 23,888 A 17,452 A 4,202 A 2,234 A
Holiday, leisure or recreation 10,251 A 6,308 A 2,409 A 1,533 A
Visit friends or relatives 9,284 A 7,842 A 925 B 517 B
Personal conference, convention or trade show 271 C 208 C 62 E 2 E
Shopping, non-routine 232 C 193 D 37 D 2 E
Other personal reasons 1,296 B 1,144 B 91 D 61 D
Business conference, convention or trade show 1,101 B 639 B 398 C 65 C
Other business 1,452 B 1,118 B 279 C 55 D
..
data not available

Estimates contained in this table have been assigned a letter to indicate their coefficient of variation (c.v.) (expressed as a percentage). The letter grades represent the following coefficients of variation:

A
c.v. from 0.00% to 5.00% inclusive: Excellent.
B
c.v. from 5.01% to 15.00% inclusive: Very good.
C
c.v. from 15.01% to 25.00% inclusive: Good.
D
c.v. from 25.01% to 35.00% inclusive: Acceptable.
E
c.v. greater than 35.00%: Use with caution.

Quarterly Survey of Financial Statements: Weighted Asset Response Rate - second quarter 2023

Weighted Asset Response Rate
Table summary
This table displays the results of Weighted Asset Response Rate. The information is grouped by Release date (appearing as row headers), 2022, Q2, Q3 and Q4, and 2023, Q1 and Q2, calculated using percentage units of measure (appearing as column headers).
Release date 2022 2023
Q2 Q3 Q4 Q1 Q2
percentage
August 24, 2023 80.9 79.0 72.7 72.2 59.4
May 24, 2023 80.9 79.0 72.7 57.6  
February 23, 2023 79.2 76.9 55.2    
November 23, 2022 76.1 56.2      
August 25, 2022 55.7        
.. not available for a specific reference period
Source: Quarterly Survey of Financial Statements (2501)

Retail Trade Survey (Monthly): CVs for total sales by geography - June 2023

CVs for total sales by geography - June 2023
Geography Month
202306
%
Canada 0.6
Newfoundland and Labrador 2.0
Prince Edward Island 1.0
Nova Scotia 1.8
New Brunswick 2.3
Quebec 1.2
Ontario 1.1
Manitoba 1.2
Saskatchewan 2.6
Alberta 0.9
British Columbia 1.8
Yukon Territory 1.7
Northwest Territories 1.9
Nunavut 1.6

Monthly Survey of Food Services and Drinking Places: CVs for Total Sales by Geography – June 2023

Monthly Survey of Food Services and Drinking Places: CVs for Total Sales by Geography - June 2023
Table summary
This table displays the results of CVs for Total sales by Geography. The information is grouped by Geography (appearing as row headers), with Month and percentage (appearing as column headers).
Geography Month
202206 202207 202208 202209 202210 202211 202212 202301 202302 202303 202304 202305 202306
percentage
Canada 0.66 0.49 0.14 0.13 0.17 0.24 0.88 0.32 0.33 0.26 0.14 0.18 0.20
Newfoundland and Labrador 0.53 0.50 0.47 0.49 0.73 0.49 0.93 2.43 0.81 0.70 0.84 0.99 1.23
Prince Edward Island 15.97 9.23 5.27 3.04 8.45 8.22 3.45 10.49 14.17 8.25 7.86 2.25 3.07
Nova Scotia 1.79 3.37 0.43 0.40 0.37 0.43 16.87 0.83 0.91 0.72 0.58 0.70 0.72
New Brunswick 0.67 0.53 0.52 0.50 0.56 0.73 12.18 1.21 1.77 0.76 0.73 0.78 1.71
Quebec 1.55 0.97 0.18 0.28 0.26 0.19 1.73 0.67 0.95 0.77 0.33 0.53 0.59
Ontario 1.30 0.95 0.25 0.25 0.21 0.53 0.73 0.67 0.64 0.48 0.25 0.26 0.30
Manitoba 0.68 3.49 0.48 0.40 0.37 0.58 9.72 0.78 0.75 0.80 0.68 0.84 0.93
Saskatchewan 6.45 4.85 1.30 0.73 1.31 1.44 7.51 0.62 0.89 0.51 0.55 0.76 0.93
Alberta 1.45 0.91 0.39 0.30 0.33 0.38 1.56 0.40 0.44 0.36 0.33 0.37 0.48
British Columbia 0.64 0.91 0.28 0.21 0.66 0.33 2.77 0.44 0.44 0.38 0.27 0.36 0.38
Yukon Territory 3.32 2.54 2.09 2.07 2.34 2.20 2.50 41.12 2.70 30.75 2.48 8.17 4.07
Northwest Territories 3.20 2.74 2.38 2.05 2.00 2.09 2.56 6.03 2.47 38.31 3.64 10.11 3.58
Nunavut 1.55 1.52 1.30 2.35 2.85 101.77 43.21 2.83 2.61 2.50 2.47 23.47 2.74

Computer vision models: seed classification project

By AI Lab, Canadian Food Inspection Agency

Introduction

The AI Lab team at the Canadian Food Inspection Agency (CFIA) is composed of a diverse group of experts, including data scientists, software developers, and graduate researchers, all working together to provide innovative solutions for the advancement of Canadian society. By collaborating with members from inter-departmental branches of government, the AI Lab leverages state-of-the-art machine learning algorithms to provide data-driven solutions to real-world problems and drive positive change.

At the CFIA's AI Lab, we harness the full potential of deep learning models. Our dedicated team of data scientists leverages the power of this transformative technology to develop customised solutions tailored to the specific needs of our clients.

In this article, we motivate the need for computer vision models for the automatic classification of seed species. We demonstrate how our custom models have achieved promising results using "real-world" seed images and describe our future directions for deploying a user-friendly SeedID application.

At the CFIA AI Lab, we strive not only to push the frontiers of science by leveraging cutting-edge models, but also to make these services accessible to others and to foster knowledge sharing, for the continued advancement of Canadian society.

Computer vision

To understand how image classification models work, we first define what exactly computer vision tasks aim to address.

What is computer vision:

Computer Vision models are fundamentally trying to solve what is mathematically referred to as ill-posed problems. They seek to answer the question: what gave rise to the image?

As humans, we do this naturally. When photons enter our eyes, our brain is able to process the different patterns of light enabling us to infer the physical world in front of us. In the context of computer vision, we are trying to replicate our innate human ability of visual perception through mathematical algorithms. Successful computer vision models could then be used to address questions related to:

  • Object categorisation: the ability to classify objects in an image scene or recognise someone's face in pictures
  • Scene and context categorisation: the ability to understand what is going on in an image through its components (e.g. indoor/outdoor, traffic/no traffic, etc.)
  • Qualitative spatial information: the ability to qualitatively describe objects in an image, such as a rigid moving object (e.g. bus), a non-rigid moving object (e.g. flag), a vertical/horizontal/slanted object, etc.

Yet, while these appear to be simple tasks, computers still have difficulties in accurately interpreting and understanding our complex world.

Why is computer vision so hard:

To understand why computers seemingly struggle to perform these tasks, we must first consider what an image is.

Figure 1

Are you able to describe what this image is from these values?

Description - Figure 1

This image shows a pixelated brown and white image of a person's face, with the face pixels white and the background brown. Next to it, a zoomed-in view shows the pixel values corresponding to a small patch of the original image.

An image is a set of numbers, with typically three colour channels: Red, Green, Blue. In order to derive any meaning from these values, the computer must perform what is known as image reconstruction. In its most simplified form, we can mathematically express this idea through an inverse function:

x = F⁻¹(y)

Where:

y represents the data measurements (i.e. pixel values).
x represents the reconstruction of the measurements, y, into an image.

However, it turns out solving this inverse problem is harder than expected due to its ill-posed nature.
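To make this concrete, here is a toy sketch (invented pixel values, not a real photo) of what a computer actually receives: nothing but numbers.

```python
# A tiny 2x2 RGB "image": each pixel is (red, green, blue), 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],      # top row: a red and a green pixel
    [(0, 0, 255), (255, 255, 255)],  # bottom row: a blue and a white pixel
]

# Flatten to the raw measurements y that a vision model starts from.
y = [channel for row in image for pixel in row for channel in pixel]
# y is just 12 numbers; nothing in them says "coloured squares" by itself.
```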

What is an ill-posed problem

When an image is captured, there is an inherent loss of information as the 3D world is projected onto a 2D plane. Even for us, collapsing the spatial information we get from the physical world can make it difficult to discern what we are looking at through photos.

Figure 2

Michelangelo (1475-1564). Occlusion caused by different viewpoints can make it difficult to recognise the same person.

Description - Figure 2

The image shows three paintings of different figures, each with a different expression on their faces. One figure appears to be in deep thought, while the other two appear to be in a state of contemplation. The paintings are made of a dark, rough material, and the details of their faces are well-defined. The overall effect of the image is one of depth and complexity. The paintings are rotated in each frame to create a sense of change.

Figure 3

Bottom of soda cans. Different orientations can make it impossible to identify what is contained in the can.

Description - Figure 3

The image shows five metal cans, four of them with a different patch of color on the lid. The colors are blue, green, red, and yellow. The cans are arranged on a countertop. The countertop is made of a dark surface, such as granite or concrete.

Figure 4

Yale Database of faces. Variations in lighting can make it difficult to recognise the same person (recall: all computers “see” are pixel values).

Description - Figure 4

The image shows two images of the same face, captured under different conditions, resulting in two different perceived expressions. In the left frame the man has a neutral facial expression, whereas in the right frame he appears serious and angry.

Figure 5

Rick Scuteri-USA TODAY Sports. Different scales can make it difficult to understand context from images.

Description - Figure 5

The image shows four different images, at different scales. The first image contains only what looks like the eye of a bird. The second image contains the head and neck of a goose. The third image shows the entire animal, and the fourth image shows a man standing in front of the bird, pointing in a direction.

Figure 6

Different photos of chairs. Intra-class variation can make it difficult to categorise objects (we can discern a chair through its functional aspect)

Description - Figure 6

The image shows 5 different chairs. The first one is a red chair with a wooden frame. The second one is a black leather swivel chair. The third looks like an unconventional artistic chair. The fourth one looks like a minimalist office chair, and the last one looks like a bench.

It can be difficult to recognise objects in 2D pictures due to possible ill-posed properties, such as:

  • Lack of uniqueness: Several objects can give rise to the same measurement.
  • Uncertainty: Noise (e.g. blurring, pixelation, physical damage) in photos can make it difficult or impossible to reconstruct and identify an image.
  • Inconsistency: slight changes in images (e.g. different viewpoints, different lighting, different scales, etc.) can make it challenging to solve for the solution, x, from available data points, y.

While computer vision tasks may, at first glance, appear superficial, the underlying problem they are trying to address is quite challenging!

Next, we will look at some deep learning driven solutions for tackling computer vision problems.

Convolutional Neural Networks (CNNs)

Figure 7

Graphical representation of a convolutional neural network (CNN) architecture for image recognition. (Hoeser and Kuenzer, 2020)

Description - Figure 7

This is a diagram of a convolutional neural network (ConvNet) architecture. The network consists of several layers, including an input layer, a convolutional layer, a pooling layer, and an output layer. The input layer takes in an image and passes it through the convolutional layer, which applies a set of filters to the image to extract features. The pooling layer reduces the size of the image by applying a pooling operation to the output of the convolutional layer. The output layer processes the image and produces a final output. The network is trained using a dataset of images and their corresponding labels.

Convolutional Neural Networks (CNNs) are a type of algorithm that has been highly successful in solving many computer vision problems of the kind described above. In order to classify or identify objects in images, a CNN model first learns to recognize simple features in the images, such as edges, corners, and textures. It does this by applying different filters to the image. These filters help the network focus on specific patterns. As the model learns, it starts recognizing more complex features, combining the simple features learned in the previous step to create more abstract and meaningful representations. Finally, the CNN uses the learned features to classify images into the classes it has been trained on.
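As an illustration of the filtering step, the sketch below (plain Python, toy values) applies a hand-crafted vertical-edge kernel to a tiny grayscale image. In a trained CNN, the kernel weights would be learned from data rather than fixed by hand.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution (strictly, cross-correlation, as in most
    deep learning libraries): slide the kernel over the image, multiplying
    elementwise and summing to one output value per position."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            ))
        out.append(row)
    return out

# Toy 5x5 grayscale image: dark (0) on the left, bright (9) on the right.
image = [[0, 0, 0, 9, 9] for _ in range(5)]

# Hand-crafted vertical-edge kernel; a CNN would learn these weights.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = convolve2d(image, kernel)
# The map responds strongly (27) at the dark-to-bright edge and is 0 elsewhere.
```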

Figure 8

Evolution of CNN architectures and their accuracy, for image recognition tasks from 2012 to 2019. (Hoeser and Kuenzer, 2020).

Description - Figure 8

The image shows the plot of the size of different CNN architectures and models from the year 2012 until 2019. Each neural network is depicted as a circle, with the size of the circle corresponding to the size of the neural network in terms of number of parameters.

The first CNN was proposed by Yann LeCun in 1989 (LeCun, 1989) for the recognition of handwritten digits. Since then, CNNs have evolved significantly, driven by advancements in both model architecture and available computing power. To this day, CNNs continue to prove themselves as powerful architectures for various recognition and data analysis tasks.

Vision Transformers (ViTs)

Vision Transformers (ViTs) are a recent development in the field of computer vision that apply the concept of transformers, originally designed for natural language processing tasks, to visual data. Instead of treating an image as a 2D object, Vision Transformers view an image as a sequence of patches, similar to how transformers treat a sentence as a sequence of words.

Figure 9

An overview of a ViT as illustrated in An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Since the publication of the original ViT, numerous variations and flavours have been proposed and studied.

Description - Figure 9

The image shows a diagram of the ViT architecture. The input image is split into patches, and each patch is fed into the network, which consists of a transformer encoder block followed by an MLP head that produces the classification.

The process starts by splitting an image into a grid of patches. Each patch is then flattened into a vector of pixel values, producing a sequence of patch vectors. Positional encodings are added to retain the positional information, as is done in transformers for language tasks. The transformed input is then processed through multiple layers of transformer encoders to create a model capable of understanding complex visual data.
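The patch-and-position step can be sketched in a few lines of NumPy. This is an illustration only: a toy 8x8 RGB image is cut into 2x2 patches and flattened, and random vectors stand in for the learned positional embeddings a real ViT would add.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    patches = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patches.append(image[i:i + patch_size, j:j + patch_size].reshape(-1))
    return np.stack(patches)

# Toy 8x8 RGB image split into 16 patches of 2x2 pixels each.
image = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
patches = image_to_patches(image, patch_size=2)  # shape (16, 12)

# Positional embeddings are added so the encoder retains where each
# patch came from (random values here, standing in for trained ones).
rng = np.random.default_rng(0)
pos_embed = rng.normal(size=patches.shape)
tokens = patches + pos_embed  # the sequence fed to the transformer encoder
```

The resulting token sequence plays the same role for the encoder that a sequence of word embeddings plays in a language model.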

Just as Convolutional Neural Networks (CNNs) learn to identify patterns and features in an image through convolutional layers, Vision Transformers identify patterns by focusing on the relationships between patches in an image. They essentially learn to weigh the importance of different patches in relation to others to make accurate classifications. The ViT model was first introduced by the Google Brain team in a 2020 paper. While CNNs dominated the field of computer vision for years, the introduction of Vision Transformers demonstrated that methods developed for natural language processing could also be used for image classification tasks, often with superior results.

One significant advantage of Vision Transformers is that, unlike CNNs, they do not have a built-in assumption of spatial locality and shift invariance. This means they are better suited for tasks where global understanding of an image is required, or where small shifts can drastically change the meaning of an image.

However, ViTs typically require a larger amount of data and compute resources compared to CNNs. This factor has led to a trend of hybrid models that combine both CNNs and transformers to harness the strengths of both architectures.

Seed classification

Background:

Canada's multi-billion-dollar seed and grain industry has established a global reputation in the production, processing, and export of premium-grade seeds for planting and grains for food across a diverse range of crops. This success is built on Canada's commitment to innovation and the development of advanced technologies, which allow the delivery of high-quality, certified products that meet both international and domestic standards.

Naturally, a collaboration was formed between a research group from the Seed Science and Technology Section and the AI Lab of the CFIA to maintain Canada's role as a reputable leader in the global seed and grain industries and their associated testing services.

Background: Quality Control

The seed quality of a crop is reflected in a grading report, whereby the final grade indicates how well a seed lot conforms with Canada's Seeds Regulations and meets minimum quality standards. Factors used to determine crop quality include weed seed contamination as defined by Canada's Weed Seeds Order, physical purity, germination, and disease. While germination provides an indication of potential field performance, assessing physical purity is essential to ensure that the crop contains a high proportion of the desired seeds and is free from contaminants, such as prohibited and regulated species, other crop seeds, and other weed seeds. Seed inspection plays an important role in preventing the spread of the prohibited and regulated species listed in the Weed Seeds Order. Canada is one of the largest production bases for the global food supply, exporting large volumes of grains such as wheat, canola, lentils, and flax. To meet phytosanitary certification requirements and access a wide range of foreign markets, analysis of the weed seeds regulated by importing destinations is in high demand, with quick turnaround times and frequently changing requirements. Meeting this testing demand requires the support of advanced technologies, since traditional methods struggle to keep up.

Motivation

Presently, the evaluation of a crop's quality is done manually by human experts. However, this process is tedious and time consuming. At the AI Lab, we leverage advanced computer vision models to automatically classify seed species from images, rendering this process more efficient and reliable.

This project aims to develop and deploy a powerful computer vision pipeline for seed species classification. By automating this classification process, we are able to streamline and accelerate the assessment of crop quality. We build upon advanced algorithms and deep learning techniques, ensuring an unbiased and efficient evaluation of crop quality and paving the way for improved agricultural practices.

Project #1: Multispectral Imaging and Analysis

In this project, we employ a custom computer vision model to assess content purity by identifying and distinguishing desired seed species from undesired ones.

We successfully recover and identify contamination by three different weed species in a screening mixture of wheat samples.

Our model is customised to accept unique high-resolution, 19-channel multispectral image inputs and achieves greater than 95% accuracy on held-out test data.
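A standard convolution extends directly to 19-channel inputs: each filter spans every input channel, and the per-channel responses are summed into one feature map, exactly as with 3-channel RGB images. The sketch below illustrates the mechanics with NumPy; it is not the project's actual model, and the filter values are placeholders.

```python
import numpy as np

def multichannel_conv(image, kernel):
    """Convolve an (H, W, C) image with a single (kh, kw, C) filter.

    The filter spans all C input channels; the per-channel products
    are summed into one scalar per spatial position."""
    kh, kw, c = kernel.shape
    h, w, _ = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return out

# A toy 19-channel multispectral "image" and one 3x3x19 averaging filter.
image = np.ones((16, 16, 19))
kernel = np.full((3, 3, 19), 1.0 / (3 * 3 * 19))
fmap = multichannel_conv(image, kernel)  # one feature map, shape (14, 14)
```

In a deep learning framework, only the first layer of an off-the-shelf architecture needs to be adapted to accept 19 input channels instead of 3.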

We further explored our model's potential to classify new species, by injecting five new canola species into the dataset and observing similar results. These encouraging findings highlight our model's potential for continual use even as new seed species are introduced.

Our model was trained to classify the following species:

  • Three different thistle (weed) species:
    • Cirsium arvense (regulated species)
    • Carduus nutans (similar to the regulated species)
    • Cirsium vulgare (similar to the regulated species)
  • Six crop seeds:
    • Triticum aestivum subspecies aestivum
    • Brassica napus subspecies napus
    • Brassica juncea
    • Brassica juncea (yellow type)
    • Brassica rapa subspecies oleifera
    • Brassica rapa subspecies oleifera (brown type)

Our model was able to correctly identify each seed species with an accuracy of over 95%.

Moreover, when the three thistle seeds were integrated with the wheat screening, the model achieved an average accuracy of 99.64% across 360 seeds. This demonstrated the model's robustness and ability to classify new images.

Finally, we introduced five new canola species and types and evaluated our model's performance. Preliminary results from this experiment showed a ~93% accuracy on the testing data.

Project #2: Digital Microscope RGB Imaging and Analysis

In this project, we employ a 2-step process to identify a total of 15 different seed species with regulatory significance and morphological challenges across varying magnification levels.

First, a seed segmentation model is used to identify each instance of a seed in the image. Then, a classification model classifies each seed species instance.
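The 2-step process can be sketched as follows. The two model functions below are hypothetical stand-ins for illustration only; the project's actual segmentation and classification models are trained networks, not these toy rules.

```python
import numpy as np

def segment_seeds(image):
    """Stand-in for the trained segmentation model: returns bounding
    boxes (y0, x0, y1, x1) of seed instances. This toy version simply
    returns the full frame."""
    h, w = image.shape[:2]
    return [(0, 0, h, w)]

def classify_seed(crop, class_names):
    """Stand-in for the trained classifier: maps a cropped seed image
    to a species label. Here the label is derived from mean intensity,
    purely for illustration."""
    idx = int(crop.mean() * len(class_names)) % len(class_names)
    return class_names[idx]

def identify_species(image, class_names):
    """Two-step pipeline: segment each seed instance, then classify
    each cropped instance."""
    labels = []
    for (y0, x0, y1, x1) in segment_seeds(image):
        crop = image[y0:y1, x0:x1]
        labels.append(classify_seed(crop, class_names))
    return labels

species = ["Cirsium arvense", "Carduus nutans", "Cirsium vulgare"]
result = identify_species(np.zeros((32, 32)), species)
```

Separating segmentation from classification lets each model be retrained independently, for example when a new species is added to the classifier.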

We perform multiple ablation studies by training on one magnification profile and then testing on seeds from a different magnification set. We show promising preliminary results of over 90% accuracy across magnification levels.

Three different magnification levels were provided for the following 15 species:

  • Ambrosia artemisiifolia
  • Ambrosia trifida
  • Ambrosia psilostachya
  • Brassica juncea
  • Brassica napus
  • Bromus hordeaceus
  • Bromus japonicus
  • Bromus secalinus
  • Carduus nutans
  • Cirsium arvense
  • Cirsium vulgare
  • Lolium temulentum
  • Solanum carolinense
  • Solanum nigrum
  • Solanum rostratum

Images of a mix of the 15 species were taken at varying magnification levels. The magnification level was denoted by the total number of seed instances present in the image: 1, 2, 6, 8, or 15 seeds per image.

In order to establish a standardised image registration protocol, we independently trained separate models on a subset of data at each magnification, then evaluated model performance on a reserved test set spanning all magnification levels.

Preliminary results demonstrated the model's ability to correctly identify seed species across magnifications with over 90% accuracy.

This revealed the model's potential to accurately classify previously unseen data at varying magnification levels.

Throughout our experiments, we tried and tested different methodologies and models.

Advanced hierarchical models such as Swin Transformers fared much better and proved to be less perturbed by magnification and zoom level.

Discussion + Challenges

Automatic seed classification is a challenging task. Training a machine learning model to classify seeds poses several challenges due to the inherent heterogeneity within and between different species. Consequently, large datasets are required to effectively train a model to learn species-specific features. Additionally, the high degree of similarity among species within some genera makes it challenging even for human experts to differentiate between closely related intra-genus species. Furthermore, the quality of image acquisition can also impact the performance of seed classification models, as low-quality images can result in the loss of important information necessary for accurate classification.

To address these challenges and improve model robustness, data augmentation techniques were applied as part of the preprocessing steps. Affine transformations, such as scaling and translation, were used to increase the sample size, while adding Gaussian noise increased variation and improved generalization on unseen data, preventing overfitting to the training data.
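The augmentations described above can be illustrated with a minimal NumPy sketch. This is a simplified stand-in for the project's actual preprocessing: a random translation (one kind of affine transformation) plus additive Gaussian noise, with shift range and noise scale chosen arbitrarily for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image):
    """One augmented copy of a grayscale image: random translation
    (via np.roll, an affine shift on a toy image) plus Gaussian noise."""
    h, w = image.shape
    dy, dx = rng.integers(-2, 3, size=2)      # shift by up to 2 pixels
    shifted = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
    noisy = shifted + rng.normal(scale=0.05, size=(h, w))
    return noisy

# Generate several augmented variants of one image to enlarge the dataset.
image = np.ones((8, 8))
augmented = [augment(image) for _ in range(4)]
```

Each call yields a slightly different variant of the same seed image, so the model sees more diverse examples without any new data collection.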

Selecting the appropriate model architecture was crucial in achieving the desired outcome. A model may fail to produce accurate results if end users do not adhere to a standardized protocol, particularly when given data that falls outside the expected distribution. Therefore, it was imperative to consider various data sources and utilize a model that can effectively generalize across domains to ensure accurate seed classification.

Conclusion

The seed classification project is an example of the successful and ongoing collaboration between the AI Lab and the Seed Science group at the CFIA. By pooling their respective knowledge and expertise, both teams contribute to the advancement of Canada's seed and grain industries. The seed classification project showcases how leveraging advanced machine learning tools has the potential to significantly enhance the accuracy and efficiency of evaluating seed or grain quality in compliance with seed and plant protection regulations, ultimately benefiting the agricultural industry, consumers, Canadian biosecurity, and food safety.

As Data Scientists, we recognise the importance of open-source collaboration, and we are committed to upholding the principles of open science. Our objective is to promote transparency and engagement through open sharing with the public.

By making our application available, we invite fellow researchers, seed experts, and developers to contribute to its further improvement and customisation. This collaborative approach fosters innovation, allowing the community to collectively enhance the capabilities of the SeedID application and address specific domain requirements.

Meet the Data Scientist

If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.

Register for the Meet the Data Scientist event. We hope to see you there!

MS Teams – link will be provided to the registrants by email

Subscribe to the Data Science Network for the Federal Public Service newsletter to keep up with the latest data science news.
