Data Stewardship: An introduction to data standards and metadata

Catalogue number: 892000062021006

Release date: May 3, 2021 Updated: November 5, 2021

Whether you're gathering new data or using existing data, applying data standards will make your life easier. And documenting your data in metadata will ensure that others will be able to find it, understand it, and use it.

In this video, you'll learn what we can do to data itself, to make it easier to work with. That's the role of data standards. And you'll learn what extra information we can provide to make data easier to use. That's the role of metadata.

Data journey step: Foundation
Data competency:
  • Data analysis
  • Data interpretation
  • Data management and organization
  • Data stewardship
  • Metadata creation and use
Audience: Basic
Suggested prerequisites: N/A
Length: 13:14
Cost: Free

Watch the video

Data Stewardship: An introduction to data standards and metadata - Transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Data Stewardship: An introduction to data standards and metadata")

Data Stewardship: An introduction to data standards and metadata

Whether you're gathering new data or using existing data, applying data standards will make your life easier, and documenting your data in metadata will ensure that others will be able to find it, understand it, and use it.

Learning goals

In this video, you'll learn what we can do to data itself to make it easier to work with. That's the role of data standards. And you'll learn what extra information we can provide to make data easier to use. That's the role of metadata.

Steps of a data journey

(Text on screen: Supported by a foundation of stewardship, metadata, standards and quality)

(Diagram of the Steps of the data journey: Step 1 - define, find, gather; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)

This diagram is a visual representation of the data journey, from collecting the data, to exploring, cleaning, describing and understanding the data, to analyzing the data, and lastly to communicating with others the story the data tell. Data standards and metadata support the entire data journey.

Data standards

Data standards can be applied to data to make it easier to work with. One example of a data standard is using common terminology to describe data, such as saying that the data is in a data set or that the data has categorical variables or numeric variables. Another example is using familiar ways to represent common things, such as dates, addresses, and temperature. Making data recognizable by keeping it in a database or using a common file format is another example of following a data standard. In a nutshell, data standards are the rules used to standardize the way data are described, represented and structured.

Metadata

Metadata is information provided with data to make it easier to use. An example of metadata is information about the source of the data. Where did it come from? Who made it? What's it for? When was it created? That sort of thing. Another example of metadata is information about what has already been done to the data. How was it cleaned? How was it processed? How were things calculated? Metadata can also include a description of the quality of the data. For example, what's missing? How was it validated? If data standards were followed in the creation and processing of the data, a description of them can be included in the metadata as well. In a nutshell, metadata is data that provides information about other data, making it easier to find, interpret, trust, and use.

Data standards and metadata

(Diagram of the life cycle of data symbolized as a road starting at the data producer, where the data is passed on for further analysis at each junction, as long as the rules behind data standards and metadata are followed. If data standards and metadata are used, the data life cycle is endless.)

We gather and manipulate data because we want it to reveal something. Usually we don't gather, manipulate and interpret data all in one step. Rather, data moves along, as in the analogy of the data journey. But the data journey can be even longer than that. The endpoint for one data journey could be the starting point for another. Your job might be to create a data table for your boss. Your boss might put that together with other tables to create a dashboard. And their boss might look at the dashboard and recommend a decision, and so on. The data just keeps moving, like a bicycle. But what keeps it moving? Somebody has to pedal. And in the case of data, somebody has to be able to find it, understand it and do their own manipulations on it to keep it rolling forward. That's where data standards come in. Just like the standard way of pedaling a bike is two feet on the pedals, rotating in a clockwise direction, there are standards for using data. And just like there's a product information sheet describing the features of a bicycle, there's metadata to explain what's in the data and how it works. As long as data standards are followed and metadata is kept up-to-date, the data can keep rolling along. But as soon as the data standards are not followed and there's no metadata, it's the end of the road for the data journey. Nobody understands it or can use it anymore.

Why data standards and metadata are important

But why do we need to use data standards and metadata? The importance of these two things can be remembered using the acronym FAIR. Data standards and metadata make data findable, or easily searchable. They make data accessible, or easy to use. They make data interoperable, which simply means easy to combine with other data. And they make data reusable, meaning it is easy to use, share and reuse.

Types of data standards

There are different types of data standards. We'll just mention a few of them here. Data format standards are a standardized way to represent things such as dates, negative numbers, and currency. In Canada, for example, we have two-letter codes for province names. Data file format standards ensure that files are easy to share and open. Comma-separated values, or CSV, format is a good one because files can be opened by a variety of different software packages. Variable standards are a way to standardize the categories and structures for variables that can only take on certain values, such as employment status, age groups, industry and occupation, and products. We'll see more about these on the next slide.
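
As a rough illustration of the data format and data file format standards described above, the following Python sketch rewrites dates in year-month-day form (ISO 8601), replaces province names with their two-letter codes, and saves the result as CSV. The records, the column names and the helper name standardize_record are made up for illustration, and the province list is only partial.

```python
import csv
import io
from datetime import datetime

# Two-letter codes for a few provinces (a partial list, for illustration only).
PROVINCE_CODES = {
    "Saskatchewan": "SK",
    "Ontario": "ON",
    "Quebec": "QC",
}


def standardize_record(record):
    """Return a copy of the record with a standard date and province code."""
    # Parse a date written like "May 3, 2021" and rewrite it as 2021-05-03.
    parsed = datetime.strptime(record["date"], "%B %d, %Y")
    return {
        "date": parsed.date().isoformat(),
        "province": PROVINCE_CODES[record["province"]],
        # One decimal place, with a minus sign for negative values.
        "temperature_c": f"{record['temperature_c']:.1f}",
    }


# Hypothetical raw records, as they might look before standards are applied.
raw_records = [
    {"date": "May 3, 2021", "province": "Saskatchewan", "temperature_c": -2.5},
    {"date": "November 5, 2021", "province": "Ontario", "temperature_c": 7},
]

standardized = [standardize_record(r) for r in raw_records]

# Write the standardized records as CSV text, a widely supported file format.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["date", "province", "temperature_c"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(standardized)
csv_text = buffer.getvalue()
print(csv_text)
```

Looking up the province in a fixed table means that an unexpected spelling raises an error instead of slipping through, which is one simple way to enforce a format standard.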

Standard classifications

A classification is a way to group the categories of a standardized variable in a meaningful and consistent way. Let's take age groups for example. If one researcher uses age categories that span 4 years, and another researcher uses age categories that span 10 years, it would be hard to compare the results. However, if everyone agrees that age categories should span 5 years and we all abide by that, we call this a classification. The groupings in the classification should be, among other things, exhaustive and mutually exclusive. Before you make your own groupings, it's a good practice to check first if there's a standard classification for you to use. There are many on Statistics Canada's website.
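
The age group example above can be sketched in a few lines of Python. This is only an illustration, assuming 5-year groups that start at age 0 and whole-number ages. The function names age_group and is_exhaustive_and_exclusive are invented for this sketch and do not come from any official classification.

```python
def age_group(age, width=5):
    """Return the label of the age group for a whole-number age, e.g. 23 -> '20 to 24'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower} to {lower + width - 1}"


def is_exhaustive_and_exclusive(groups, min_value, max_value):
    """Check that every whole number in the range falls in exactly one group.

    `groups` is a list of (lower, upper) pairs with inclusive bounds.
    """
    for value in range(min_value, max_value + 1):
        matches = [g for g in groups if g[0] <= value <= g[1]]
        if len(matches) != 1:
            return False
    return True


# Groups of 5 years covering ages 0 to 99: (0, 4), (5, 9), ..., (95, 99).
five_year_groups = [(lower, lower + 4) for lower in range(0, 100, 5)]

# Two flawed groupings, for comparison.
overlapping_groups = [(0, 10), (10, 20)]  # age 10 falls in two groups
gappy_groups = [(0, 4), (10, 14)]  # ages 5 to 9 fall in no group

print(age_group(23))
print(is_exhaustive_and_exclusive(five_year_groups, 0, 99))
print(is_exhaustive_and_exclusive(overlapping_groups, 0, 20))
print(is_exhaustive_and_exclusive(gappy_groups, 0, 14))
```

The check fails for the overlapping grouping because it is not mutually exclusive, and for the grouping with a gap because it is not exhaustive.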

Types of metadata

There are different types of metadata, but we'll just talk about a few of them here. Reference metadata gives information about the source of the data, such as who collected it, when, and for what purpose. Reference metadata also includes a description of the methods used to process or analyze the data, and an assessment of the data quality. This could be in paragraph format. Descriptive metadata is things like titles, footnotes and labels. These could appear directly on the tables, graphs, and other data visualizations. Structural metadata is where one would find a list of the variables and classifications that are in the data. What identifiers are there? What are the valid values? What is the range of values or the code list? A code list is a list of possible values for a categorical variable. If standard classifications were used, they would be described in the structural metadata.
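
To make the idea of structural metadata a little more concrete, here is a minimal Python sketch in which the metadata lists each variable with its code list or range of valid values, and a small function uses it to check records. The variable names, the partial province code list and the validate function are all hypothetical.

```python
# Hypothetical structural metadata for a small data set.
structural_metadata = {
    "station_id": {"type": "identifier"},
    # Partial code list, for illustration only.
    "province": {"type": "categorical", "code_list": ["AB", "BC", "MB", "SK"]},
    "month": {"type": "numeric", "valid_range": (1, 12)},
}


def validate(record, metadata):
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []
    for name, rules in metadata.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        # Categorical variables must take a value from the code list.
        if "code_list" in rules and value not in rules["code_list"]:
            problems.append(f"{name}: {value!r} is not in the code list")
        # Numeric variables must fall inside the range of valid values.
        if "valid_range" in rules:
            low, high = rules["valid_range"]
            if not low <= value <= high:
                problems.append(f"{name}: {value!r} is outside {low} to {high}")
    return problems


good = {"station_id": "A001", "province": "SK", "month": 7}
bad = {"station_id": "A002", "province": "Sask", "month": 13}

print(validate(good, structural_metadata))
print(validate(bad, structural_metadata))
```

Because the rules live in the metadata rather than in the checking code, a data user can read the same description that the data producer used to validate the data.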

Example: Climate normals, 1981-2010, Saskatoon

(Image of website research result of the "1981 - 2010 climate normals and averages" on the Government of Canada website with an emphasis on the Saskatoon Water TP hyperlink.)

Let's look at data standards and metadata through an example. You can go to this web page on the Government of Canada website by navigating to climate normals and averages and searching for a station name containing Saskatoon. We chose the weather station called Saskatoon Water TP.

(Image of website research result of the "1981 - 2010 climate normals and averages for the Saskatoon Water TP" on the Government of Canada website with an emphasis on the graph's title, axes and legend. Additional attention is put on the Station/Element Metadata tab.)

Clicking on the Saskatoon Water TP station brought us here. This graph has an informative title, a legend, and labeled axes. These are examples of descriptive metadata. Above the graph are three tabs. We clicked on the one for "Station/Element Metadata."

(Image of website research result of the "1981 - 2010 climate normals and averages for the Saskatoon Water TP" on the Government of Canada website with an emphasis on the Latitude (dd mm) hyperlink.)

Clicking on this "Station/Element Metadata" tab brought us here. The paragraph below the tabs is an example of reference metadata explaining how to assess the quality of the statistics in the table. The data producer provided this information to help the data user decide if the data is fit for their intended use. Next we clicked on the latitude hyperlink.

(Image of the metadata definition of latitude)

Clicking the latitude hyperlink brought us here. This is more reference metadata to help the user understand how latitude was measured and also to inform the user about the quality of the location data. Latitude and longitude are a standard classification for communicating locations.

(Image of website research result of the "1981 - 2010 climate normals and averages for the Saskatoon Water TP" on the Government of Canada website with an emphasis on the Normals Data tab.)

Next we returned to this web page and clicked on the Normals Data tab.

(Image of website research result of the "1981 - 2010 climate normals and averages for the Saskatoon Water TP" on the Government of Canada website with an emphasis on the Normals Data tab subsection titled "Download Data" & "Related Data".)

Clicking the Normals Data tab brought us here. The paragraph below the tabs is more reference metadata describing how calculations were done and more insight about data quality. The data is available for free in CSV and XML format, both of which are common data file format standards. Also on this webpage are hyperlinks to other related metadata.

(Image of CSV file from the "1981 - 2010 climate normals and averages for the Saskatoon Water TP" website from the Government of Canada website with an emphasis on CSV table titles, standardized province abbreviations, numbering methodologies and date nomenclature.)

We downloaded the data in CSV or comma separated values format. This is a small segment of it. The left hand column explicitly states that it includes metadata to help users understand and use this microdata. The province is SK, which is the standard abbreviation for Saskatchewan. Negative values are indicated with a minus sign, which is a common data standard. Dates are formatted Year Year Year Year Slash Day Day, which is a somewhat common data standard for dates.
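
For readers who want to work with a file like this in code, the following Python sketch reads a small CSV sample that follows the same conventions: a two-letter province code, a minus sign for negative values, and dates written as a four-digit year, a slash, and a two-digit day. The sample rows, the column names and the parse_year_day helper are invented for illustration; the layout of the actual download may differ.

```python
import csv
import io

# Made-up sample in the spirit of the file described above; not actual values.
sample = (
    "PROVINCE,MONTH,DAILY_MIN_C,EXTREME_MIN_DATE\n"
    "SK,Jan,-20.5,1990/15\n"
    "SK,Jul,11.2,1985/03\n"
)


def parse_year_day(text):
    """Split a 'YYYY/DD' string into (year, day of month) whole numbers."""
    year, day = text.split("/")
    return int(year), int(day)


rows = []
for row in csv.DictReader(io.StringIO(sample)):
    rows.append({
        "province": row["PROVINCE"],
        "month": row["MONTH"],
        # float() understands the leading minus sign used for negative values.
        "daily_min_c": float(row["DAILY_MIN_C"]),
        "extreme_min_date": parse_year_day(row["EXTREME_MIN_DATE"]),
    })

for parsed_row in rows:
    print(parsed_row)
```

Because the sample follows common standards, it can be read with a general-purpose CSV reader and only the less common date format needs a small custom helper.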

Example: Recap

Now let's review what we saw in the example. The data producer gathered the climate microdata, did some processing, applied data standards for dates and negative numbers, and used a standard classification for latitude and longitude. From the microdata they produced data products. In this example, we saw a chart, a graph, and a downloadable data set. The data producer also created metadata describing everything they did and strategically chose where to display the metadata in the data products so that it would be intuitive and transparent to the data user where to find the information they need. The data user in this example navigates through the data products. From the metadata, the data user learns about data quality, what methods were used, and what data standards were applied.

Knowledge check

(Image of Canada's new motor vehicle registrations from the Government of Canada. With an emphasis on column titles, superscripts, and references.)

The example walked us through how to spot metadata and data standards. Now it's your turn. What you see on the screen is a data table of new motor vehicle registrations. Pause the video here and see how many examples of metadata you can see on the screen. Press play to see our answers. The table has labels on the rows and columns. This is an example of descriptive metadata. The font is pretty small, but there's a superscript on the words "other fuel types" that is a hyperlink. If you could press it, you'd see that it makes a pop-up message appear listing other fuel types. This is another example of descriptive metadata. Below the table is a goldmine of related information. If you were able to click on any of the hyperlinks there, you'd find reference metadata about the data source, methods, and quality.

Recap of key points

Data standards are rules used to standardize the way data are described, represented and structured. Data standards make data easier to work with. Metadata is data that provides information about other data. Metadata makes data easier to use. Using and sharing data standards and metadata facilitates using and sharing data.

(The Canada Wordmark appears.)
