This video will break down what it means to be FAIR in terms of data and metadata, and how each pillar of FAIR serves to guide users and producers alike as they navigate their way through the data journey, in order to gain maximum, long-term value. In this video, you will learn the answers to the following questions: what are FAIR data principles, and why are FAIR data principles important?
This diagram is a visual representation of the data journey: from collecting the data, to exploring, cleaning, describing and understanding the data, to analyzing the data, and lastly to communicating with others the story the data tell. FAIR data principles are relevant throughout every step of the data journey.
FAIR data means data that are Findable: unique identifiers and metadata are used to help locate data quickly and efficiently.
It also means the data are Accessible, that they are available with the appropriate permissions and that metadata are freely available and can be accessed in a standardized way.
FAIR data are also Interoperable, in that by using standards, machine-readable data are exchanged and yield outputs for use in a readable and useful format.
All of this is to ensure the data are Reusable: metadata exist to describe the source, origin and destination of data and their usages in a standardized way, enabling the meaningful reuse of data over time and across disciplines. Let’s break that down a little…
The ultimate goal of FAIR is to use these principles as a set of guidelines for anyone wishing to enhance the reusability of their data. This is done by ensuring the data are Findable, Accessible, Interoperable and Reusable; or, in other words, FAIR.
Data and metadata that include unique identifiers help us search data catalogues to find information. For example, something as simple as “current weather in Whitehorse”, when typed into an internet search engine will yield multiple URLs. These URLs, or webpage links, are each made up of a string of unique identifiers which have been registered in the search engine’s data catalogue. And as a result, when clicked, these URLs will bring you to where you need to be in order to find the information you are looking for.
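To make the role of unique identifiers a little more concrete, here is a toy sketch in Python; the identifiers, titles and URLs are invented for this example and simply show how a catalogue can map identifiers and metadata to the resources a search resolves to.

```python
# Toy illustration: a data catalogue keyed by unique identifiers.
# The identifiers and records below are invented for this example.
catalogue = {
    "doi:10.9999/whitehorse-weather-current": {
        "title": "Current weather conditions, Whitehorse",
        "url": "https://example.org/data/whitehorse/current",
        "keywords": ["weather", "Whitehorse", "current"],
    },
    "doi:10.9999/whitehorse-weather-historical": {
        "title": "Historical daily temperatures, Whitehorse",
        "url": "https://example.org/data/whitehorse/historical",
        "keywords": ["weather", "Whitehorse", "historical", "temperature"],
    },
}

def find(query):
    """Return identifiers of entries whose title or keywords match every query word."""
    words = query.lower().split()
    matches = []
    for identifier, record in catalogue.items():
        searchable = [record["title"].lower()] + [k.lower() for k in record["keywords"]]
        if all(any(word in field for field in searchable) for word in words):
            matches.append(identifier)
    return matches

print(find("current weather Whitehorse"))
# ['doi:10.9999/whitehorse-weather-current']
```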
Once you have found your desired data through that unique identifier, in this case, the URL that offers to show you the weather in Whitehorse, you need to access them. Sometimes sources are freely available and sometimes, when you click on a link, you might be asked for the appropriate permissions, such as a user name and/or password. In the event you do not have the appropriate permissions, information or metadata should be freely available to explain to you what the data contain and how data might be accessed.
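As a rough sketch of what this looks like in practice, the snippet below (the URLs are placeholders, not real endpoints) checks whether a dataset is open or requires credentials, while the metadata describing it remain freely retrievable.

```python
import requests  # third-party HTTP library (pip install requests)

# Placeholder URLs for illustration only; these are not real endpoints.
DATA_URL = "https://example.org/data/whitehorse/current"
METADATA_URL = "https://example.org/metadata/whitehorse/current"

# The data themselves may sit behind authentication.
response = requests.get(DATA_URL)
if response.status_code in (401, 403):
    print("These data require appropriate permissions, such as a username and password.")
    # With credentials, the same request could be retried, for example:
    # response = requests.get(DATA_URL, auth=("my_username", "my_password"))
elif response.ok:
    print("These data are openly available.")

# Even without permission to download the data, the metadata describing them
# should be freely retrievable, explaining what the data contain and how to access them.
metadata = requests.get(METADATA_URL)
print("Metadata request status:", metadata.status_code)
```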
After you have access to the data, in this case, the current weather in Whitehorse, you might be interested to see if today’s weather is on par with previous years, or if it is currently colder or warmer than average. For that, you might want to access a file that possesses historical data. The way in which that file, located at point A, is formatted must be understood and readable in order to be used by point B, your personal computer. This requires the exchange and interpretation of machine-readable information. Machine-readable information includes the use of standardized vocabularies to provide a consistent way of describing data, such as geographic names or numerical codes. It also includes formats such as HTML, CSV and JSON, as well as APIs (Application Programming Interfaces), which allow one piece of software to freely and openly communicate with another.
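As an illustration of that machine-to-machine exchange, the sketch below requests historical observations from a hypothetical API (the endpoint, parameters and field names are assumptions for this example): point A returns standardized JSON, and point B, your computer, can parse and use it directly.

```python
import requests  # third-party HTTP library (pip install requests)

# Hypothetical endpoint, parameters and field names, used only to illustrate
# machine-readable exchange; a real weather service will differ.
API_URL = "https://example.org/api/v1/observations"
params = {"station": "whitehorse", "start": "1990-01-01", "end": "2023-12-31", "format": "json"}

response = requests.get(API_URL, params=params)
response.raise_for_status()

# Because the payload is standardized JSON, point B (your computer) can interpret
# it without knowing how point A stores the data internally.
observations = response.json()
for record in observations[:3]:
    print(record["date"], record["mean_temperature_c"])
```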
In order to feel comfortable reusing data, you need to know the origins of the data, or where they came from, where they have been, and how they have been used in the past. This is called Provenance. Provenance is information about the source of the data (there could be more than one) relative to where you are within a particular process. For example, if you are tasked with one step in the process, then provenance could be the list of all the people or machines that handled or manipulated the data before you. Lineage would then list all the transformations that occurred throughout those processes, like which records have been changed and how, which variables have been renamed, etc. Together, provenance and lineage help us understand how the data came to be in their current form.
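As a simplified, invented example of what such metadata might record (the field names and values are illustrative only, not a formal standard), provenance names the sources and handlers, while lineage lists the transformations that were applied.

```python
import json

# Illustrative provenance and lineage metadata for a derived dataset.
# The field names and values are invented for this example.
record = {
    "dataset": "whitehorse_daily_temperature_clean.csv",
    "provenance": {
        "sources": ["station WH-01 sensor feed", "manual observer logbook"],
        "handled_by": ["ingestion service", "data steward", "analyst"],
    },
    "lineage": [
        {"step": 1, "action": "merged sensor and logbook records"},
        {"step": 2, "action": "renamed variable 'temp' to 'mean_temperature_c'"},
        {"step": 3, "action": "flagged and corrected 12 out-of-range readings"},
    ],
}

print(json.dumps(record, indent=2))
```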
Metadata containing rich descriptions of provenance and lineage help to encourage an understanding of where data have come from and what methodologies have been employed to produce them. Such metadata also aid in understanding the quality of a final product, or the pedigree of its sources, by detailing their relevance, completeness, accuracy, reputation and integrity.
Together, provenance and lineage provide the complete traceability of where data have resided and what processes have been performed on them over the course of their life, making them easier and safer to reuse.
So, back to our example of historical weather data for Whitehorse. First, you found the data, accessed them and then used them on your device of choice. Rich descriptions of the data that include information on how the data have been transformed and any data usage licensing now provide you with the needed information to combine these data with other data in order to reuse them based on your needs. Meaning, after accessing historical data for other cities, over a certain time frame, you can rank and compare Whitehorse to a set of other cities, in terms of being colder or warmer than average this year.
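To make that reuse step concrete, here is a small sketch with made-up temperatures showing how, once comparable historical data for several cities are in hand, you could compare this year against each city’s historical average and rank the results.

```python
# Made-up values for illustration: historical average temperatures (°C)
# and this year's average so far, for a handful of cities.
historical_average = {"Whitehorse": -1.4, "Yellowknife": -4.3, "Iqaluit": -9.3, "Victoria": 10.3}
this_year_average = {"Whitehorse": 0.2, "Yellowknife": -4.8, "Iqaluit": -8.1, "Victoria": 10.9}

# Positive anomaly = warmer than the historical average, negative = colder.
anomalies = {
    city: round(this_year_average[city] - historical_average[city], 1)
    for city in historical_average
}

# Rank cities from the largest warm anomaly to the largest cold anomaly.
for rank, (city, anomaly) in enumerate(
    sorted(anomalies.items(), key=lambda item: item[1], reverse=True), start=1
):
    label = "warmer" if anomaly > 0 else "colder"
    print(f"{rank}. {city}: {abs(anomaly)} °C {label} than average")
```

The ranking itself is trivial; the point is that findable, accessible, interoperable and well-described data are what make such a combination across cities possible in the first place.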
Now that the video is almost over, it’s time for a knowledge check! How much do you remember about FAIR data? I’ll read the question aloud; then, afterwards, pause the video while you make your selection. Here we go. APIs (Application Programming Interfaces), which allow one piece of software to freely and openly communicate data with another, are an example of which FAIR principle? 1 – Findable; 2 – Accessible; 3 – Interoperable; 4 – Reusable.
The correct answer is 3 – Interoperability. APIs are an example of interoperability in that they facilitate the exchange and interpretation of machine-readable information from point A to point B.
FAIR data principles ensure that data are Findable, Accessible, Interoperable and Reusable. They are important because they can be used as a guideline for anyone wishing to enhance the reusability of their data or wishing to develop a new reusable data product.