Gathering, exploring, analyzing and interpreting data are essential steps in producing information that benefits society, the economy and the environment. To properly conduct these processes, data ethics must be upheld in order to ensure the appropriate use of data.
In this video, you will be introduced to data ethics, why they are important, and the six guiding principles of data ethics implemented by Statistics Canada throughout the data journey.
This diagram is a visual representation of the data journey from collecting the data; to exploring, cleaning, describing and understanding the data; to analyzing the data; and lastly to communicating with others the story the data tell.
Data ethics are relevant throughout all steps of the data journey.
So what are data ethics exactly? Data ethics allow data users to address questions about the appropriate use of data throughout all steps of the data journey.
This field of study is used to ensure collected data always have a specific purpose, and that each new project or data acquisition has the best interests of both society and the individual at heart.
With the rapid growth of data associated with the digital age, data gathering approaches have also evolved.
Along with the more traditional survey-based approach, some alternative data gathering methods include earth observation data, scanner data, administrative data and web scraping.
These data are then used to create useful information such as statistics, and to train algorithms for artificial intelligence and machine learning. But with big data comes big responsibility…
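When deciding to embrace such evolving data gathering methods as administrative sourcing, web scraping, apps and crowdsourcing, there is a responsibility to maintain focus on such perennial ethical challenges as protecting privacy and confidentiality, balancing privacy intrusion versus public good, recognizing the potentially harmful impacts of using biased data, and ensuring data quality to avoid misinformation.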
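There are many ways to address these ethical challenges. At Statistics Canada, we use the following six guiding principles:
Data are used to benefit Canadians.
Data are used in a secure and private manner.
Data acquisitions and processing methods are transparent and accountable.
Data acquisitions and processing methods are trustworthy and sustainable.
The data themselves are of high quality.
Any information resulting from the data is reported fairly and does no harm.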
Let's look at these principles in more detail.
Benefits to society means that statistical activities must allow governments, businesses and communities to make informed decisions and manage resources effectively, ultimately aiming to clearly benefit the lives of Canadians.
A census of population is fundamental to any country's statistical infrastructure. In Canada, the census is currently the only data source that provides high-quality population and dwelling counts based on common standards and at low levels of geography, as well as consistent and comparable information on various population groups.
When statistical activities require personal information, the consideration of both privacy and security is mandatory. The appropriate measures must always be taken in order to protect personal information while still ensuring the data can be used to create meaningful information.
Firstly, there is a fine balance between respecting privacy and producing information. Projects that intrude into the private lives of Canadians must justify why this information is important enough to warrant this intrusion, and be able to explain how using these data will ultimately provide benefits. In other words, we must ensure that our statistical activities are not intruding into the lives of Canadians any more than necessary, and always justify whatever intrusion we consider necessary.
Furthermore, when designing a data-gathering approach, we have a moral obligation to protect the confidentiality and data of Canadians. Part of the data ethics exercise also consists in ensuring that projects have considered potential security threats and have prepared accordingly.
Let's imagine we are trying to have a better picture of the sexual orientation of individuals in management positions. If we conduct a survey, then questions related to gender, marital status and sex are pertinent, even if intrusive. If we were to ask questions about salary, age and nationality, we would have to justify why these variables are necessary.
To avoid any breach of personal information, strict IT and Information Management measures must be taken during all stages of working with data (the collection, retention, use, disclosure and disposal of information) in order to protect the confidentiality of this vulnerable population as well as the integrity of the project.
Statistical activities undertaken for the benefit of society have the responsibility to be transparent about where the data come from, how they are used and the steps that are taken to ensure confidentiality.
At Statistics Canada's Trust Centre, for example, you will find a list of all current surveys and statistical programs, together with their methodologies, goals and data sources. Making these projects available is important not only so that Canadians can consult how statistical activities are conducted to determine if a project is in their best interest, but also so they can keep the agency accountable and point out whenever Statistics Canada encroaches upon the limits of its mandate.
The Data Quality principle means that the data used to create statistical information must be as representative and accurate as possible. Maintaining this expectation means ensuring that biases and errors do not compromise the potential benefits of a project or mislead data users.
When conducting a survey, low response rates can lead to biased estimates or samples too small to meet the information need. Take data surrounding employment among individuals with disabilities, for example. If the response rate for the survey affects the quality of the estimates, Statistics Canada might decide to start using alternative data sources, such as administrative data acquired from industrial associations or labor unions.
If these new sources are biased, the unreliable information resulting from them may lead to uninformed measures and policies, which may cause more harm than good.
When conducting statistical activities, it is necessary to consider all the potential risks that a statistical activity may pose to the well-being of individuals or specific groups.
When acquiring and linking a large amount of data, detailed descriptions of smaller sub-populations of society might become available for analysis. These detailed clusters can sometimes magnify what is happening at the lowest level of geography. While this may sound harmless, it is important to remember these clusters of data might reveal information such as ethnicity and socio-economic status. Putting any sub-population under a microscope can raise ethical issues. For instance, studies on criminality have to be worded in a careful manner so as to not reinforce stereotypes, and results have to be shared with caution to ensure that the information is informative and not taken as an indictment of a specific population group.
In order to maintain the trust of the public, the use of data for the benefit of society should occur only by implementing such best practices as assuring confidentiality, protecting personal information, producing representative data, and being accountable. By making this our mandate, we can ensure that our statistical activities remain socially acceptable in the eyes of the public. If we have social acceptability, any partnership and any approach we undertake becomes an opportunity to show that we follow our mandate and helps the agency promote its objectives and maintain the trust of the public in the long term.
To illustrate when trust really matters, imagine we are trying to gather information on recreational cannabis use by Canadian youth, via voluntary crowdsourcing, and that this is happening before cannabis was legalized. One can only expect respondents to provide accurate, reliable data if they trust the institution responsible for guarding their responses and preserving confidentiality. In this case, they must trust their data is not going to be shared with anyone, including peers, parents and even legal authorities.
In summary, Data Ethics is the field of study that addresses questions about the appropriate use of data.
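With advances in data gathering techniques come ethical challenges regarding access to and use of data. There are six guiding principles you can use to address these ethical challenges: benefits to Canadians, privacy and security, transparency and accountability, trust and sustainability, data quality, and fairness and do no harm.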
Gathering, exploring,
analyzing and interpreting data
are essential steps in producing
information that benefits society,
the economy and the environment.
To properly conduct these processes,
data ethics must be upheld
in order to ensure the
appropriate use of data.
In this video, you will be introduced to
data ethics, why they are important,
and the six guiding principles of
data ethics implemented by Statistics
Canada throughout the data journey.
This diagram is a visual representation
of the data journey from collecting
the data; to exploring, cleaning,
describing and understanding
the data; to analyzing the data;
and lastly to communicating with
others the story the data tell.
Data ethics are relevant throughout
all steps of the data journey.
So what are data ethics exactly?
Well, data ethics allow data users
to address questions about the
appropriate use of data throughout
all steps of the data journey.
This field of study is used to
ensure collected data always
have a specific purpose,
and that each new project or data
acquisition has the best interests of
both society and the individual at heart.
With the rapid growth of data
associated with the digital age,
data gathering approaches have also evolved.
Along with the more traditional survey-
based approach, some alternative
data gathering methods include:
earth observation data, scanner data,
administrative data and web scraping.
These data are then used to create
useful information such as statistics,
and train algorithms for artificial
intelligence and machine learning.
But with more data comes more responsibility...
When deciding to embrace such evolving data
gathering methods as administrative sourcing,
web scraping, apps and crowdsourcing,
there is a responsibility to maintain focus
on such perennial ethical challenges as:
protecting privacy and confidentiality,
balancing privacy intrusion
versus public good,
recognizing the potentially
harmful impacts of using biased data,
and ensuring data quality
to avoid misinformation.
There are many ways to address
these ethical challenges.
At Statistics Canada,
we use the following six guiding principles:
Data are used to benefit Canadians.
Data are used in a secure and private manner.
Data acquisitions and processing
methods are transparent and accountable.
Data acquisitions and processing methods
are trustworthy and sustainable.
The data themselves are of high quality.
And any information resulting from the
data is reported fairly and does no harm.
Let's look at these principles
in more detail.
Benefits to society means that statistical
activities must allow governments,
businesses and communities to make informed
decisions and manage resources effectively,
ultimately aiming to clearly
benefit the lives of Canadians.
A census of population is fundamental to
any country's statistical infrastructure.
In Canada, the Census is currently
the only data source that provides
high-quality population and dwelling
counts based on common standards
and at low levels of geography,
as well as consistent and comparable
information on various population groups.
When statistical activities
require personal information,
the consideration of both privacy
and security is mandatory.
The appropriate measures must
always be taken in order to protect
personal information while still
ensuring the data can be used to
create meaningful information.
First, there's a fine balance between
respecting privacy and producing information.
Projects that intrude into the private
lives of Canadians must justify why
this information is important enough
to warrant this intrusion, and be
able to explain how using these data
will ultimately provide benefits.
In other words,
we must ensure that our statistical
activities are not intruding into
the lives of Canadians any more
than absolutely necessary,
and always justify whatever
intrusion we may consider necessary.
Furthermore,
when designing a data-gathering approach,
we have a moral obligation to protect the
confidentiality and data of Canadians.
Part of the data ethics exercise also
consists in ensuring that projects
have considered potential security
threats and have prepared accordingly.
Let's imagine we're trying to have a
better picture of the sexual orientation
of individuals in management positions.
If we conduct a survey,
then questions related to gender,
marital status and sex are pertinent,
even if intrusive.
If we were to ask questions about salary,
age, and nationality, we would have to justify
why these variables are necessary.
To avoid any breach of
such personal information,
strict IT and Information Management
measures must be taken during
all stages of working with data,
such as the collection, retention, use,
disclosure, and disposal of information,
in order to protect the confidentiality
of such a vulnerable population as
well as the integrity of the project.
Statistical activities undertaken
for the benefit of society have the
responsibility to be transparent
about where the data come from,
how they are used and the steps that
are taken to ensure confidentiality.
At Statistics
Canada's Trust Centre, for example,
you'll find a list of all current
surveys and statistical programs,
together with their methodologies,
goals and data sources.
Making these projects available is important
not only so that Canadians can
consult how statistical activities
are conducted to determine if a
project is in their best interest,
but also so they can keep
the agency accountable.
The Data Quality principle means
that the data used to create
statistical information must be as
representative and accurate as possible.
Maintaining this expectation means
ensuring that biases and errors do
not compromise the potential benefits
of a project or mislead data users.
When conducting a survey,
low response rates can lead to
biased estimates or samples too
small to meet the information needed.
Take data surrounding employment
among individuals with disabilities,
for example.
If the response rate for the survey
affects the quality of the estimates,
Statistics Canada might decide
to start using alternative data
sources, such as administrative
data acquired from industrial
associations or labor unions.
If these new sources are biased,
the unreliable information resulting
from them may lead to uninformed
measures and policies, which
may cause more harm than good.
When conducting statistical activities,
it is necessary to consider all the
potential risks that a statistical
activity may pose to the well-being
of individuals or specific groups.
When acquiring and linking
a large amount of data,
detailed descriptions of smaller
sub-populations of society might
become available for analysis.
These detailed clusters can
sometimes magnify what is happening
at the lowest level of geography.
While this may sound harmless,
it is important to remember these clusters
of data might reveal information such
as ethnicity and socio-economic status.
Putting any sub-population under a
microscope can raise ethical issues.
For instance,
studies on criminality have to
be worded very carefully so as
not to reinforce stereotypes,
and results have to be shared
with caution to ensure that the
information is informative and
not taken as an indictment of
a specific population group.
In order to maintain the trust of the public,
the use of data for the benefit
of society should only occur by
implementing best practices such
as assuring confidentiality,
protecting personal information,
producing representative data,
and being accountable.
By making this our mandate,
we can ensure that our statistical
activities remain socially
acceptable in the eyes of the public.
If we have social acceptability,
any partnership and any approach we
undertake becomes an opportunity
to show that we follow our mandate
and help the agency promote its
objectives and maintain the trust
of the people for the long term.
To illustrate when trust really matters,
imagine we're trying to gather
information on recreational cannabis
use by Canadian youth, via voluntary
crowdsourcing, and that this is
happening before cannabis was legalized.
One can only expect respondents
to provide accurate,
reliable data if they trust the
institution responsible for guarding their
responses and preserving confidentiality.
In this case, they must trust their data
is not going to be shared with anyone,
including peers,
parents or even legal authorities.
In summary, Data Ethics is the field
of study that addresses questions
about the appropriate use of data.
With advances in data gathering
techniques come ethical concerns
regarding access to and use of data.
There are six guiding principles you
can use to address ethical concerns:
benefits to Canadians,
privacy and security,
transparency and accountability,
trust and sustainability,
data quality,
and fairness and do no harm.