Statistics 101: Confidence intervals - Transcript
(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Statistics 101 Confidence intervals".)
Statistics 101: Confidence intervals
Have you heard this before…
(Text on screen: 37% of Canadians anticipate working from home for the foreseeable future, based on an online survey of 2,000 Canadian adults, with a margin of error of +/- 2.0 percentage points, 19 times out of 20. Do you know what "a margin of error of +/- 2.0 percentage points, 19 times out of 20" means? This is an example of a confidence interval.)
You have probably heard on the radio or television or read in the newspaper a statement like this:
37% of Canadians anticipate working from home for the foreseeable future, based on an online survey of 2,000 Canadian adults, with a margin of error of +/- 2.0 percentage points, 19 times out of 20.
But what exactly does it mean and why is the information presented in this way?
Working with statistics involves an element of uncertainty, and in this video we will see how confidence intervals and their underlying concepts help us understand and measure this uncertainty.
The statement above actually presents an example of a confidence interval, even though at first glance it does not look like an interval. The interval in this case is 37% +/- 2.0% - in other words, the interval goes from 35% to 39%.
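The arithmetic behind reading such a statement can be sketched in a few lines of Python. This is only an illustration: the function name read_poll_statement and its parameters are made up for this example and do not come from the video.

```python
def read_poll_statement(estimate, margin, times, out_of):
    """Turn 'estimate +/- margin, times out of out_of' into bounds and a level.

    Hypothetical helper for illustration only.
    """
    lower = estimate - margin          # lower bound of the interval
    upper = estimate + margin          # upper bound of the interval
    confidence_level = times / out_of  # e.g. 19 out of 20 is 0.95
    return lower, upper, confidence_level


# Values taken from the statement quoted above.
lower, upper, level = read_poll_statement(37.0, 2.0, 19, 20)
print(f"Interval: {lower}% to {upper}%, confidence level: {level:.0%}")
```

With the values from the statement, the lower bound is 35, the upper bound is 39, and "19 times out of 20" corresponds to a confidence level of 95%.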
At the end of this presentation you will be able to read similar statements and understand that they represent confidence intervals. You will also understand what a "margin of error" is, and what is meant by the phrase "19 times out of 20".
As pre-requisite viewing for this video, make sure you've watched our other Statistics 101 videos called "Exploring measures of central tendency" and "Exploring measures of dispersion".
Learning goals
(Text on screen: In this video, you will learn the answers to the following questions: What are confidence intervals? Why do we use confidence intervals? What factors have an impact on a confidence interval?)
By the end of this video you will understand what confidence intervals are, why we use them, and what factors have an impact on them.
Understanding the measures of central tendency and the measures of dispersion before watching this video will help you to understand confidence intervals.
Steps of a data journey
(Text on screen: Supported by a foundation of stewardship, metadata, standards and quality.)
(Diagram of the Steps of the data journey: Step 1 - define, find, gather; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)
This diagram is a visual representation of the data journey from collecting the data; to exploring, cleaning, describing and understanding the data; to analyzing the data; and lastly to communicating with others the story the data tell.
Step 2: Explore, clean, and describe; Step 3: Analyze and model; and Step 4: Tell the story
(Diagram of the Steps of the data journey with an emphasis on Step 2: Explore, clean, and describe; Step 3: Analyze and model; and Step 4: Tell the story.)
Confidence intervals are helpful in steps 2, 3 and 4 of the data journey.
What is a Confidence Interval?
(Text on screen:
Presents a range of possible values, rather than a single estimated value.
Represents the uncertainty resulting from the use of a sample.
The width of the confidence interval is related to the level of uncertainty.)
(Figure 1 demonstrating an example of confidence interval: the average grade on a math test in a class of 100 students. The estimated value is 70%, the lower bound is at 60% and the upper bound is at 80%. The values included between the lower and the upper bounds represent the confidence interval.)
A confidence interval is a range of possible values for something that we want to estimate – for example, what is the average grade on a math test in a particular class of 100 students. It is typically based on a sample that is representative of the population; however the sample is often small compared to the population. In the example here we have math grades for a sample of 10 students from a class of 100 students.
Since the estimate is based on a sample, there remains some uncertainty about the true value. The confidence interval accounts for this uncertainty by including a range of values, and not just the estimate itself. The more uncertainty there is, the wider the confidence interval will be.
Why do we use confidence intervals?
(Figure 1 demonstrating a young man wondering why we use confidence intervals.)
In statistics, we often estimate a value for a total population using a sample.
The value derived from the sample is not the true value, but an estimate of it.
Confidence intervals example
(Figure 1 demonstrating a class of 100 students, and a sample of 10 students. Figure 2 demonstrating the confidence interval, with an estimated value of 70%, a lower bound at 60%, an upper bound at 80% and a true value of 73%.)
In this example we have a class of 100 students, each with a percentage grade for a math test.
The class average for the math test is 73%. However, we are not looking at the marks of everyone in the population, but only those of a sample of 10 people. Taking a random sample, we obtain an estimated average grade of 70%, with a margin of error of plus or minus 10 percentage points, which gives a confidence interval of 60% to 80%. In this example, our estimate of 70% is different from the true average of 73%, but the true average is within the confidence interval.
Confidence intervals example
(Figure 1 demonstrating a class of 100 students, and a sample of 10 students. Figure 2 demonstrating the confidence interval, with an estimated value of 65%, a lower bound at 55%, an upper bound at 75% and a true value of 73%.)
By taking another random sample, we obtain a different estimated average grade of 65%, which is again not equal to the true average of 73%, but the confidence interval of 55% to 75% still contains the true average.
Confidence intervals example
(Figure 1 demonstrating a class of 100 students, and a sample of 10 students. Figure 2 demonstrating the confidence interval, with an estimated value of 78%, a lower bound at 68%, an upper bound at 88% and a true value of 73%.)
A third sample of the same class obtains an estimated average grade of 78%. This estimate again differs from the true average of 73%, but again the confidence interval contains the true average.
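The idea of drawing sample after sample from the same class can be tried out with a small simulation. The sketch below rests on several assumptions that are not in the video: the grades are synthetic (generated around 73% with a made-up spread), the interval uses a Student's t critical value for a sample of 10, and a finite population correction is applied because the sample is drawn without replacement from only 100 students.

```python
import math
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Synthetic grades for a class of 100 (an assumption, not the video's data).
population = [min(100.0, max(0.0, rng.gauss(73, 10))) for _ in range(100)]
true_mean = statistics.mean(population)

N, n = len(population), 10
T_CRIT_95_DF9 = 2.262        # Student's t critical value, 95% level, 9 degrees of freedom
fpc = math.sqrt(1 - n / N)   # finite population correction for sampling without replacement

repeats = 2000
hits = 0
for _ in range(repeats):
    sample = rng.sample(population, n)                          # one random sample of 10 students
    estimate = statistics.mean(sample)                          # estimated average grade
    std_error = statistics.stdev(sample) / math.sqrt(n) * fpc   # variability of the estimate
    margin = T_CRIT_95_DF9 * std_error                          # margin of error
    if estimate - margin <= true_mean <= estimate + margin:
        hits += 1                                               # interval contains the true average

coverage = hits / repeats
print(f"True average: {true_mean:.1f}%, share of intervals containing it: {coverage:.1%}")
```

Each sample gives a different estimate and a different interval, yet most of the intervals contain the true average. Under these assumptions the share should land close to the chosen 95% confidence level.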
Estimated Value
(Figure demonstrating a confidence interval, with the estimated value highlighted in the centre.)
The estimated value from the sample is usually at the centre of the confidence interval.
Estimated Value
(Figure demonstrating a confidence interval, highlighting the lower and upper bounds of the interval at equal distance from the estimated value.)
The upper and lower bounds of the confidence interval are then an equal distance above and below the estimated value.
Estimated Value
(Figure demonstrating a confidence interval, highlighting the margin of error below and above the estimated value.)
The distance from the estimated value to the upper or lower bound is called the margin of error.
The size of the margin of error reflects the uncertainty about the true value. More uncertainty means a larger margin of error.
Factors having an impact on a confidence interval
(Figure demonstrating different coloured people with question marks on their heads.)
There are three factors that determine the width of the confidence interval from a sample survey – the confidence level, the variability within the population, and the size of the sample.
These factors will now be described one by one.
Confidence level
(Figure demonstrating an estimated value and two confidence intervals, a first one with a 95% confidence level and a second one, with a 99% confidence level.)
The confidence level tells us how certain we are that the interval contains the true population value.
With a 95% confidence level, we are 95% confident that the confidence interval contains the true value. In other words, if we were to repeat the survey many times, the intervals obtained would contain the true value about 19 times out of 20.
With a 99% confidence level, we are 99% confident that the confidence interval contains the true value. Note that the higher level of confidence requires a wider confidence interval.
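To see how much wider a 99% interval is than a 95% interval, one common approach is to compare critical values from the normal distribution. The sketch below assumes a normal approximation, which the video does not state; the helper name z_critical is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided critical value under a normal approximation (an assumption)."""
    # For a 95% level, 2.5% is left in each tail, so we look up the 97.5% point.
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


z95 = z_critical(0.95)
z99 = z_critical(0.99)
print(f"95% level: margin of error is about {z95:.2f} standard errors")
print(f"99% level: margin of error is about {z99:.2f} standard errors")
print(f"The 99% interval is about {z99 / z95:.2f} times as wide")
```

Under this approximation, and for the same sample, moving from a 95% to a 99% confidence level widens the interval by roughly 30%.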
Variability within the population
(Figure demonstrating grades on math test for two different groups, a Regular Math class and an Enriched Math class.)
By variability of a population we mean how different population members are, one from another.
In the example shown here the grades of students in the Enriched Math class are less variable than the grades of students in the Regular Math class. In the Regular Math Class, grades vary from 54% to 87%. In the Enriched Math class, grades vary from 86% to 96% – about one third the variability of the Regular Math class.
If variability is high in the population, then it will be high in the sample. If we had two different random samples from the population, then the difference between the two different estimates would also tend to be larger. So higher variability in the population leads to higher variability in the samples, which leads to higher variability in the estimates. This larger variability for the estimates is reflected in a larger margin of error, so that the confidence interval is wider.
Similarly, if variability is lower in the population, then it will be lower in the sample, and the estimate will have lower variability, leading to a smaller margin of error and a narrower confidence interval.
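The link between variability in the population and variability in the estimates can be illustrated with a small simulation. Only the grade ranges (54% to 87% and 86% to 96%) come from the example; the class size of 30, the sample size of 10, and the choice of evenly spread synthetic grades are assumptions made for this sketch.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable


def spread_of_sample_means(low, high, class_size=30, n=10, repeats=1000):
    """Standard deviation of the sample average across many random samples."""
    # Synthetic grades spread evenly between low and high (an assumption).
    grades = [rng.uniform(low, high) for _ in range(class_size)]
    means = [statistics.mean(rng.sample(grades, n)) for _ in range(repeats)]
    return statistics.stdev(means)


regular = spread_of_sample_means(54, 87)    # Regular Math class range
enriched = spread_of_sample_means(86, 96)   # Enriched Math class range
print(f"Spread of estimates, Regular Math class:  {regular:.2f}")
print(f"Spread of estimates, Enriched Math class: {enriched:.2f}")
```

The estimates from the more variable class move around more from sample to sample, which is why the margin of error for that class would be larger.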
Size of the sample
(Figure demonstrating a class of 100 students.)
A larger sample will produce more precise estimates – that is, estimates with lower variability.
For example, in a class of 100 students, the average of a sample of size 20 would have smaller variability than the average of a sample of size 10. The average of a sample of size 50 would have still smaller variability.
So the larger the sample size, the smaller the variability of the estimate, the smaller the margin of error, and the shorter the confidence interval.
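The effect of sample size can also be worked out with a standard formula for the margin of error of an average. In the sketch below, the population standard deviation of 10 percentage points and the normal critical value of 1.96 are assumptions chosen for illustration; the class size of 100 and the sample sizes of 10, 20 and 50 come from the example.

```python
import math


def margin_of_error(pop_std_dev, n, population_size, z=1.96):
    """Approximate 95% margin of error for a sample average.

    Hypothetical helper: assumes a known population standard deviation
    and sampling without replacement from a small population.
    """
    std_error = pop_std_dev / math.sqrt(n)
    # Finite population correction: the sample is a large share of the class.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return z * std_error * fpc


margins = {n: margin_of_error(10.0, n, 100) for n in (10, 20, 50)}
for n, margin in margins.items():
    print(f"Sample of {n:2d}: margin of error of about +/- {margin:.1f} points")
```

Under these assumptions, the margin of error shrinks steadily as the sample grows from 10 to 20 to 50 students.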
Let's look at an example…
Example - sample of size 10
(Figure demonstrating a class of 100 students, and a sample of 10 students, with an estimated average grade of 64%, and the true class average of 73%.)
The average class grade is 73%.
The average for the random sample of 10 students is 64%.
Example - sample of size 50
(Figure demonstrating a class of 100 students, and a sample of 50 students, with an estimated average grade of 71%, and the true class average of 73%.)
As we see in this example, with a much larger sample size, the variability of the estimator is much smaller, and it would tend to be much closer to the true value. The confidence interval would then be narrower.
Knowledge check
Now it's your turn. How would you interpret the following statement:
According to a recent study, adults living in a specific city weighed an average of 75 kg, with a margin of error of +/- 10 kg, 9 times out of 10.
What is the estimated value? What is the confidence interval? What is the confidence level?
Take a moment to think about all the information included in this sentence.
Answer
First, we can conclude that the estimated value was obtained using a sample of the population. Second, we understand that the estimated average weight is 75 kg, and that the confidence interval ranges from 65 kg to 85 kg. The confidence interval is quite wide, which may suggest a small sample size, high variability in the weight of individuals, or both.
The confidence level is 90%, or 9 times out of 10. This means that if a random sampling were to be repeated many times, the confidence interval would contain the true value 9 times out of 10. A higher confidence level, 95%, as an example, would require an even wider confidence interval.
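As a rough check on how much wider the interval would need to be at a 95% confidence level, the margin of error can be rescaled by the ratio of normal critical values. This is a sketch under a normal approximation, which the statement does not specify, and the function name rescale_margin is hypothetical.

```python
from statistics import NormalDist


def rescale_margin(margin, old_level, new_level):
    """Rescale a margin of error to a new confidence level.

    Assumes a normal approximation and the same sample (illustration only).
    """
    z_old = NormalDist().inv_cdf(0.5 + old_level / 2)
    z_new = NormalDist().inv_cdf(0.5 + new_level / 2)
    return margin * z_new / z_old


estimate, margin_90 = 75.0, 10.0  # values from the statement above
margin_95 = rescale_margin(margin_90, 0.90, 0.95)
print(f"90% level: {estimate - margin_90:.1f} kg to {estimate + margin_90:.1f} kg")
print(f"95% level: {estimate - margin_95:.1f} kg to {estimate + margin_95:.1f} kg")
```

Under this approximation the margin of error grows from 10 kg to a little under 12 kg, so the 95% interval is wider than the 90% interval of 65 kg to 85 kg.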
Recap of key points
To summarize what we learned today: confidence intervals can help understand and measure the uncertainty associated with estimated values from samples; data coming from samples do not provide true values, but estimated values; the length of the confidence interval can vary based on the size of the sample, the variability of the population and the confidence level required.
(The Canada Wordmark appears.)
You have probably heard on
the radio or television
or read in the newspaper
a statement like this:
37% of Canadians anticipate working
from home for the foreseeable future,
based on an online survey of 2,000
Canadian adults, with a margin
of error of plus or minus two
percentage points, 19 times out of 20.
But what exactly does it mean
and why is the information
presented in this way?
Working with statistics involves
an element of uncertainty,
and in this video we will see how
confidence intervals and their
underlying concepts help us understand
and measure this uncertainty.
The statement above actually presents
an example of a confidence interval,
even though at first glance it
does not look like an interval.
The interval in this case is
37% plus or minus 2%.
In other words,
the interval goes from 35% to 39%.
By the end of this presentation,
you will be able to read similar
statements and understand that they
represent confidence intervals.
You will also understand what a
"margin of error" is, and what is meant
by the phrase "19 times out of 20".
As pre-requisite viewing for this video,
make sure you've watched our other
Statistics 101 videos called "Exploring
measures of central tendency" and
"Exploring measures of dispersion".
In this video you will learn the
answers to the following questions:
What are confidence intervals?
Why do we use confidence intervals?
And what factors have an impact
on a confidence interval?
This diagram is a visual representation
of the data journey from collecting
the data, to exploring, cleaning,
describing and understanding the data,
to analyzing the data, and
lastly to communicating with
others the story the data tell.
Confidence intervals are helpful in
steps 2, 3 and 4 of the data journey.
A confidence interval is a range of
possible values for something that
we want to estimate, for example,
what is the average grade on a math test
in a particular class of 100 students.
It is typically based on a sample
that is supposed to be representative
of the population.
However, the sample is often
small compared to the population.
In the example here,
we have math grades for a sample of 10
students from a class of 100 students.
Since the estimate is based on a sample,
there remains some uncertainty
about the true value.
The confidence interval accounts for
this uncertainty by including a range of
values, and not just the estimate itself.
The more uncertainty there is,
the wider the confidence interval will be.
In statistics, we often estimate a value
for a total population using a sample.
The value derived from the sample
is not the true value,
but an estimate of it.
In this example,
we have a class of 100 students,
each with a percentage grade
for a math test.
The class average for the math test is 73%.
However,
we are not looking at the marks
of everyone in the population,
but only those of a sample of 10 people.
Taking a random sample, shaded
in red in the figure,
we obtain an estimated average of 70% with
a margin of error of plus or minus 10 percentage points.
In this example
our estimate of 70% is different
from the true average of 73%,
but the true average is within
the confidence interval.
By taking another random sample,
we obtain a different estimated
grade average of 65%,
which is again not equal
to the true average of 73%,
but the confidence interval of 55% to
75% still contains the true average.
A third sample of the same class obtains
an estimated average grade of 78%.
This estimate again differs
from the true average of 73%,
but again the confidence interval
contains the true average.
The estimated value of the
sample is usually at the center
of the confidence interval.
The upper and lower bounds of
the confidence interval are
then an equal distance above
and below the estimated value.
The distance from the estimated
value to the upper or lower bound
is called the margin of error.
The size of the margin of error reflects
the uncertainty about the true value.
More uncertainty means a
larger margin of error.
The location of a confidence interval
is determined by the estimated value,
which is normally the central
value of the interval.
There are three factors that
determine the width of the confidence
interval from a sample survey -
the confidence level,
the variability in the population,
and the size of the sample.
These factors will now be described one by one.
The confidence level tells us how
certain we are that the interval
contains the true population value.
For a 95% confidence interval,
we are 95% confident that the
interval contains the true
value. In other words,
if we were to repeat the survey many times,
the interval would contain the
true value 19 times out of 20.
For a 99% confidence interval,
we are 99% confident that the
interval contains the true value.
Note that the higher level of confidence
requires a longer confidence interval.
A longer confidence interval generally
has a higher confidence level.
A shorter confidence interval generally
has a lower confidence level.
By variability of a population
we mean how different population
members are from one another.
In the example shown here
the grades of students in the Enriched
Math class are less variable than the
grades of students in the Regular Math class.
In the Regular Math class,
grades vary from 54% to 87%.
In the Enriched Math class,
grades vary from 86% to 96% -
about 1/3 of the variability
of the Regular Math class.
If variability is high in the population,
then it will be high in the sample.
If we had two different random
samples from the population,
then the difference between the
two different estimates would
also tend to be larger.
So higher variability in the population
leads to higher variability in the samples,
which leads to higher variability
in the estimates.
This larger variability for the
estimates is reflected in a
larger margin of error, so that
the confidence interval is wider.
Similarly,
if variability is lower in the population,
then it will be lower in the sample,
and the estimate will
have a lower variability,
leading to a smaller margin of error
and a narrower confidence interval.
A larger sample will produce
more precise estimates -
that is, estimates with lower variability.
For example,
in a class of 100 students,
the average grade from a sample
of size 20 would have smaller variability
than the average from a sample of size 10.
The average grade from a sample of size 50
would have still smaller variability.
So the larger the sample size,
the smaller the variability of the estimate,
the smaller the margin of error, and
the shorter the confidence interval.
Let's look at an example.
In this example,
we take the same class of 100 students.
The average class grade is 73%.
If we select a sample of 10 students,
the average for those ten students is 64%.
Now when we look at a sample size of
50 for the same class of 100 students,
a much larger sample size,
the variability of the estimate is
much smaller, and it would tend to
be much closer to the true value.
So the confidence interval
would then be narrower.
Now it's your turn.
How would you interpret the
following statement:
According to a recent study,
adults living in a specific city weigh
on average 75 kg, with a margin
of error of plus or minus 10 kg,
9 times out of 10.
Take a second to think for yourself.
What's the estimated value?
What's the confidence interval?
What's the confidence level?
Pause the video if you have to and
hit play when you're ready to answer.
First, we can conclude that the
estimated value was obtained
using a sample of the population.
Second, we understand that the estimated
average weight is 75 kg,
and that the confidence interval ranges
from 65 kg to 85 kg.
The confidence interval is quite wide,
which may suggest a small sample size,
high variability in the weight
of the individuals, or both.
The confidence level is 90%,
or 9 times out of 10.
This means that if a random sampling
were to be repeated many times,
the confidence interval would contain
the true value 9 times out of 10.
A higher confidence level,
95%, for example,
would require an even
wider confidence interval.
Let's summarize what we learned today.
Confidence intervals represent
the uncertainty resulting
from the use of a sample.
Data coming from samples do not provide
true values, but estimated values.
And finally, the length of the
confidence interval depends
on the size of the sample,
the variability of the population,
and the confidence level.