Statistics 101: Confidence intervals

Catalogue number: 892000062022003

Release date: May 24, 2022

Updated: January 25, 2023

In this video, you will learn the answers to the following questions:

What are confidence intervals?
Why do we use confidence intervals?
What factors have an impact on a confidence interval?

Data journey step

Foundation

Data competency

Data analysis
Data interpretation

Audience

Basic

Suggested prerequisites

Length

10:54

Cost

Free

Statistics 101: Confidence intervals - Transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Statistics 101 Confidence intervals".)

Statistics 101: Confidence intervals

Have you heard this before…

(Text on screen: 37% of Canadians anticipate working from home for the foreseeable future, based on an online survey of 2,000 Canadian adults, with a margin of error of +/- 2.0 percentage points, 19 times out of 20. Do you know what "a margin of error of +/- 2.0 percentage points, 19 times out of 20" means? This is an example of a confidence interval.)

You have probably heard on the radio or television or read in the newspaper a statement like this:

37% of Canadians anticipate working from home for the foreseeable future, based on an online survey of 2,000 Canadian adults, with a margin of error of +/- 2.0 percentage points, 19 times out of 20.

But what exactly does it mean and why is the information presented in this way?

Working with statistics involves an element of uncertainty, and in this video we will see how confidence intervals and their underlying concepts help us understand and measure this uncertainty.

The statement above actually presents an example of a confidence interval, even though at first glance it does not look like an interval. The interval in this case is 37% +/- 2.0 percentage points; in other words, the interval goes from 35% to 39%.
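The arithmetic behind this reading can be sketched in a few lines of Python. This is only an illustration: the function names `interval_from_margin` and `confidence_level` are hypothetical and do not come from the video.

```python
def interval_from_margin(estimate, margin):
    """Return (lower, upper) bounds for estimate +/- margin."""
    return (estimate - margin, estimate + margin)


def confidence_level(times, out_of):
    """Turn a phrase such as '19 times out of 20' into a proportion."""
    return times / out_of


# Survey statement from the video: 37% +/- 2.0 percentage points, 19 times out of 20.
lower, upper = interval_from_margin(37.0, 2.0)
level = confidence_level(19, 20)
print(f"Interval: {lower}% to {upper}%, confidence level: {level:.0%}")
```

Read this way, "19 times out of 20" is simply another way of writing a 95% confidence level.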

At the end of this presentation you will be able to read similar statements and understand that they represent confidence intervals. You will also understand what a "margin of error" is, and what is meant by the phrase "19 times out of 20".

As pre-requisite viewing for this video, make sure you've watched our other Statistics 101 videos called "Exploring measures of central tendency" and "Exploring measures of dispersion".

Learning goals

(Text on screen: In this video, you will learn the answers to the following questions: What are confidence intervals? Why do we use confidence intervals? What factors have an impact on a confidence interval?)

By the end of this video you will understand what confidence intervals are, why we use them, and what factors have an impact on them.

Understanding the measures of central tendency and the measures of dispersion before watching this video will help you to understand confidence intervals.

Steps of a data journey

(Text on screen: Supported by a foundation of stewardship, metadata, standards and quality.)

(Diagram of the Steps of the data journey: Step 1 - define, find, gather; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)

This diagram is a visual representation of the data journey from collecting the data; to exploring, cleaning, describing and understanding the data; to analyzing the data; and lastly to communicating with others the story the data tell.

Step 2: Explore, clean, and describe; Step 3: Analyze and model; and Step 4: Tell the story

(Diagram of the Steps of the data journey with an emphasis on Step 2: Explore, clean, and describe; Step 3: Analyze and model; and Step 4: Tell the story.)

Confidence intervals are helpful in steps 2, 3 and 4 of the data journey.

What is a Confidence Interval?

(text on screen:

Presents a range of possible values, rather than a single estimated value.

Represents the uncertainty resulting from the use of a sample.

The width of the confidence interval is related to the level of uncertainty.)

(Figure 1 demonstrating an example of confidence interval: the average grade on a math test in a class of 100 students. The estimated value is 70%, the lower bound is at 60% and the upper bound is at 80%. The values included between the lower and the upper bounds represent the confidence interval.)

A confidence interval is a range of possible values for something that we want to estimate – for example, what is the average grade on a math test in a particular class of 100 students. It is typically based on a sample that is representative of the population; however the sample is often small compared to the population. In the example here we have math grades for a sample of 10 students from a class of 100 students.

Since the estimate is based on a sample, there remains some uncertainty about the true value. The confidence interval accounts for this uncertainty by including a range of values, and not just the estimate itself. The more uncertainty there is, the wider the confidence interval will be.

Why do we use confidence intervals?

(Figure 1 demonstrating a young man wondering why we use confidence intervals.)

In statistics, we often estimate a value for a total population using a sample.

The value derived from the sample is not the true value, but an estimate of it.

Confidence intervals example

(Figure 1 demonstrating a class of 100 students, and a sample of 10 students. Figure 2 demonstrating the confidence interval, with an estimated value of 70%, a lower bound at 60%, an upper bound at 80% and a true value of 73%.)

In this example we have a class of 100 students, each with a percentage grade for a math test.

The class average for the math test is 73%. However, we are not looking at the marks of everyone in the population, but only those of a sample of 10 people. Taking a random sample, we obtain an estimated average grade of 70%, with a margin of error of plus or minus 10 percentage points, which gives a confidence interval of 60% to 80%. In this example, our estimate of 70% is different from the true average of 73%, but the true average is within the confidence interval.
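As a rough sketch of how such an interval could be computed from a sample of 10 grades, the Python below uses a standard t-based interval for a mean. The grades are hypothetical (the video does not list them), the 95% level is an assumption, and the critical value 2.262 is the usual t table entry for 9 degrees of freedom. The sketch ignores any adjustment for sampling from a small class, so its margin of error will not match the figure quoted in the video.

```python
import math
import statistics

# Hypothetical grades (in %) for a random sample of 10 students.
sample = [52, 58, 63, 66, 70, 72, 75, 78, 81, 85]

# Two-sided 95% critical value of Student's t with 9 degrees of freedom,
# taken from a standard t table (assumption: 95% level, n = 10).
T_CRIT_95_DF9 = 2.262

n = len(sample)
estimate = statistics.mean(sample)                    # sample average
std_error = statistics.stdev(sample) / math.sqrt(n)  # variability of the average
margin = T_CRIT_95_DF9 * std_error                    # margin of error
lower, upper = estimate - margin, estimate + margin

print(f"Estimate: {estimate:.1f}%")
print(f"Margin of error: {margin:.1f} percentage points")
print(f"95% confidence interval: {lower:.1f}% to {upper:.1f}%")
```

With these made-up grades the estimate is 70% and the interval still covers the true class average of 73%.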

Confidence intervals example

(Figure 1 demonstrating a class of 100 students, and a sample of 10 students. Figure 2 demonstrating the confidence interval, with an estimated value of 65%, a lower bound at 55%, an upper bound at 75% and a true value of 73%.)

By taking another random sample, we obtain a different estimated average grade of 65%, which is again not equal to the true average of 73%, but the confidence interval of 55% to 75% still contains the true average.

Confidence intervals example

(Figure 1 demonstrating a class of 100 students, and a sample of 10 students. Figure 2 demonstrating the confidence interval, with an estimated value of 78%, a lower bound at 68%, an upper bound at 88% and a true value of 73%.)

A third sample of the same class obtains an estimated average grade of 78%. This estimate again differs from the true average of 73%, but again the confidence interval contains the true average.
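The idea of drawing sample after sample from the same class can be imitated with a small simulation. The sketch below is illustrative only: the class grades are randomly generated (the video does not give them), the seed is arbitrary, and the interval uses a t table value for a 95% level with samples of 10.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical class of 100 grades, kept between 0% and 100%.
population = [min(100.0, max(0.0, random.gauss(73, 10))) for _ in range(100)]
true_mean = statistics.mean(population)

SAMPLE_SIZE = 10
T_CRIT_95_DF9 = 2.262  # t table value for 95% confidence, 9 degrees of freedom


def interval_contains_truth():
    """Draw one random sample and check whether its interval covers the true mean."""
    sample = random.sample(population, SAMPLE_SIZE)
    estimate = statistics.mean(sample)
    margin = T_CRIT_95_DF9 * statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    return estimate - margin <= true_mean <= estimate + margin


repetitions = 2000
hits = sum(interval_contains_truth() for _ in range(repetitions))
coverage = hits / repetitions
print(f"Intervals containing the true average: {coverage:.1%}")
```

Each repetition gives a different estimate and a different interval, yet most of the intervals contain the true class average, as in the three samples described above.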

Estimated Value

(Figure demonstrating a confidence interval, with the estimated value highlighted in the centre.)

The estimated value from the sample is usually at the centre of the confidence interval.

Estimated Value

(Figure demonstrating a confidence interval, highlighting the lower and upper bounds of the interval at equal distance from the estimated value.)

The upper and lower bounds of the confidence interval are then an equal distance above and below the estimated value.

Estimated Value

(Figure demonstrating a confidence interval, highlighting the margin of error below and above the estimated value.)

The distance from the estimated value to the upper or lower bound is called the margin of error.

The size of the margin of error reflects the uncertainty about the true value. More uncertainty means a larger margin of error.

Factors having an impact on a confidence interval

(Figure demonstrating different coloured people with question marks on their heads.)

There are three factors that determine the width of the confidence interval from a sample survey – the confidence level, the variability within the population, and the size of the sample.

These factors will now be described one by one.

Confidence level

(Figure demonstrating an estimated value and two confidence intervals, a first one with a 95% confidence level and a second one, with a 99% confidence level.)

The confidence level tells us how certain we are that the interval contains the true population value.

With a 95% confidence level, we are 95% confident that the confidence interval contains the true value. In other words, if we were to repeat the survey many times, the intervals computed from those samples would contain the true value about 19 times out of 20.

With a 99% confidence level, we are 99% confident that the confidence interval contains the true value. Note that, for the same sample, a higher confidence level requires a wider confidence interval.
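One way to see why a higher confidence level widens the interval is to look at the multiplier applied to the variability of the estimate. The sketch below assumes the common normal approximation, which the video does not spell out, and uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level -> multiplier {z_critical(level):.3f}")
```

The multipliers come out at roughly 1.645, 1.960 and 2.576, so moving from 95% to 99% stretches the margin of error by about a third when everything else stays the same.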

Variability within the population

(Figure demonstrating grades on math test for two different groups, a Regular Math class and an Enriched Math class.)

By variability of a population we mean how different the members of the population are from one another.

In the example shown here the grades of students in the Enriched Math class are less variable than the grades of students in the Regular Math class. In the Regular Math class, grades vary from 54% to 87%. In the Enriched Math class, grades vary from 86% to 96%. That is a range of 10 percentage points compared with 33 for the Regular Math class, or about one third of the variability.

If variability is high in the population, then it will be high in the sample. If we had two different random samples from the population, then the difference between the two different estimates would also tend to be larger. So higher variability in the population leads to higher variability in the samples, which leads to higher variability in the estimates. This larger variability for the estimates is reflected in a larger margin of error, so that the confidence interval is wider.

Similarly, if variability is lower in the population, then it will be lower in the sample, and the estimate will have lower variability, leading to a smaller margin of error and a narrower confidence interval.
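The link between variability and the margin of error can be sketched with two made-up samples. The grades below are hypothetical; only their ranges (54% to 87% and 86% to 96%) come from the video. The 1.96 multiplier assumes a 95% level and a normal approximation, used here for simplicity.

```python
import math
import statistics

Z_95 = 1.96  # normal-approximation multiplier for a 95% confidence level

# Hypothetical samples of 10 grades (in %) spanning the ranges quoted in the video.
regular = [54, 60, 65, 68, 72, 75, 79, 82, 85, 87]
enriched = [86, 87, 88, 89, 90, 91, 92, 93, 95, 96]


def margin_of_error(sample, multiplier=Z_95):
    """Margin of error for a sample mean: multiplier * s / sqrt(n)."""
    return multiplier * statistics.stdev(sample) / math.sqrt(len(sample))


for name, grades in (("Regular", regular), ("Enriched", enriched)):
    print(f"{name}: standard deviation {statistics.stdev(grades):.1f}, "
          f"margin of error {margin_of_error(grades):.1f}")
```

Both samples have the same size and the same multiplier, so the difference in the margin of error comes entirely from the spread of the grades.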

Size of the sample

(Figure demonstrating a class of 100 students.)

A larger sample will produce more precise estimates – that is, estimates with lower variability.

For example, in a class of 100 students, the average of a sample of size 20 would have smaller variability than the average of a sample of size 10. The average of a sample of size 50 would have still smaller variability.

So the larger the sample size, the smaller the variability of the estimate, the smaller the margin of error, and the narrower the confidence interval.
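The effect of sample size can be sketched with the usual formula for the margin of error of an average, which shrinks with the square root of the sample size. The standard deviation of 10 percentage points is a hypothetical value, the 1.96 multiplier assumes a 95% level under a normal approximation, and the sketch ignores the extra gain from sampling a large share of a small class.

```python
import math

Z_95 = 1.96      # multiplier for a 95% confidence level (normal approximation)
ASSUMED_SD = 10  # hypothetical standard deviation of grades, in percentage points


def margin_of_error(sample_size, sd=ASSUMED_SD, multiplier=Z_95):
    """Margin of error for a sample mean, ignoring any finite-population correction."""
    return multiplier * sd / math.sqrt(sample_size)


for n in (10, 20, 50):
    print(f"n = {n:2d}: margin of error about {margin_of_error(n):.1f} percentage points")
```

Under these assumptions, quadrupling the sample size halves the margin of error, so precision improves more slowly than the sample grows.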

Let's look at an example…

Example - sample of size 10

(Figure demonstrating a class of 100 students, and a sample of 10 students, with an estimated average grade of 64%, and the true class average of 73%.)

The average class grade is 73%.

The average for the random sample of 10 students is 64%.

Example - sample of size 50

(Figure demonstrating a class of 100 students, and a sample of 50 students, with an estimated average grade of 71%, and the true class average of 73%.)

The average for the random sample of 50 students is 71%.

As we see in this example, with a much larger sample size, the variability of the estimate is much smaller, and the estimate tends to be much closer to the true value. The confidence interval would then be narrower.

Knowledge check

Now it's your turn. How would you interpret the following statement:

According to a recent study, adults living in a specific city weighed an average of 75 kg, with a margin of error of +/- 10 kg, 9 times out of 10.

What is the estimated value? What is the confidence interval? What is the confidence level?

Take a moment to think about all the information included in this sentence.

Answer

First, we can conclude that the estimated value was obtained using a sample of the population. Second, we understand that the estimated average weight is 75 kg, and that the confidence interval ranges from 65 kg to 85 kg. The confidence interval is quite wide, which may suggest a small sample size, high variability in the weight of individuals, or both.

The confidence level is 90%, or 9 times out of 10. This means that if the random sampling were repeated many times, the confidence interval would contain the true value about 9 times out of 10. A higher confidence level, such as 95%, would require an even wider confidence interval.
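To get a feel for how much wider, the sketch below rescales the quoted margin of error from the 90% level to the 95% level. It assumes the same sample and a normal approximation, neither of which is stated in the study description; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


estimate_kg = 75.0
margin_90 = 10.0  # margin of error quoted at the 90% level

# Same data, same variability of the estimate; only the multiplier changes.
margin_95 = margin_90 * z_critical(0.95) / z_critical(0.90)

print(f"90% interval: {estimate_kg - margin_90:.1f} kg to {estimate_kg + margin_90:.1f} kg")
print(f"95% interval: {estimate_kg - margin_95:.1f} kg to {estimate_kg + margin_95:.1f} kg")
```

Under these assumptions the margin of error grows from 10 kg to a little under 12 kg.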

Recap of key points

To summarize what we learned today: confidence intervals help us understand and measure the uncertainty associated with estimated values from samples; data coming from samples do not provide true values, but estimated values; and the width of the confidence interval depends on the size of the sample, the variability of the population and the confidence level required.

(The Canada Wordmark appears.)
