Statistics 101: Confidence intervals

Catalogue number: 892000062022003

Release date: May 24, 2022 Updated: January 25, 2023

In this video, you will learn the answers to the following questions:

  • What are confidence intervals?
  • Why do we use confidence intervals?
  • What factors have an impact on a confidence interval?

Data journey step: Foundation
Data competency:
  • Data analysis
  • Data interpretation
Audience: Basic
Suggested prerequisites:
Length: 10:54
Cost: Free

Statistics 101: Confidence intervals - Transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Statistics 101 Confidence intervals".)

Statistics 101: Confidence intervals

Have you heard this before…

(Text on screen: 37% of Canadians anticipate working from home for the foreseeable future, based on an online survey of 2,000 Canadian adults, with a margin of error of +/- 2.0 percentage points, 19 times out of 20. Do you know what "a margin of error of +/- 2.0 percentage points, 19 times out of 20" means?  This is an example of a confidence interval.)

You have probably heard on the radio or television or read in the newspaper a statement like this:

37% of Canadians anticipate working from home for the foreseeable future, based on an online survey of 2,000 Canadian adults, with a margin of error of +/- 2.0 percentage points, 19 times out of 20.

But what exactly does it mean and why is the information presented in this way?

Working with statistics involves an element of uncertainty, and in this video we will see how confidence intervals and their underlying concepts help us understand and measure this uncertainty.

The statement above actually presents an example of a confidence interval, even though at first glance it does not look like an interval. The interval in this case is 37% +/- 2.0 percentage points; in other words, the interval goes from 35% to 39%.
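The arithmetic behind this interval can be sketched in a few lines of Python. This is only an illustration: the function name `interval_from_margin` is hypothetical and does not come from the video.

```python
def interval_from_margin(estimate, margin):
    """Return (lower, upper): the estimate minus and plus the margin of error."""
    return (estimate - margin, estimate + margin)

# Survey statement: 37% with a margin of error of 2.0 percentage points.
lower, upper = interval_from_margin(37.0, 2.0)
print(f"The interval goes from {lower}% to {upper}%")
```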

At the end of this presentation you will be able to read similar statements and understand that they represent confidence intervals. You will also understand what a "margin of error" is, and what is meant by the phrase "19 times out of 20".

As prerequisite viewing for this video, make sure you've watched our other Statistics 101 videos called "Exploring measures of central tendency" and "Exploring measures of dispersion".

Learning goals

(Text on screen: In this video, you will learn the answers to the following questions: What are confidence intervals? Why do we use confidence intervals? What factors have an impact on a confidence interval?)

By the end of this video you will understand what confidence intervals are, why we use them, and what factors have an impact on them.

Understanding the measures of central tendency and the measures of dispersion before watching this video will help you to understand confidence intervals.

Steps of a data journey

(Text on screen: Supported by a foundation of stewardship, metadata, standards and quality.)

(Diagram of the Steps of the data journey: Step 1 - define, find, gather; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)

This diagram is a visual representation of the data journey from collecting the data; to exploring, cleaning, describing and understanding the data; to analyzing the data; and lastly to communicating with others the story the data tell.

Step 2: Explore, clean, and describe; Step 3: Analyze and model; and Step 4: Tell the story

(Diagram of the Steps of the data journey with an emphasis on Step 2: Explore, clean, and describe; Step 3: Analyze and model; and Step 4: Tell the story.)

Confidence intervals are helpful in steps 2, 3 and 4 of the data journey.

What is a Confidence Interval?

(text on screen:

Presents a range of possible values, rather than a single estimated value.

Represents the uncertainty resulting from the use of a sample.

The width of the confidence interval is related to the level of uncertainty.)

(Figure 1 demonstrating an example of confidence interval: the average grade on a math test in a class of 100 students. The estimated value is 70%, the lower bound is at 60% and the upper bound is at 80%. The values included between the lower and the upper bounds represent the confidence interval.)

A confidence interval is a range of possible values for something that we want to estimate – for example, the average grade on a math test in a particular class of 100 students. It is typically based on a sample that is representative of the population; however, the sample is often small compared to the population. In the example here we have math grades for a sample of 10 students from a class of 100 students.

Since the estimate is based on a sample, there remains some uncertainty about the true value.  The confidence interval accounts for this uncertainty by including a range of values, and not just the estimate itself. The more uncertainty there is, the wider the confidence interval will be.
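One common way to build such an interval for an average is sketched below. It is a minimal sketch under stated assumptions, not the method used in the video: the grades are made up, the function name `mean_confidence_interval` is hypothetical, and it uses a normal approximation (for a sample as small as 10, a t-distribution would usually give a slightly wider interval).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    n = len(sample)
    estimate = mean(sample)
    # Two-sided multiplier, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # More spread in the sample (stdev) gives a larger margin of error.
    margin = z * stdev(sample) / sqrt(n)
    return estimate - margin, estimate, estimate + margin

# Hypothetical grades (in %) for 10 sampled students; not data from the video.
grades = [55, 62, 64, 68, 70, 71, 74, 76, 78, 82]
lower, estimate, upper = mean_confidence_interval(grades)
print(f"Estimate: {estimate}%, interval: {lower:.1f}% to {upper:.1f}%")
```

The estimate sits at the centre of the interval, and a more spread-out sample would produce a wider interval around it.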

Why do we use confidence intervals?

(Figure 1 demonstrating a young man wondering why we use confidence intervals.)

In statistics, we often estimate a value for a total population using a sample.

The value derived from the sample is not the true value, but an estimate of it.

Confidence intervals example

(Figure 1 demonstrating a class of 100 students, and a sample of 10 students. Figure 2 demonstrating the confidence interval, with an estimated value of 70%, a lower bound at 60%, an upper bound at 80% and a true value of 73%.)

In this example we have a class of 100 students, each with a percentage grade for a math test. 

The class average for the math test is 73%. However, we are not looking at the marks of everyone in the population, but only those of a sample of 10 people. Taking a random sample we obtain an estimated average grade of 70%, with a margin of error of plus or minus 10 percentage points, which gives a confidence interval of 60% to 80%. In this example, our estimate of 70% is different from the true average of 73%, but the true average is within the confidence interval.

Confidence intervals example

(Figure 1 demonstrating a class of 100 students, and a sample of 10 students. Figure 2 demonstrating the confidence interval, with an estimated value of 65%, a lower bound at 55%, an upper bound at 75% and a true value of 73%.)

By taking another random sample, we obtain a different estimated average grade of 65%, which is again not equal to the true average of 73%, but the confidence interval of 55% to 75% still contains the true average.

Confidence intervals example

(Figure 1 demonstrating a class of 100 students, and a sample of 10 students. Figure 2 demonstrating the confidence interval, with an estimated value of 78%, a lower bound at 68%, an upper bound at 88% and a true value of 73%.)

A third sample of the same class obtains an estimated average grade of 78%. This estimate again differs from the true average of 73%, but again the confidence interval contains the true average.
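Repeated sampling of this kind can be simulated. The sketch below draws many random samples of 10 from a made-up class of 100 and counts how often the interval contains the true class average. The class grades, the seed and the helper `sample_interval` are all assumptions for illustration; because it uses a simple normal approximation with a small sample, the share of intervals that contain the true average will be close to, but not exactly, 95%.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical class of 100 grades, clipped to 0-100; not the video's data.
population = [min(100.0, max(0.0, random.gauss(73, 10))) for _ in range(100)]
true_mean = mean(population)

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% confidence level

def sample_interval(n):
    """Draw a random sample of size n and return its (lower, upper) interval."""
    sample = random.sample(population, n)
    centre = mean(sample)
    margin = z * stdev(sample) / sqrt(n)
    return centre - margin, centre + margin

trials = 2000
hits = 0
for _ in range(trials):
    lower, upper = sample_interval(10)
    if lower <= true_mean <= upper:
        hits += 1

coverage = hits / trials
print(f"True average: {true_mean:.1f}%, intervals containing it: {coverage:.1%}")
```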

Estimated Value

(Figure demonstrating a confidence interval, with the estimated value highlighted in the centre.)

The estimated value from the sample is usually at the centre of the confidence interval.

Estimated Value

(Figure demonstrating a confidence interval, highlighting the lower and upper bounds of the interval at equal distance from the estimated value.)

The upper and lower bounds of the confidence interval are then an equal distance above and below the estimated value.

Estimated Value

(Figure demonstrating a confidence interval, highlighting the margin of error below and above the estimated value.)

The distance from the estimated value to the upper or lower bound is called the margin of error.

The size of the margin of error reflects the uncertainty about the true value. More uncertainty means a larger margin of error.

Factors having an impact on a confidence interval

(Figure demonstrating different coloured people with question marks on their heads.)

There are three factors that determine the width of the confidence interval from a sample survey – the confidence level, the variability within the population, and the size of the sample.

These factors will now be described one by one.

Confidence level

(Figure demonstrating an estimated value and two confidence intervals, a first one with a 95% confidence level and a second one, with a 99% confidence level.)

The confidence level tells us how certain we are that the interval contains the true population value. 

With a 95% confidence level, we are 95% confident that the confidence interval contains the true value. In other words, if we were to repeat the survey many times, the interval would contain the true value about 19 times out of 20.

With a 99% confidence level, we are 99% confident that the confidence interval contains the true value. Note that the higher level of confidence requires a wider confidence interval.
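To see why a higher confidence level widens the interval, it can help to look at the multiplier applied to the variability of the estimate. The sketch below assumes the estimate is approximately normally distributed, which the video does not state; the function name `z_multiplier` is hypothetical.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided multiplier from the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: multiplier {z_multiplier(level):.2f}")
```

Under this assumption the multiplier is about 1.96 at the 95% level and larger at the 99% level, so the margin of error grows with the confidence level.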

Variability within the population

(Figure demonstrating grades on math test for two different groups, a Regular Math class and an Enriched Math class.)

By variability of a population we mean how different population members are, one from another.

In the example shown here the grades of students in the Enriched Math class are less variable than the grades of students in the Regular Math class. In the Regular Math class, grades vary from 54% to 87%, a range of 33 percentage points. In the Enriched Math class, grades vary from 86% to 96%, a range of 10 percentage points – about one third the variability of the Regular Math class.

If variability is high in the population, then it will tend to be high in the sample. If we had two different random samples from the population, then the difference between the two estimates would also tend to be larger. So higher variability in the population leads to higher variability in the samples, which leads to higher variability in the estimates. This larger variability for the estimates is reflected in a larger margin of error, so that the confidence interval is wider.

Similarly, if variability is lower in the population, then it will be lower in the sample, and the estimate will have lower variability, leading to a smaller margin of error and a narrower confidence interval.
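The effect of variability can be illustrated with two made-up samples whose grades span the ranges quoted for the two classes. Everything here is an assumption for illustration: the grade lists, the normal approximation and the helper `margin_of_error` are not from the video.

```python
from math import sqrt
from statistics import NormalDist, stdev

# Hypothetical samples of 10 grades each, spanning 54-87 and 86-96.
regular = [54, 58, 63, 67, 70, 74, 78, 81, 84, 87]
enriched = [86, 87, 88, 89, 90, 91, 92, 93, 95, 96]

def margin_of_error(sample, confidence=0.95):
    """Approximate margin of error for a sample mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * stdev(sample) / sqrt(len(sample))

# Ratio of the two ranges: 10 points against 33 points, about one third.
range_ratio = (max(enriched) - min(enriched)) / (max(regular) - min(regular))

regular_margin = margin_of_error(regular)
enriched_margin = margin_of_error(enriched)
print(f"Range ratio: {range_ratio:.2f}")
print(f"Regular margin: {regular_margin:.1f}, Enriched margin: {enriched_margin:.1f}")
```

With the same sample size and confidence level, the less variable sample gives the smaller margin of error.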

Size of the sample

(Figure demonstrating a class of 100 students.)

A larger sample will produce more precise estimates – that is, estimates with lower variability. 

For example, in a class of 100 students, the average of a sample of size 20 would have smaller variability than the average of a sample of size 10. The average of a sample of size 50 would have still smaller variability. 

So the larger the sample size, the smaller the variability of the estimate, the smaller the margin of error, and the narrower the confidence interval.
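A rough sketch of this relationship, assuming the usual approximation in which the margin of error shrinks with the square root of the sample size. The standard deviation of 10 points and the function name `margin_for_size` are hypothetical, and the sketch ignores the extra precision gained when the sample is a large share of a small population, such as 50 students out of 100.

```python
from math import sqrt

def margin_for_size(sd, n, z=1.96):
    """Approximate 95% margin of error for a mean, given sd and sample size."""
    return z * sd / sqrt(n)

sd = 10.0  # hypothetical standard deviation of grades, in percentage points
margins = {n: margin_for_size(sd, n) for n in (10, 20, 50)}
for n, margin in margins.items():
    print(f"Sample of {n}: margin of error about {margin:.1f} points")
```

Note that under this approximation, halving the margin of error takes about four times as many students, not twice as many.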

Let's look at an example…

Example - sample of size 10

(Figure demonstrating a class of 100 students, and a sample of 10 students, with an estimated average grade of 64%, and the true class average of 73%.)

The average class grade is 73%.

The average for the random sample of 10 students is 64%.

Example - sample of size 50

(Figure demonstrating a class of 100 students, and a sample of 50 students, with an estimated average grade of  71%, and the true class average of 73%.)

As we see in this example, with a much larger sample size, the variability of the estimate is much smaller, and the estimate tends to be much closer to the true value: here, the sample of 50 gives 71%, against a true class average of 73%. The confidence interval would then be narrower.

Knowledge check

Now it's your turn. How would you interpret the following statement:

According to a recent study, adults living in a specific city weighed an average of 75 kg, with a margin of error of +/- 10 kg, 9 times out of 10.

What is the estimated value? What is the confidence interval? What is the confidence level?

Take a moment to think about all the information included in this sentence.

Answer

First, we can conclude that the estimated value was obtained using a sample of the population. Second, we understand that the estimated average weight is 75 kg, and that the confidence interval ranges from 65 kg to 85 kg. The confidence interval is quite wide, which may suggest a small sample size, high variability in the weight of individuals, or both.

The confidence level is 90%, or 9 times out of 10. This means that if the random sampling were repeated many times, the confidence interval would contain the true value about 9 times out of 10. A higher confidence level, 95% for example, would require an even wider confidence interval.
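The answer can be checked with a short sketch. The first part only uses numbers from the statement. The second part, which rescales the margin to a 95% confidence level, relies on an assumption the study statement does not provide: that the estimate is approximately normally distributed. The helper name `z_multiplier` is hypothetical.

```python
from statistics import NormalDist

estimate, margin_90 = 75.0, 10.0  # kg, taken from the statement
interval_90 = (estimate - margin_90, estimate + margin_90)
print(f"90% interval: {interval_90[0]} kg to {interval_90[1]} kg")

def z_multiplier(confidence):
    """Two-sided multiplier from the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

# Assumption: normal approximation, so the margin scales with the multiplier.
margin_95 = margin_90 * z_multiplier(0.95) / z_multiplier(0.90)
interval_95 = (estimate - margin_95, estimate + margin_95)
print(f"95% interval: about {interval_95[0]:.1f} kg to {interval_95[1]:.1f} kg")
```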

Recap of key points

To summarize what we learned today: confidence intervals can help us understand and measure the uncertainty associated with estimated values from samples; data coming from samples do not provide true values, but estimated values; and the width of the confidence interval can vary based on the size of the sample, the variability of the population and the confidence level required.

(The Canada Wordmark appears.)
