Statistics 101: Exploring measures of central tendency

Catalogue number: 892000062020002

Release date: May 3, 2021 Updated: November 25, 2021

This video is intended for learners who want to acquire a basic understanding of the concept of central tendency, what it means and some key related methods used for the exploration of data. By the end of this video, you will understand the differences between three fundamental statistical concepts. First, the mean (also known as the average), then the median and finally, the mode.

Data journey step: Explore, clean, describe
Data competency:
  • Data exploration
  • Data interpretation
  • Storytelling
Audience: Basic
Suggested prerequisites: N/A
Length: 12:10
Cost: Free

Watch the video

Statistics 101: Exploring measures of central tendency - Transcript

(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Statistics 101 Exploring measures of central tendency")

Statistics 101: Exploring measures of central tendency

How do we describe data in just a few simple terms? Two really important features of a data set are the location of the center or balance point and the size of the spread. Try thinking of it this way. If we were to hold the data in our hands, would they be densely concentrated in one spot like a golf ball or all over the place like cotton candy? How big a region the data cover, that is, the variability or spread of the data, is called dispersion. The central tendency is where the center of the data lies. In this video we will explore the concept of central tendency. To learn more about dispersion, check out the video called "Exploring measures of dispersion".

Learning goals

By the end of this video you will understand the differences between three fundamental statistical concepts. First, the mean, also known as the average. Then the median, and finally the mode. This video is intended for learners who want to acquire a basic understanding of the concept of central tendency, what it means, and some key related methods used for the exploration of data. No previous knowledge is required.

Steps of a data journey

(Text on screen: Supported by a foundation of stewardship, metadata, standards and quality)

(Diagram of the Steps of the data journey: Step 1 - define, find, gather; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)

This diagram is a visual representation of the data journey, from collecting the data, to exploring, cleaning, describing and understanding the data, to analyzing the data, and lastly to communicating with others the story the data tell.

Step 2: Explore, clean and describe

(Diagram of the Steps of the data journey with an emphasis on Step 2 - explore, clean, describe.)

Looking into measures of central tendency is a part of the explore, clean and describe step of the data journey.

Measure of central tendency

(Graph demonstrating a right skewed distribution where the mode is located at the summit. To the right of the summit are the median and the mean, respectively.)

To begin, what exactly does central tendency mean? Measures of central tendency describe the most representative value of the data in a single number. This is also called the balance point of a data set and is typically represented as the mean, the median or the mode. Let's see how these three measures are calculated.

The mean

The mean represents the average of all the values for one variable in a data set. The mean is established by adding up all values, then dividing that total by the number of values.

Calculating the mean

(Series of numbers that will make up the dataset where the numbers are 3, 4, 8, 5, 7, 3.)

Let's try using this data set as an example. To calculate the mean, first we sum all the values. Then we divide that sum by the number of values in the set. Here we see a data set containing six values. The sum of these six values is 30. So to calculate the mean, we divide that total of 30 by the number of values in the data set, which is 6. This leaves us with a mean or average value of 5.
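
The same steps can be sketched in a few lines of Python. This is only an illustration added alongside the transcript, not something shown in the video, and the variable names are chosen here for readability.

```python
# Data set from the slide: six values.
values = [3, 4, 8, 5, 7, 3]

total = sum(values)   # step 1: add up all the values
count = len(values)   # step 2: count how many values there are
mean = total / count  # step 3: divide the sum by the count

print(total, count, mean)  # prints: 30 6 5.0
```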

It is important to note that the mean can be influenced by outliers, that is, values that are extremely high or low compared with the rest of the values. In other words, an extremely high value in the data may cause the mean to increase to the point where it no longer represents the overall data. Notice here that the last value in the sum is 33, when on the previous slide it was 3. This change could be accurate. But it could also be an error. We simply don't know. Regardless, it's important to note the effect that this outlier has on the mean. The sum of the six values is now 60, and the mean has doubled in size from 5 to 10. Therefore, it is important to check for outliers before deciding to choose the mean to measure central tendency.
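
To see the effect of the single changed value, the two versions of the data set can be compared side by side. The sketch below uses `mean` from Python's standard `statistics` module; the list names are illustrative and do not come from the video.

```python
from statistics import mean

original = [3, 4, 8, 5, 7, 3]
with_outlier = [3, 4, 8, 5, 7, 33]  # last value changed from 3 to 33

mean_original = mean(original)
mean_with_outlier = mean(with_outlier)

# One changed value out of six is enough to double the mean.
print(mean_original, mean_with_outlier)
```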

The median

Another measure of central tendency is the median, which is defined as the middle value when all values are arranged in increasing order.

Calculating the median: For an odd number of values

(Series of numbers that will make up the dataset where the numbers are 5, 6, 7, 8, 8, 9, 9, 9, 12, 15, 21, 28, 33.)

Calculating the median when the data set has an odd number of values is straightforward. First we sort the values into increasing order. Then we count the values and find the one in the middle where half the values are above it and half below. That middle value is the median. In this example, the median is 9.
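
For an odd number of values, the procedure amounts to sorting and picking the single middle position. The following is a minimal Python sketch of that idea, with variable names chosen for illustration. The slide's values happen to be sorted already, but the sort is kept because it is required in general.

```python
# The 13 values from the slide.
values = [5, 6, 7, 8, 8, 9, 9, 9, 12, 15, 21, 28, 33]

ordered = sorted(values)          # step 1: increasing order
middle_index = len(ordered) // 2  # step 2: 13 // 2 = 6, i.e. the 7th value
median = ordered[middle_index]    # six values lie below it and six above

print(median)  # prints: 9
```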

Calculating the median: For an even number of values

(Series of numbers that will make up the dataset where the numbers are 5, 6, 7, 8, 8, 9, 9, 9, 12, 15, 21, 28, 33, 35)

Next we take a look at when there is an even number of values in a data set. We place the values in order, count the values and locate the two middle numbers, that is, the two with an equal number of values on either side of them. Then we add those two middle values together and divide by two. In this example, we see the median is once again 9.
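
Both cases can be combined in one small function. The sketch below is one possible way to write it; the name `median_of` is hypothetical and was introduced only for this illustration.

```python
def median_of(values):
    """Return the median, averaging the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


# The 14 values from the slide: the two middle values are both 9.
even_set = [5, 6, 7, 8, 8, 9, 9, 9, 12, 15, 21, 28, 33, 35]
print(median_of(even_set))  # prints: 9.0
```

Python's standard library offers `statistics.median`, which follows the same rule, so a hand-written version like this is mainly useful for seeing the steps.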

Extreme values and the median

(Two series of numbers that will make up the dataset where the numbers for dataset a are 5, 6, 6, 7, 8, 9, 9, 9, 12, 15, 21, 28, 33 and the numbers for dataset b are 5, 6, 6, 7, 8, 9, 9, 9, 12, 15, 21, 28, 333.)

A major difference between the mean and the median, besides the methods used to calculate and find them, is how they are influenced by extreme values. Unlike the mean, the median is not as affected by extreme values. Consider the data sets A and B, which are the same except that the largest value has been increased from 33 in A to 333 in B. Although only data set B contains an outlier, the median is still nine in both data sets. Even if data set B were to contain one or two more outliers, nine would still be the median because it would remain the middle value in the data set.
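
A quick check of data sets A and B shows the contrast in numbers. This sketch relies on `mean` and `median` from Python's standard `statistics` module. The two extra outliers, 400 and 500, are hypothetical values added here only to test the last claim; they do not appear in the video.

```python
from statistics import mean, median

data_a = [5, 6, 6, 7, 8, 9, 9, 9, 12, 15, 21, 28, 33]
data_b = [5, 6, 6, 7, 8, 9, 9, 9, 12, 15, 21, 28, 333]

median_a, median_b = median(data_a), median(data_b)
mean_a, mean_b = mean(data_a), mean(data_b)

# The median stays at 9, while the mean rises from about 12.9 to 36.
print(median_a, median_b)
print(round(mean_a, 1), mean_b)

# Hypothetical extra outliers: the middle value is still 9.
median_more_outliers = median(data_b + [400, 500])
print(median_more_outliers)
```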

The mode

The third measure of central tendency we'll talk about today is called the mode. The mode is the value that occurs most often in a data set. In other words, it is the most frequent data point in a data set. The mode is the simplest measure to determine. It is also not influenced by extreme values, as the presence of extreme values does not change the value or values that occur most often.

Calculating the mode

(Series of numbers that will make up the dataset where the numbers are 6, 3, 9, 6, 6, 5, 9, 3.)

To find the mode, count how many times each number occurs. The number that appears most often is the mode. Two interesting things to note about this measure of central tendency, however, are that (a) because the mode is the value that occurs most often in a data set, if all values have the same number of occurrences, there is no mode, and (b) if the highest number of occurrences is found more than once, then there is more than one mode. If that's the case, the mode may not be a good measure of central tendency.
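
The counting procedure, including the two special cases, might be sketched as follows. The function name `modes` is hypothetical. Returning an empty list when every value occurs equally often follows the convention described in this video, and treating a data set with only one distinct value as having that value as its mode is an assumption made here.

```python
from collections import Counter


def modes(values):
    """Return all modes in increasing order, or [] when there is no mode."""
    counts = Counter(values)  # how many times each value occurs
    if not counts:
        return []
    highest = max(counts.values())
    # Case (a): several distinct values that all occur equally often means no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    # Case (b): every value that reaches the highest count is a mode.
    return sorted(v for v, c in counts.items() if c == highest)


slide_values = [6, 3, 9, 6, 6, 5, 9, 3]
print(modes(slide_values))     # prints: [6]
print(modes([1, 2, 3]))        # prints: []
print(modes([1, 1, 2, 2, 3]))  # prints: [1, 2]
```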

Question

Now it's your turn. Look at the following numbers 1, 1, 1, 1, 1, 4 and 5. Take a moment to determine the mean, median and mode. You'll find the answers on the next slide.

Answer

To calculate the mean, we sum the digits first: 1 + 1 + 1 + 1 + 1 + 4 + 5 = 14. There are seven numbers, so the mean is 14 divided by 7, which equals 2. The median is the middle number in the data set. Conveniently, the numbers are already in numerical order. There are three 1s on the left and a 1, a 4 and a 5 on the right, leaving a 1 in the middle. Therefore the median is 1. There are five 1s, but only one 4 and one 5. Therefore, the mode is also 1. A number of software packages, including Excel, have built-in functions for calculating the mean, median, and mode. The median often falls between the mean and the mode. But this is not always the case, as demonstrated by this example.
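
The video names Excel as one package with built-in functions. As an additional illustration that is not part of the video, Python's standard `statistics` module provides comparable functions, and they can be used to check the answers above.

```python
import statistics

numbers = [1, 1, 1, 1, 1, 4, 5]

mean_value = statistics.mean(numbers)      # 14 / 7
median_value = statistics.median(numbers)  # middle of the seven sorted values
mode_value = statistics.mode(numbers)      # most frequent value

print(mean_value, median_value, mode_value)  # prints: 2 1 1
```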

Top tips: mean, median and mode

(Note at bottom of slide which says it can be useful to look at more than one measure of central tendency.)

These tips are meant to help you decide which measure of central tendency to use in different situations. The first tip is to be aware that when the data are not numerical, such as city names, it's not possible to calculate a mean or median, so the mode may be of interest. Next, if there are extreme values in the data, the median is more representative than the mean. And finally, when there is more than one mode in a data set, the mode may not be the best measure of central tendency.
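
The first tip can be illustrated with a small example. The list of city names below is made up for this sketch. `multimode` from Python's standard `statistics` module (available from Python 3.8) returns every value that shares the highest count, so it also reveals when there is more than one mode.

```python
from statistics import multimode

# Hypothetical survey answers: city names cannot be added or ordered by size,
# so the mean and median do not apply, but the most frequent answer does.
cities = ["Ottawa", "Toronto", "Ottawa", "Halifax", "Toronto", "Ottawa"]

most_common = multimode(cities)
print(most_common)  # prints: ['Ottawa']

# With a tie, more than one mode is returned.
tied = multimode(["Ottawa", "Toronto", "Ottawa", "Toronto", "Halifax"])
print(tied)  # prints: ['Ottawa', 'Toronto']
```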

Question

(Graph demonstrating a right skewed distribution with a vertical line labelled x at the summit. To the right of the summit is a line labelled y and to the right of y, is a third line labelled z.)

For this knowledge check, let's practice our understanding of central tendency so far. In the graph on the right, the salaries of hockey players are displayed on the horizontal axis and the number of players is displayed on the vertical axis. In this distribution, what measures of central tendency are represented by the lines X, Y and Z?

Answer

Were you able to determine which was which? The salary that the greatest number of players earn is X. Therefore X is the mode of this distribution. The curve is not symmetrical. It has what we call a tail to the right, which means that there are a few hockey players who earn a very high salary. These values are pulling the average upwards, so Z is the mean or average value. Y is approximately where half of the players are below that value and half are above. Therefore, Y is the median.

Questions

(Same graph is used again. It demonstrates a right skewed distribution with a vertical line labelled x at the summit. To the right of the summit is a line labelled y and to the right of y, is a third line labelled z.)

Now we'll use a real world example to depict when someone might elect to use one measure of central tendency over another. Let's pretend you represent the owners of the National Hockey League. Which measure of central tendency will help you make the case that player salaries are too high? If you were the players' union representative, which measure of central tendency would help you make the case that player salaries are not too high? Which measure of central tendency is likely to provide the best representation of player salaries?

Answers

If you represent the owners and you think player salaries are too high, you would use the mean, as it is the highest value, and you would hope that no one points out that the mean can be influenced by extreme values. If you represent the players' union, you would focus on the mode, which is the lowest of the three values. You could argue more players have this salary than any other salary. However, it would not be true to say that this value is representative of player salaries as a whole, because the mode is only the most common value, while collectively many players earn much more than that salary. Here, the median is the best representation of player salaries, because half the players make more than this value and half make less.

Summary of key points

To summarize what we learned today: central tendency is the formal term we use when we are referring to a single way of determining the center or balance point of a data set. We looked at three different ways to calculate the central tendency. The mean or average is probably the most well known. However, we learned that extreme values can influence the average. We also learned about the median or middle point, where half of the values are below it and half are above. The median is less likely to be influenced by extreme values. The third measure of central tendency we learned about was the mode, which is the most common value. It is important to remember that a data set can have more than one mode or none at all.

Further learning

To find out more about the spread or dispersion of data, check out the video called "Exploring measures of dispersion."

(The Canada Wordmark appears.)
