Numerical Summaries of Data
Introduction
What you’ll learn to do: Summarize a set of numerical data by reporting various measurements
It is often desirable to use a few numbers to summarize a data set. One important aspect of a set of data is where its center is located. In this lesson, measures of central tendency are discussed first. A second aspect of a distribution is how spread out it is, that is, how much the data values vary from one another. The second section of this lesson describes measures of variability.
Learning Outcomes
- Calculate the mean, median, and mode of a set of data.
- Calculate the range of a data set.
Measures of Central Tendency: Mean, Median, and Mode
Let's begin by trying to find the most "typical" value of a data set. Note that we just used the word "typical" although in many cases you might think of using the word "average." We need to be careful with the word "average" as it means different things to different people in different contexts. One of the most common uses of the word "average" is what mathematicians and statisticians call the arithmetic mean, or just plain old mean for short. "Arithmetic mean" sounds rather fancy, but you have likely calculated a mean many times without realizing it; the mean is what most people think of when they use the word "average."
Mean
The mean of a set of data is the sum of the data values divided by the number of values.
Examples
Example 1: Marci’s exam scores for her last math class were 79, 86, 82, and 94. What would the mean of these values be?
Answer: [latex]\frac{79+86+82+94}{4}=85.25[/latex]. Typically we round means to one more decimal place than the original data had. In this case, we would round 85.25 to 85.3.
Example 2: The numbers of touchdown (TD) passes thrown by each of the 31 teams in the National Football League in the 2000 season are shown below.
37 33 33 32 29 28 28 23 22 22 22 21 21 21 20 20 19 19 18 18 18 18 16 15 14 14 14 12 12 9 6
What is the mean number of TD passes?
Answer: Adding these values, we get 634 total TDs. Dividing by 31, the number of data values, we get 634/31 = 20.4516. It would be appropriate to round this to 20.5. It would be most correct for us to report that “The mean number of touchdown passes thrown in the NFL in the 2000 season was 20.5 passes,” but it is not uncommon to see the more casual word “average” used in place of “mean.”
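The two calculations above can be checked with a short script. This is a minimal sketch in Python, assuming the standard-library `statistics` module is available; the variable names `exam_scores` and `td_passes` are illustrative and do not come from the lesson.

```python
from statistics import mean

# Example 1: Marci's four exam scores.
exam_scores = [79, 86, 82, 94]

# Example 2: TD passes for the 31 NFL teams in the 2000 season.
td_passes = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
             19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]

# The mean is the sum of the values divided by the number of values.
exam_mean = sum(exam_scores) / len(exam_scores)
td_mean = sum(td_passes) / len(td_passes)

print(exam_mean)           # 85.25
print(sum(td_passes))      # 634
print(round(td_mean, 4))   # 20.4516

# statistics.mean performs the same calculation.
assert mean(exam_scores) == exam_mean
```

Writing the sum and the count out explicitly mirrors the definition; `statistics.mean` is a convenient shortcut once the idea is clear.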
Both examples are described further in the following video. https://youtu.be/3if9Le2sO0c
Examples
Example 1: The one hundred families in a particular neighborhood are asked their annual household income, to the nearest 5 thousand dollars. The results are summarized in the frequency table below. What is the mean household income?
Income (thousands of dollars) | Frequency |
---|---|
15 | 6 |
20 | 8 |
25 | 11 |
30 | 17 |
35 | 19 |
40 | 20 |
45 | 12 |
50 | 7 |
Answer: Calculating the mean by hand could get tricky if we try to type in all 100 values:
[latex-display]\frac{\overbrace{15+\cdots+15}^{\text{6 terms}}+\overbrace{20+\cdots+20}^{\text{8 terms}}+\overbrace{25+\cdots+25}^{\text{11 terms}}+\cdots}{\text{100}}[/latex-display]
We could calculate this more easily by noticing that adding six 15s is the same as multiplying 15 by 6, which gives 90. Using this simplification, we get
[latex-display]\frac{15\cdot6+20\cdot8+25\cdot11+30\cdot17+35\cdot19+40\cdot20+45\cdot12+50\cdot7}{\text{100}}=\frac{3390}{100}=33.9[/latex-display]
The mean household income of our sample is 33.9 thousand dollars or $33,900.
Example 2: Extending the last example, suppose a new family moves into the neighborhood and this new family has a household income of $5 million ($5000 thousand). What is the new mean of this neighborhood's income?
Answer: Adding this to our sample, our mean is now:
[latex-display]\frac{15\cdot6+20\cdot8+25\cdot11+30\cdot17+35\cdot19+40\cdot20+45\cdot12+50\cdot7+5000\cdot1}{\text{101}}=\frac{8390}{101}=83.069[/latex-display]
The new mean household income of our sample would be 83.069 thousand dollars or $83,069.
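The frequency-table shortcut used in these two examples can be sketched in Python as follows. The dictionary `income_freq` and the helper `mean_from_frequencies` are hypothetical names chosen for this illustration; the sketch assumes the table is stored as value-to-frequency pairs.

```python
# Frequency table: income (thousands of dollars) -> number of families.
income_freq = {15: 6, 20: 8, 25: 11, 30: 17, 35: 19, 40: 20, 45: 12, 50: 7}

def mean_from_frequencies(freq):
    """Mean of data given as {value: frequency}."""
    # Multiply each value by its frequency instead of listing every value.
    total = sum(value * count for value, count in freq.items())
    n = sum(freq.values())
    return total / n

print(mean_from_frequencies(income_freq))   # 33.9

# Example 2: add one family with an income of 5000 thousand dollars.
with_new_family = dict(income_freq)
with_new_family[5000] = 1
print(round(mean_from_frequencies(with_new_family), 3))   # 83.069
```

A single extreme value moves the mean from 33.9 to about 83.1 thousand dollars, even though the other 100 incomes are unchanged.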
Both situations are explained further in this video. https://youtu.be/1_4Hxcq8DpQ
Median
The median of a set of data is the value in the middle when the data is in order.
- To find the median, begin by listing the data in order from smallest to largest, or largest to smallest.
- If the number of data values, N, is odd, then the median is the middle data value.
- If the number of data values is even, there is no one middle value, so we find the mean of the two middle values.
Examples
Example 1: Returning to the football touchdown data, we would start by listing the data in order. Luckily, it was already in decreasing order, so we can work with it without needing to reorder it first.
37 33 33 32 29 28 28 23 22 22 22 21 21 21 20 20 19 19 18 18 18 18 16 15 14 14 14 12 12 9 6
What is the median TD value?
Answer: Since there are 31 data values, an odd number, the median will be the middle number, the 16th data value (leaving 15 values below and 15 above). The 16th data value is 20, so the median number of touchdown passes in the 2000 season was 20 passes. Notice that for this data, the median is fairly close to the mean we calculated earlier, 20.5.
Example 2: Find the median of these quiz scores: 5, 10, 8, 6, 4, 8, 2, 5, 7, 7
Answer: We start by listing the data in order:
2 4 5 5 6 7 7 8 8 10
Since there are 10 data values, an even number, there is no one middle number. The two middle numbers are 6 and 7. To find the median, we find the mean of these two middle numbers, and get (6+7)/2 = 6.5. The median quiz score is 6.5.
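The procedure for odd and even counts can be sketched as a small Python function. The name `median_by_hand` is hypothetical and chosen for this illustration; the standard-library `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_by_hand(values):
    """Median following the steps above: sort, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

td_passes = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
             19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]
quiz_scores = [5, 10, 8, 6, 4, 8, 2, 5, 7, 7]

print(median_by_hand(td_passes))     # 20
print(median_by_hand(quiz_scores))   # 6.5

# The library function agrees with the hand-written version.
assert median(td_passes) == median_by_hand(td_passes)
assert median(quiz_scores) == median_by_hand(quiz_scores)
```

Sorting in increasing rather than decreasing order makes no difference to the result, since the middle position is the same either way.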
Learn more about these median examples in the following video. https://youtu.be/WEdr_rSRObk
Example
Let us return now to our original household income data and find the median income.
Income (thousands of dollars) | Frequency |
---|---|
15 | 6 |
20 | 8 |
25 | 11 |
30 | 17 |
35 | 19 |
40 | 20 |
45 | 12 |
50 | 7 |
Answer: Here we have 100 data values. If we didn’t already know that, we could find it by adding the frequencies. Since 100 is an even number, we need to find the mean of the middle two data values, the 50th and 51st. To find these, we start counting up from the top:
- There are 6 data values of $15 thousand, so values 1 to 6 are $15 thousand.
- The next 8 data values are $20 thousand, so values 7 to (6+8)=14 are $20 thousand.
- The next 11 data values are $25 thousand, so values 15 to (14+11)=25 are $25 thousand.
- The next 17 data values are $30 thousand, so values 26 to (25+17)=42 are $30 thousand.
- The next 19 data values are $35 thousand, so values 43 to (42+19)=61 are $35 thousand.
From this we can tell that values 50 and 51 will be $35 thousand, and the mean of these two values is $35 thousand. The median income in this neighborhood is $35 thousand. This is fairly close to our mean, which was $33,900.
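The counting-up approach used in this answer can be sketched in Python with a running total of frequencies. The names `income_freq` and `median_from_frequencies` are hypothetical, and the sketch assumes the table is stored as value-to-frequency pairs.

```python
# Frequency table: income (thousands of dollars) -> number of families.
income_freq = {15: 6, 20: 8, 25: 11, 30: 17, 35: 19, 40: 20, 45: 12, 50: 7}

def median_from_frequencies(freq):
    """Median of data given as {value: frequency}, by cumulative counting."""
    n = sum(freq.values())
    # 1-based positions of the middle value(s): one if n is odd, two if even.
    positions = [(n + 1) // 2] if n % 2 == 1 else [n // 2, n // 2 + 1]
    middle = []
    running = 0
    for value in sorted(freq):
        # After this step, positions 1..running have been accounted for.
        running += freq[value]
        while len(middle) < len(positions) and positions[len(middle)] <= running:
            middle.append(value)
    return sum(middle) / len(middle)

print(median_from_frequencies(income_freq))   # 35.0
```

The running total reaches 61 at the $35 thousand row, which is the first point where it covers positions 50 and 51, matching the hand count.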
Now, let's add in the new neighbor with a $5 million household income. Then there will be 101 data values, and the 51st value will be the median. As we discovered in the last example, the 51st value is still $35 thousand. Notice that the new neighbor did not affect the median in this case. Remember that the mean in this version of the example was $83,069. The median is not swayed as much by outliers as the mean is.
View more about the median of this neighborhood's household incomes in this video. https://youtu.be/kqEu9EDkmfU
Mode
The mode is the element of the data set that occurs most frequently. Note that it is possible for a data set to have more than one mode if several categories have the same frequency, or no modes if each category occurs only once.
Example
In our vehicle color survey earlier in this module, we collected the following data:
Color | Frequency |
---|---|
Blue | 3 |
Green | 5 |
Red | 4 |
White | 3 |
Black | 2 |
Grey | 3 |
Answer: For this data, Green is the mode, since it is the data value that occurred the most frequently.
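Finding the mode from a frequency table can be sketched in Python as below. The names `color_freq` and `modes` are hypothetical; the function returns a list so that ties are reported, and it follows this lesson's convention that a data set in which every category occurs only once has no mode.

```python
# Frequency table from the vehicle color survey.
color_freq = {"Blue": 3, "Green": 5, "Red": 4, "White": 3, "Black": 2, "Grey": 3}

def modes(freq):
    """All values sharing the highest frequency; empty if none repeats."""
    top = max(freq.values())
    if top == 1:
        # Lesson's convention: every category occurs once, so no mode.
        return []
    return [value for value, count in freq.items() if count == top]

print(modes(color_freq))                    # ['Green']
print(modes({"a": 2, "b": 2, "c": 1}))      # ['a', 'b']
print(modes({"a": 1, "b": 1}))              # []
```

Unlike the mean and median, this works for categorical data such as colors, because it only compares counts.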
Mode in this example is explained by the video here. https://youtu.be/pFpkWrib3Jk
Licenses & Attributions
CC licensed content, Original
- Revision and Adaptation. Provided by: Lumen Learning License: CC BY: Attribution.
CC licensed content, Shared previously
- Measures of Central Tendency. Authored by: David Lippman. Located at: http://www.opentextbookstore.com/mathinsociety/. License: CC BY-SA: Attribution-ShareAlike.
- Magnetic. Authored by: Philippe Put. Located at: https://www.flickr.com/photos/ineedair/8027247398/. License: CC BY: Attribution.
- Finding the mean of a data set. Authored by: OCLPhase2's channel. License: CC BY: Attribution.
- Mean from a frequency table. Authored by: OCLPhase2's channel. License: CC BY: Attribution.
- Median from a data list. Authored by: OCLPhase2's channel. License: CC BY: Attribution.
- Median from a frequency table. Authored by: OCLPhase2's channel. License: CC BY: Attribution.
- Mode for categorical data. Authored by: OCLPhase2's channel. License: CC BY: Attribution.