High school Mathematics Lessons: Chapter 37 - Statistics

In the previous section we completed a discussion on Probability. In this chapter we will see some topics in Statistics.

• We have already discussed the basic details of statistics.
♦ Statistics part I consists of chapters 1, 1.1, 1.2, . . . up to 1.5
♦ Statistics part II consists of chapters 25, 25.1, . . . up to 25.10
• Our present discussion is a continuation from part II.
• So the reader might want to revisit parts I and II thoroughly before taking up the present discussion.
■ Let us write a summary of what we have seen so far:
• Bar graphs and histograms
• Ungrouped frequency distribution tables and Grouped frequency distribution tables
♦ Ungrouped frequency distribution tables are sometimes simply called 'frequency distribution tables'
• Small data → Ungrouped frequency distribution tables → bar graphs
• Large data → Grouped frequency distribution tables → histograms
• Mean, median and mode from ungrouped frequency distribution tables

• In this section we will learn the method to find the mean from grouped frequency distribution tables
• Let us recall how we calculated the mean from an ungrouped frequency distribution table (Details here).

An example:
The marks obtained by 30 students of Class X of a certain school in a Mathematics paper consisting of 100 marks are presented in table 37.1 below:

Table.37.1

Find the mean of the marks obtained by the students.
Solution:
1. We have: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$

2. Table 37.2 below is prepared by modifying table 37.1. In this new table, a column for the products (f_i x_i) is also added.

Table.37.2

• The numerator is calculated at the bottom of the third column. Its value is 1779
• The denominator is calculated at the bottom of the second column. Its value is 30
3. So we get x̄ = ¹⁷⁷⁹⁄₃₀ = 59.3
• So the mean of the marks is 59.3
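
The calculation in steps 1 to 3 can be checked with a short Python sketch. The marks and frequencies listed here are assumed values, because table 37.1 is not reproduced in this text; they were chosen to agree with the totals quoted above (Σf_i = 30 and Σf_i x_i = 1779). The function name mean_from_frequency_table is only illustrative.

```python
# Marks (x_i) and their frequencies (f_i).
# ASSUMPTION: the original table 37.1 is not available here; these values
# merely agree with the quoted totals 30 and 1779.
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
freqs = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]


def mean_from_frequency_table(values, frequencies):
    """Return (sum of f_i * x_i, sum of f_i, mean)."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_fx, total_f, total_fx / total_f


sum_fx, sum_f, exact_mean = mean_from_frequency_table(marks, freqs)
print(sum_fx, sum_f, round(exact_mean, 1))  # 1779 30 59.3
```

The third column of table 37.2 corresponds to the products f * x inside the function, and the two totals are the numerator and denominator of the formula.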

• We have seen in the previous discussions that, when a large amount of data is given to us, ungrouped frequency distribution tables are inconvenient.
♦ We have seen that, in such cases we use grouped frequency distribution tables.
• So let us convert the given table in example 1 above, into a grouped frequency distribution table.
The steps are given below:
1. Let the width of the class-intervals be 15.
• The resulting table based on this width is shown as table 37.3 below:

Table.37.3

2. Now, each class-interval has two points: an upper limit and a lower limit.

• For each class-interval, we want a single point which can be used to represent that class-interval
■ The exact midpoint between the upper limit and the lower limit is taken as the representative point
• This representative point is called the class mark
• The class mark, which is the exact midpoint, can be easily obtained using the formula:
Class mark = $\frac{\text{Upper limit} + \text{Lower limit}}{2}$
• It is just like finding the 'coordinates of the midpoint between two given points' in Coordinate geometry.
• An example:
The class mark of the class interval 25-40 is $\frac{25 + 40}{2} = \frac{65}{2} = 32.5$
3. Once we get the class mark (which represents a class-interval), we can multiply it by the frequency of that class-interval
• The products (f_ix_i) thus obtained are tabulated in the fourth column of table 37.4 below:

Table.37.4

4. Once the table is prepared, we can use the formula: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where x_i now stands for the class mark of each class-interval

• The numerator is calculated at the bottom of the fourth column. Its value is 1860
• The denominator is calculated at the bottom of the second column. Its value is 30
5. So we get x̄ = ¹⁸⁶⁰⁄₃₀ = 62
• So the mean of the marks is 62
• This new method of finding the mean is known as the Direct Method
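
The Direct Method can be sketched in Python as follows. The class-intervals follow from the width of 15 and the class marks used in this section. The frequencies are an assumption, since table 37.3 is not reproduced in this text; only the 7 for the interval 40-55 and the totals 30 and 1860 are stated. The helper name class_mark is illustrative.

```python
# Class-intervals of width 15, written as (lower limit, upper limit).
intervals = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]

# ASSUMPTION: frequencies chosen to agree with the quoted totals
# (sum of f_i = 30, sum of f_i * x_i = 1860); table 37.3 is not available here.
freqs = [2, 3, 7, 6, 6, 6]


def class_mark(lower, upper):
    """Midpoint of a class-interval."""
    return (lower + upper) / 2


class_marks = [class_mark(lo, hi) for lo, hi in intervals]
products = [f * x for f, x in zip(freqs, class_marks)]  # the f_i * x_i column
approx_mean = sum(products) / sum(freqs)

print(class_marks)                      # [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
print(sum(products), sum(freqs), approx_mean)  # 1860.0 30 62.0
```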

We did the same problem by two methods:
• In the first method we found x̄ by using the data in an ungrouped frequency distribution table
♦ We got x̄ = ¹⁷⁷⁹⁄₃₀ = 59.3
• In the second method we found x̄ by using the data in a grouped frequency distribution table
♦ We got x̄ = ¹⁸⁶⁰⁄₃₀ = 62
• So what happened? Why are the answers different?
Ans: The difference occurred because we assumed that every value in a class-interval sits at its class mark (the midpoint).
It can be explained with the help of an example:
(i) In the class interval 40-55, there are 7 values. They are:
40 (4 Nos.) & 50 (3 Nos.)
(ii) Their average would be: $\frac{(40 \times 4) + (50 \times 3)}{7} = \frac{160 + 150}{7} = \frac{310}{7} \approx 44.29$
(iii) But we chose the class mark 47.5 to represent those seven values
• A similar error can occur in every class interval
• So if we want the exact value of x̄, we must find the exact representative for each class interval
• But then the calculations will become lengthy
♦ The whole 'convenience of using the grouped frequency distribution table' will be lost
■ However we must call the two results by two different names:
• The x̄ calculated by using the ungrouped frequency distribution table is called the exact mean
• The x̄ calculated by using the grouped frequency distribution table is called the approximate mean
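
The gap between the class mark and the true average of a class can be seen in a small Python sketch, using only the seven values of the class interval 40-55 quoted above. The variable names are illustrative.

```python
# The seven values that fall in the class interval 40-55:
# 40 occurs 4 times and 50 occurs 3 times.
values_in_class = [40] * 4 + [50] * 3

true_average = sum(values_in_class) / len(values_in_class)  # 310 / 7
mark = (40 + 55) / 2                                         # class mark

print(round(true_average, 2), mark)  # 44.29 47.5

# Using the class mark instead of the true average overstates this class's
# contribution to the sum of f_i * x_i by 7 * (47.5 - 310/7) = 22.5
excess = len(values_in_class) * mark - sum(values_in_class)
print(excess)  # 22.5
```

Adding up such differences over all six class intervals accounts for the gap between the two numerators, 1860 and 1779.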

Now we will see a simplified method. Let us write the steps:
1. Subtract a fixed number from the x_i's.
• This fixed number is the assumed mean and is denoted as a
• It is better to use the middle value for subtraction.
♦ We will soon see that the results will be 'symmetric' if we choose the middle value
• The values of x_i's that we have are:
• 17.5, 32.5, 47.5, 62.5, 77.5 and 92.5
• There are 6 values. So 47.5 and 62.5 come in the middle. Let us choose 47.5 as 'a'
• This 47.5 must be subtracted from each of 17.5, 32.5, 47.5, 62.5, 77.5 and 92.5
• The result of each subtraction is known as the 'deviation of a from that x_i'. It is denoted as d_i, so d_i = x_i − a
• The d's are tabulated in the fourth column in table 37.5 below:

Table.37.5

2. Find the products (f_i d_i). They are tabulated in the fifth column
• Once the table is complete, we can calculate d̄, which is the 'mean of the deviations'.
• It is given by the formula: $\bar{d} = \frac{\sum f_i d_i}{\sum f_i}$

• Thus we get: d̄ = ⁴³⁵⁄₃₀ = 14.5

■ Let us compare the formulae for x̄ and d̄:
• While calculating x̄, we have Σf_i x_i in the numerator
• While calculating d̄, we have Σf_i d_i in the numerator
♦ The denominator is the same in both cases
• When a is chosen near the middle, f_i d_i is easier to calculate than f_i x_i because d_i is smaller in magnitude than x_i.

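3. We have calculated d̄. But our aim is to find x̄.
From d̄, we can easily reach x̄. Since each deviation is d_i = x_i − a, the steps are:
$\bar{d} = \frac{\sum f_i d_i}{\sum f_i} = \frac{\sum f_i (x_i - a)}{\sum f_i} = \frac{\sum f_i x_i}{\sum f_i} - a\,\frac{\sum f_i}{\sum f_i} = \bar{x} - a$
• So we can write:
To get x̄, we simply add a and d̄. That is, x̄ = a + d̄
• Thus in our problem, x̄ = 47.5 + 14.5 = 62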
■ This method of calculating the mean is called: Assumed mean method.
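
The three steps of the Assumed mean method can be sketched in Python as below. The class marks are those listed in step 1. The frequencies are an assumption, since the tables are not reproduced in this text; they agree with the totals quoted (Σf_i = 30 and Σf_i d_i = 435 for a = 47.5). The function name assumed_mean_method is illustrative.

```python
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]

# ASSUMPTION: frequencies consistent with the quoted totals; the original
# table 37.5 is not available here.
freqs = [2, 3, 7, 6, 6, 6]


def assumed_mean_method(marks, frequencies, a):
    """Return (sum of f_i * d_i, mean deviation d_bar, mean x_bar)."""
    deviations = [x - a for x in marks]                 # step 1: d_i = x_i - a
    sum_fd = sum(f * d for f, d in zip(frequencies, deviations))  # step 2
    d_bar = sum_fd / sum(frequencies)
    return sum_fd, d_bar, a + d_bar                     # step 3: x_bar = a + d_bar


sum_fd, d_bar, x_bar = assumed_mean_method(class_marks, freqs, a=47.5)
print(sum_fd, d_bar, x_bar)  # 435.0 14.5 62.0
```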

• In the above method, we chose 47.5, which is near the middle, as the assumed mean 'a'
■ Now a question arises:
Should we always choose 'a value near the middle' to become 'a'?
• To find the answer, let us do all the calculations again, using another 'a'.
• Instead of 47.5, let us take the first value 17.5
• The calculations are shown in table.37.6(a) below:

Tables.37.6 (a) & (b)

• d̄ is given by the formula: $\bar{d} = \frac{\sum f_i d_i}{\sum f_i}$

• The numerator is calculated at the bottom of the fifth column in table (a). Its value is 1335
• The denominator is calculated at the bottom of the second column. Its value is 30
• So we get d̄ = ¹³³⁵⁄₃₀ = 44.5
• Thus x̄ = a + d̄ = 17.5 + 44.5 = 62
■ This is the same value obtained when we used 47.5 as 'a'
• Instead of 47.5, let us take the second value 32.5
• The calculations are shown in table.37.6(b) above.
• d̄ is given by the formula: $\bar{d} = \frac{\sum f_i d_i}{\sum f_i}$

• The numerator is calculated at the bottom of the fifth column in table (b). Its value is 885
• The denominator is calculated at the bottom of the second column. Its value is 30
• So we get d̄ = ⁸⁸⁵⁄₃₀ = 29.5
• Thus x̄ = a + d̄ = 32.5 + 29.5 = 62
■ This is the same value obtained when we used 47.5 as 'a'
• Instead of 47.5, let us take the fourth value 62.5
• The calculations are shown in table.37.7(a) below:

Tables.37.7 (a) & (b)

• d̄ is given by the formula: $\bar{d} = \frac{\sum f_i d_i}{\sum f_i}$

• The numerator is calculated at the bottom of the fifth column in table (a). Its value is -15
• The denominator is calculated at the bottom of the second column. Its value is 30
• So we get $\bar{d} = \frac{-15}{30} = -0.5$
• Thus x̄ = a + d̄ = 62.5 + (-0.5) = 62.5 - 0.5 = 62
■ This is the same value obtained when we used 47.5 as 'a'

• Instead of 47.5, let us take the fifth value 77.5
• The calculations are shown in table.37.7(b) above.
• d̄ is given by the formula: $\bar{d} = \frac{\sum f_i d_i}{\sum f_i}$

• The numerator is calculated at the bottom of the fifth column in table (b). Its value is -465
• The denominator is calculated at the bottom of the second column. Its value is 30
• So we get $\bar{d} = \frac{-465}{30} = -15.5$
• Thus x̄ = a + d̄ = 77.5 + (-15.5) = 77.5 - 15.5 = 62
■ This is the same value obtained when we used 47.5 as 'a'

• Instead of 47.5, let us take the sixth value 92.5
• The calculations are shown in table.37.8 below:

Table.37.8

• d̄ is given by the formula: $\bar{d} = \frac{\sum f_i d_i}{\sum f_i}$

• The numerator is calculated at the bottom of the fifth column in table 37.8. Its value is -915
• The denominator is calculated at the bottom of the second column. Its value is 30
• So we get $\bar{d} = \frac{-915}{30} = -30.5$
• Thus x̄ = a + d̄ = 92.5 + (-30.5) = 92.5 - 30.5 = 62
■ This is the same value obtained when we used 47.5 as 'a'
• So we can use any value from the (x_i) column as the 'assumed mean' a. The resulting x̄ is the same
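
The repeated calculations above can be reproduced with a short Python loop that tries every class mark as the assumed mean. As before, the frequencies are an assumption that agrees with the totals quoted in this section (1335, 885, 435, -15, -465 and -915), since the tables themselves are not reproduced in this text. The dictionary name results is illustrative.

```python
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]

# ASSUMPTION: frequencies consistent with the quoted totals; the original
# tables 37.5 to 37.8 are not available here.
freqs = [2, 3, 7, 6, 6, 6]

results = {}
for a in class_marks:
    # Sum of f_i * d_i, where d_i = x_i - a
    sum_fd = sum(f * (x - a) for f, x in zip(freqs, class_marks))
    d_bar = sum_fd / sum(freqs)
    results[a] = (sum_fd, d_bar, a + d_bar)
    print(a, sum_fd, d_bar, a + d_bar)

# 17.5 1335.0 44.5 62.0
# 32.5 885.0 29.5 62.0
# 47.5 435.0 14.5 62.0
# 62.5 -15.0 -0.5 62.0
# 77.5 -465.0 -15.5 62.0
# 92.5 -915.0 -30.5 62.0
```

Every row ends in 62.0, which is what the relation x̄ = a + d̄ predicts: changing a shifts d̄ by the opposite amount.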

■ But consider table 37.5 above, where we used 47.5 as 'a'
• In the fourth column of that table, we have a sort of 'vertical symmetry' about the zero
• The negative values above the zero partly cancel the positive values below it
• So the numbers involved stay small and the calculations become easier

This completes the discussion on the Assumed mean method. In the next section, we will see another method.

Sunday, March 4, 2018

Chapter 37 - Statistics - Part III
