Chi-Squared Test
The chi-squared test can become very complicated and can be carried out in several ways. I am going to concentrate on the simplest form of the test, which is the one you are most likely to come across at A-level.
This test is used for categorical data. Because it works with counts in categories rather than continuous measurements, it does not require the data to be normally distributed, and it is therefore a non-parametric test.
Chi-squared is used to test if there is a significant difference between the expected frequencies and the actual observed frequencies in one or more categories. It can then assess if any difference between the expected and observed frequencies is due to sampling error and chance or is a significant difference that can be investigated further. This test is sometimes called the Chi Squared goodness of fit test, as it can be used to assess the "goodness of fit" of an observed distribution to a theoretical one.
Calculating Chi-Squared:
The ratio of males to females in a science faculty is 1:1, but in the chemistry lectures there have been 80 females and 40 males. Is this a significant difference from the numbers that would be expected?
(You can suggest from the figures that it is, but it will not always be so clear!)
You first need a table of the observed values:
|                 | Female | Male |
|-----------------|--------|------|
| Observed values | 80     | 40   |
Then the expected values need to be calculated. The expected values are the counts you would expect to see in each category if there were no real effect. They can be worked out in two ways:
- You predict that all the categories will have the same value. This is calculated by dividing the total of all the categories by the number of categories. In the example, 80 + 40 = 120 and 120/2 = 60, so 60 is expected in each category.
- You determine the expected frequencies from some prior knowledge. E.g. suppose we alter the question for this example and know that 30% of the faculty are males and 70% females; the expected values would now be 36 males and 84 females.
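The two ways of working out expected values can be sketched in a few lines of Python. This is only an illustration: the function names `expected_equal` and `expected_from_proportions` are made up for this example, and the proportions passed in are assumed to add up to 1.

```python
def expected_equal(observed):
    """Expected counts when every category is assumed equally likely."""
    total = sum(observed.values())
    return {category: total / len(observed) for category in observed}


def expected_from_proportions(observed, proportions):
    """Expected counts from known proportions (assumed to sum to 1)."""
    total = sum(observed.values())
    return {category: total * proportions[category] for category in observed}


observed = {"Female": 80, "Male": 40}

# Method 1: split the total of 120 evenly across the two categories.
print(expected_equal(observed))

# Method 2: use prior knowledge (70% female, 30% male).
print(expected_from_proportions(observed, {"Female": 0.7, "Male": 0.3}))
```

Either way, the expected values add up to the same total as the observed values (120 here).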
|                     | Female | Male | Total |
|---------------------|--------|------|-------|
| Observed values (O) | 80     | 40   | 120   |
| Expected values (E) | 60     | 60   | 120   |
| O-E                 | 20     | -20  | 0     |
| (O-E)²              | 400    | 400  |       |
| (O-E)²/E            | 6.67   | 6.67 | 13.34 |
Note that the total of the observed values and the total of the expected values must always be the same. The total of the O-E row must always be 0.
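The chi-squared equation is χ² = Σ (O-E)²/E, i.e. the sum of (O-E)²/E over all the categories, so the chi-squared value for this example is 6.67 + 6.67 = 13.34 (or 13.33 if the terms are not rounded before adding).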
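As a rough check on the arithmetic, the calculation can be written as a short Python function. This is a minimal sketch; the name `chi_squared` is just chosen for this example, and it assumes the observed and expected counts are given in the same category order.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    # The observed and expected totals must match, as noted above.
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("observed and expected totals must match")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Female, Male from the chemistry lecture example.
statistic = chi_squared([80, 40], [60, 60])
print(round(statistic, 2))
```

This prints 13.33 rather than 13.34, because the two terms are not rounded to 6.67 before being added together.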
Again this value means nothing on its own and must be compared against a table of critical values. Here the degrees of freedom are n-1, where n is the number of categories, and the chi-squared value must be greater than the critical value stated in the table to be significant. In this example there are two categories, so there is 1 degree of freedom; the critical value at p = 0.05 is 3.84, and 13.34 is well above it. A value greater than the critical value means that there is a significant difference between the observed and expected frequencies. E.g. something is causing more females than males to attend the chemistry lectures, or causing the males to miss these lectures.
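The comparison against the table can also be sketched in Python. The dictionary below holds the standard p = 0.05 critical values for 1 to 5 degrees of freedom; the names `CRITICAL_VALUES_5_PERCENT`, `is_significant` and `p_value_one_df` are invented for this example. The p-value helper only works for 1 degree of freedom, where the chi-squared tail probability can be written using the complementary error function.

```python
import math

# Critical values of chi-squared at p = 0.05, keyed by degrees of freedom.
CRITICAL_VALUES_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def is_significant(statistic, n_categories):
    """True if the statistic exceeds the p = 0.05 critical value."""
    degrees_of_freedom = n_categories - 1
    return statistic > CRITICAL_VALUES_5_PERCENT[degrees_of_freedom]


def p_value_one_df(statistic):
    """Tail probability of the statistic, valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(statistic / 2))


print(is_significant(13.34, n_categories=2))
print(p_value_one_df(13.34))
```

For the lecture example the test is significant, and the p-value comes out far below 0.05, which matches the conclusion drawn from the table.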