Mann-Whitney U test
This test compares the medians of two data sets to see whether there is a significant difference between them. It shows whether one of the samples tends to have larger values than the other, and therefore whether there is a real difference between the data sets or whether any perceived difference is simply due to chance. Two data sets can have similar means but very different values within them, so this test highlights differences within the data sets; it can also be used to show that two independent samples do in fact have the same distribution.
This is a non-parametric test, meaning the data do not have to be normally distributed; the parametric alternative is Student's t-test, which is a more complicated analysis. The U test uses the relative ranks of the data, so the data must be at least ordinal.
Before the test is carried out a null hypothesis must be established. This is always the same and states:
There is no significant difference between the two data sets
This is the hypothesis that you aim to reject (or to retain, if you want to show that the samples have the same distribution or come from the same population).
Completing the test:
For example, suppose you wanted to see whether there was a difference in traffic flow before and after a supermarket was built. You could collect data before and after the construction, then use the Mann-Whitney U test to see whether the difference in traffic flow is significant.
The data first need to be put into a table, with the two data sets labelled A and B (you can use any letters you like as long as they are consistent throughout the analysis).
| Traffic Flow before construction (A) | Rank (ra) | Site Number | Traffic Flow after construction (B) | Rank (rb) |
|---|---|---|---|---|
| 126 | 11 | 1 | 194 | 2 |
| 148 | 7 | 2 | 128 | 10 |
| 85 | 15.5 | 3 | 69 | 18 |
| 61 | 19 | 4 | 135 | 9 |
| 179 | 4 | 5 | 171 | 5 |
| 93 | 12.5 | 6 | 149 | 6 |
| 45 | 20 | 7 | 89 | 14 |
| 189 | 3 | 8 | 248 | 1 |
| 85 | 15.5 | 9 | 79 | 17 |
| 93 | 12.5 | 10 | 137 | 8 |
| ∑ra | 120 | | ∑rb | 90 |
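As a rough cross-check, the ranks and rank sums in the table can be reproduced with a short Python sketch. It assumes the joint ranking used in the table (both samples ranked together, largest value ranked 1, tied values sharing the average of the ranks they occupy); the helper name `rank_descending` is made up for this illustration.

```python
# Traffic-flow counts copied from the table above.
before = [126, 148, 85, 61, 179, 93, 45, 189, 85, 93]    # sample A
after = [194, 128, 69, 135, 171, 149, 89, 248, 79, 137]  # sample B


def rank_descending(values):
    """Rank values so the largest gets rank 1; ties share their average rank.

    Hypothetical helper written for this example only.
    """
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1          # first position held by this value
        count = ordered.count(value)              # how many times the value occurs
        rank_of[value] = first + (count - 1) / 2  # average of the tied positions
    return [rank_of[v] for v in values]


# Rank both samples together, then split the ranks back into A and B.
ranks = rank_descending(before + after)
ranks_a = ranks[:len(before)]
ranks_b = ranks[len(before):]

print("ranks A:", ranks_a, "sum:", sum(ranks_a))
print("ranks B:", ranks_b, "sum:", sum(ranks_b))
```

The two sums should match the ∑ra and ∑rb row of the table, and together they should add up to 20 × 21 / 2 = 210, the sum of the ranks 1 to 20.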
- The two samples must be ranked together. In this example the data are ranked from highest to lowest, i.e. the largest value is ranked 1 (ranking in the opposite order gives the same smallest U value at the end of the test). If there are any ties between values, the average of the tied ranks is used for all of the tied values: e.g. ranks 15 and 16 both have the value 85, so both are given rank 15.5; if ranks 15, 16 and 17 all had the same value, all three would be ranked 16.
- Next you must work out Ua and Ub. This uses two formulas, where na and nb are the numbers of values in samples A and B: Ua = na × nb + na(na + 1)/2 − ∑ra and Ub = na × nb + nb(nb + 1)/2 − ∑rb.
- To calculate Ua, first multiply the number in sample A by the number in sample B, e.g. 10 × 10 = 100. Then complete the next part of the equation, which for this example is 10 × (10 + 1) / 2 = 55. We now have 100 + 55 = 155, from which we subtract ∑ra (the sum of the ranks in column A): 155 − 120 = 35, meaning Ua = 35.
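- The value of Ub must then be calculated using the same formula but substituting the values for B. For this example Ub = 100 + 55 − 90 = 65. As a check, Ua + Ub should equal the product of the two sample sizes (35 + 65 = 100).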
- The U values on their own are meaningless and tell us nothing about the data; instead we must test them against significance tables. For this we use only the smaller of the two U values, which in this example is Ua (35).
- The smaller U value is then compared with a critical value table to test its significance; the table has a row of numbers along the top and a column of numbers down the side. Use the number of results in sample A (e.g. 10) to pick the column and the number of results in sample B (e.g. 10) to pick the row. The value where this column and row intersect is your critical value. For example, for two samples of 10 the critical value in a two-tailed table at the 0.05 level is 23.
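- The U value is significant at the 0.05 level if it is smaller than or equal to the critical value in the table; this means that there is a difference between the two samples, with less than a 5% probability that a difference this large would arise by chance alone. In this example U = 35 is larger than the critical value of 23 for two samples of 10, so the difference in traffic flow is not significant and the null hypothesis cannot be rejected.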
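The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rank sums are taken from the table, the function name `u_statistic` is made up for this example, and the critical value of 23 is the usual two-tailed 0.05 table entry for two samples of 10, which you should confirm against the table you are using.

```python
def u_statistic(n_this, n_other, rank_sum):
    """U for one sample: n_this * n_other + n_this * (n_this + 1) / 2 - rank_sum."""
    return n_this * n_other + n_this * (n_this + 1) / 2 - rank_sum


n_a, n_b = 10, 10          # number of sites in samples A and B
sum_ra, sum_rb = 120, 90   # rank sums from the table

u_a = u_statistic(n_a, n_b, sum_ra)
u_b = u_statistic(n_b, n_a, sum_rb)

# Sanity check: the two U values always add up to n_a * n_b.
assert u_a + u_b == n_a * n_b

u = min(u_a, u_b)          # only the smaller U value is tested

# Assumption: two-tailed critical value at the 0.05 level for n_a = n_b = 10.
critical_value = 23

# Usual table convention: significant when U is at or below the critical value.
significant = u <= critical_value

print("Ua =", u_a, "Ub =", u_b, "U =", u)
print("significant at 0.05:", significant)
```

With the traffic flow data this gives Ua = 35 and Ub = 65, and since 35 is above 23 the sketch reports that the difference is not significant.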