Correlation Coefficients
Correlation
When two things vary together there is said to be a correlation between them; correlations are often shown by a line of best fit on a scatter graph.
A positive correlation occurs when an increase in one variable is accompanied by an increase in the other; on a scatter graph the points follow a diagonal line from SW to NE across the page.
A negative correlation occurs when an increase in one variable is accompanied by a decrease in the other; on a scatter graph the points follow a diagonal line from NW to SE across the page.
No correlation is present when a line of best fit cannot be drawn on the graph.
Note that a correlation does not prove a causal relationship between two variables (i.e. that an increase in one causes an increase in the other); it simply suggests a link between the two that can be investigated further.
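As a rough illustration of the SW to NE and NW to SE descriptions, the slope of a least-squares line of best fit can be worked out and its sign checked. This is only a sketch: the function names are made up for this example, the figures are the discharge and velocity values from the table further down, and a positive slope on its own says nothing about how strong the correlation is.

```python
def best_fit_slope(xs, ys):
    """Slope of the least-squares line of best fit through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def direction(xs, ys):
    """Describe which way the line of best fit runs across the page."""
    slope = best_fit_slope(xs, ys)
    if slope > 0:
        return "positive"  # SW to NE
    if slope < 0:
        return "negative"  # NW to SE
    return "flat"  # a level line; treated here as no clear direction


discharge = [0, 100, 200, 300, 400]
velocity = [0.05, 0.19, 0.18, 0.34, 0.28]
print(direction(discharge, velocity))  # positive
```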
Spearman's Rank Correlation Coefficient (Rs)
This is a statistical test used to analyse the correlation between two variables; the Spearman's rank value measures how strong a correlation is and in which direction it runs.
This is a non-parametric test, so it can be used with data that is not normally distributed. It works on ranks rather than actual values, so it measures whether one variable consistently rises (or falls) as the other rises; the relationship does not have to be a perfectly straight line, but it should not change direction (a scatter graph will show this). The data must be at least ordinal, i.e. it can be ranked in order.
The data must first be put into a table:
Site | Discharge (Variable 1) | Rank | Velocity (Variable 2) | Rank | d | d²
1 | 0 | 5 | 0.05 | 5 | 0 | 0
2 | 100 | 4 | 0.19 | 3 | 1 | 1
3 | 200 | 3 | 0.18 | 4 | -1 | 1
4 | 300 | 2 | 0.34 | 1 | 1 | 1
5 | 400 | 1 | 0.28 | 2 | -1 | 1
∑ | | | | | | 4
- The results for the two variables are ranked separately, from highest to lowest (it doesn't actually matter which way you rank them as long as you rank both variables the same way).
- "d" is then calculated by taking the second rank away from the first rank at each site.
- The value for d is squared to remove any negative values, and the sum of this column (∑d²) is calculated.
- The equation is then applied: $R_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$
- You already have ∑d² for the top line (4, so 6 × 4 = 24). On the bottom line "n" is the number of pairs of data, i.e. 5, which gives 5 × (25 - 1) = 120. Then don't forget to take your result away from one. For this data set Rs = 1 - 24/120 = 0.8.
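The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a general-purpose routine: the function names are made up for this example, and the ranking helper assumes there are no tied values (true for the table above, but ties would need averaged ranks).

```python
def rank_high_to_low(values):
    """Give rank 1 to the largest value. Assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Spearman's rank coefficient using the 1 - 6*sum(d^2) / (n(n^2 - 1)) formula."""
    rank_x = rank_high_to_low(xs)
    rank_y = rank_high_to_low(ys)
    # d is the first rank minus the second rank at each site, then squared
    d_squared = [(rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y)]
    n = len(xs)  # number of pairs of data
    return 1 - (6 * sum(d_squared)) / (n * (n ** 2 - 1))


discharge = [0, 100, 200, 300, 400]
velocity = [0.05, 0.19, 0.18, 0.34, 0.28]

print(rank_high_to_low(discharge))  # [5, 4, 3, 2, 1]
print(rank_high_to_low(velocity))   # [5, 3, 4, 1, 2]
print(round(spearman_rs(discharge, velocity), 2))  # 0.8
```

The ranks printed match the two Rank columns in the table, which is a quick way to check the ranking step before trusting the final value.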
Interpreting the value is relatively easy. Your Rs value should always be between -1 and +1; if it isn't, you have done something wrong. A value of -1 means a perfect negative correlation, a value of +1 means a perfect positive correlation and a value of 0 means no correlation at all. The closer your value is to +/- 1, the stronger the correlation for your data set.
Significance
The result cannot, however, be used as evidence, or to reject a null hypothesis, unless it is statistically significant. Your Rs value must be compared with the critical value in a Spearman's rank significance table, looked up against either the number of pairs of data or the degrees of freedom, depending on how the table is laid out (degrees of freedom are usually given as the number of pairs minus 2, so check the heading of the table you are using). The more pairs of data you have, the lower the critical value, so the more likely your result is to be significant.
The result of 0.8 suggests a fairly strong positive correlation, but because so little data was collected (only five pairs) it is not statistically significant, so it cannot be used to reject the null hypothesis.
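One way to see why five pairs is too few is to count, out of every possible way of ordering five ranks, how many give a result at least as strong as the one observed. The sketch below does this by brute force. It is not how a significance table is used in an exam; the function names are made up for this example, and it assumes a one-tailed test at the 5% level with no tied ranks.

```python
from itertools import permutations


def sum_d_squared(rank_x, rank_y):
    """Sum of squared rank differences, the top-line quantity in the formula."""
    return sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))


def exact_one_tailed_p(rank_x, rank_y):
    """Share of all possible rank orderings that agree at least as well as
    the observed one (a smaller sum of d squared means a larger Rs)."""
    observed = sum_d_squared(rank_x, rank_y)
    n = len(rank_x)
    as_strong = 0
    total = 0
    for candidate in permutations(range(1, n + 1)):
        total += 1
        if sum_d_squared(rank_x, candidate) <= observed:
            as_strong += 1
    return as_strong / total


discharge_rank = [5, 4, 3, 2, 1]
velocity_rank = [5, 3, 4, 1, 2]

p = exact_one_tailed_p(discharge_rank, velocity_rank)
print(round(p, 3))  # 0.067
print(p < 0.05)     # False
```

Eight of the 120 possible orderings give an Rs of 0.8 or more, so a result this strong would turn up by chance about 6.7% of the time, which is above the 5% cut-off assumed here.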
Pearson's Product Moment Correlation Coefficient
This is a more powerful test for correlation than Spearman's Rank when its conditions are met, because it uses the actual values in the data set rather than their relative ranks.
This is a parametric test, so it should only be used for data that shows a normal distribution and a straight-line relationship, and it has a much more complicated equation than Spearman's Rank.
(I am unsure exactly how to work this value out; but for exams you are likely just to need to know the pros and cons as listed above.)
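For reference, the standard textbook form of the Pearson coefficient divides how much the two variables vary together by how much each varies on its own. The sketch below applies that to the discharge and velocity figures from the Spearman table; it is an illustration only, the function name is made up for this example, and it makes no check that the data is normally distributed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient from raw values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # how far each pair sits from the two means, multiplied together
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # spread of each variable about its own mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


discharge = [0, 100, 200, 300, 400]
velocity = [0.05, 0.19, 0.18, 0.34, 0.28]
print(round(pearson_r(discharge, velocity), 2))  # 0.87
```

Like Rs, the result always falls between -1 and +1 and is read the same way, but because it uses the actual values it responds to how far apart the readings are, not just their order.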