Saturday, 21 January 2012

Standard Deviation

Standard deviation is a measure of dispersion. It shows how widely the values in a data set are spread around the mean, and therefore how reliable the mean is as a summary of the data. It is used to measure the variability or diversity of a data set.

In a normal distribution, most of the values in a data set lie close to the mean and few lie towards either extreme. The mean is therefore representative of the data set and reliable. If the data are tightly clustered around the mean, the bell-shaped curve of the normal distribution will be steep and the standard deviation will be small. If the data are widely spread, with more extreme values, the curve will be flatter and the standard deviation will be large.
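The Python sketch below shows how spread relates to standard deviation. It uses two invented data sets that do not come from any real measurements. Both sets have a mean of 710, but one is tightly clustered and the other is widely spread.

```python
import statistics

# Invented data sets for illustration only; both have a mean of 710.
clustered = [690, 700, 710, 720, 730]   # values close to the mean
spread = [410, 610, 710, 810, 1010]     # values far from the mean

for name, data in (("clustered", clustered), ("spread", spread)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)        # population standard deviation (divides by n)
    print(f"{name}: mean = {mean}, standard deviation = {sd:.2f}")
```

The two means are identical. Even so, the clustered set gives a standard deviation of 14.14 and the spread set gives 200.00.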

To calculate the standard deviation, it is easiest to set the data out in a table. The example below uses rainfall figures for 12 years:

Year | Rainfall (X) | X - mean | (X - mean)²
--- | --- | --- | ---
1 | 389 | -321.17 | 103150.17
2 | 786 | 75.83 | 5750.19
3 | 990 | 279.83 | 78304.83
4 | 1195 | 484.83 | 235060.13
5 | 485 | -225.17 | 50701.53
6 | 1293 | 582.83 | 339690.81
7 | 531 | -179.17 | 32101.89
8 | 372 | -338.17 | 114358.95
9 | 421 | -289.17 | 83619.29
10 | 983 | 272.83 | 74436.21
11 | 384 | -326.17 | 106386.87
12 | 693 | -17.17 | 294.81
Total (∑) | 8522 | | 1223855.68
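As a rough sketch, the table can be rebuilt in Python. The script rounds the mean to two decimal places, as the table does, so each row should match it. The printed total of the last column may differ from the table's total in the second decimal place, because the table adds up values that were already rounded. The variable names are my own.

```python
# Rainfall values (X) for years 1 to 12, taken from the table.
rainfall = [389, 786, 990, 1195, 485, 1293, 531, 372, 421, 983, 384, 693]

n = len(rainfall)
# Round the mean to 2 d.p., as the table does (8522 / 12 = 710.17).
mean = round(sum(rainfall) / n, 2)

print(f"{'Year':>4} {'X':>6} {'X - mean':>10} {'(X - mean)^2':>14}")
squared_deviations = []
for year, x in enumerate(rainfall, start=1):
    deviation = x - mean          # column 3: X - mean
    squared = deviation ** 2      # column 4: (X - mean)^2
    squared_deviations.append(squared)
    print(f"{year:>4} {x:>6} {deviation:>10.2f} {squared:>14.2f}")

# Totals row: sum of X and sum of the squared deviations.
total_x = sum(rainfall)
total_squared = sum(squared_deviations)
print(f"{'Sum':>4} {total_x:>6} {'':>10} {total_squared:>14.2f}")
```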


Mean = 8522 ÷ 12 = 710.17
  1. Add up all the values in the X column, then divide by the number of results (n) to find the mean, written x̄ (see the post on "measures of central tendency" for more on this symbol).
  2. Once the mean has been calculated, subtract it from each value individually.
  3. Square each result to remove any negative numbers.
  4. Add up all the results in the final column (the ∑ symbol means "sum of"); here ∑(x - x̄)² = 1223855.68.
  5. Then use the following equation: $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
  6. You already have the top line of the equation from step 4, so simply divide it by the number of results (n) and take the square root: √(1223855.68 ÷ 12) = √101987.97 = 319.36 for this data set.
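Steps 1 to 6 can be sketched as a small Python function. Like step 6, it divides by n, which gives the population standard deviation. The function name is my own. Python's built-in `statistics.pstdev` is used only as a cross-check.

```python
import math
import statistics

def standard_deviation(values):
    """Population standard deviation: sqrt(sum((x - mean)^2) / n)."""
    n = len(values)
    mean = sum(values) / n                         # step 1: the mean
    squared = [(x - mean) ** 2 for x in values]    # steps 2 and 3: subtract, then square
    total = sum(squared)                           # step 4: the top line of the equation
    return math.sqrt(total / n)                    # steps 5 and 6: divide by n, square root

rainfall = [389, 786, 990, 1195, 485, 1293, 531, 372, 421, 983, 384, 693]
sd = standard_deviation(rainfall)
print(round(sd, 2))  # 319.36

# Cross-check against the standard library's population standard deviation.
assert math.isclose(sd, statistics.pstdev(rainfall))
```

By contrast, Python's `statistics.stdev` divides by n - 1 (the sample standard deviation), so it gives a slightly larger value for the same data.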
Interpreting the value takes some care. If the data are normally distributed, about 68% of the results lie within one standard deviation of the mean. In this example, that would mean 68% of the results fall within 319.36 above or below the mean, i.e. between 390.81 and 1029.53. About 95% of the data lie within two standard deviations of the mean, and about 99.7% lie within three.
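The percentages for one, two and three standard deviations only hold for normally distributed data. It can therefore be worth checking how many of the 12 values actually fall inside each band. The Python sketch below does this, and the variable names are my own. The band limits it prints may differ from those above in the second decimal place, because here the mean and standard deviation are not rounded first.

```python
import statistics

rainfall = [389, 786, 990, 1195, 485, 1293, 531, 372, 421, 983, 384, 693]
mean = statistics.mean(rainfall)
sd = statistics.pstdev(rainfall)  # divides by n, matching the method above

counts = {}
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    # Count the values lying within k standard deviations of the mean.
    counts[k] = sum(low <= x <= high for x in rainfall)
    share = counts[k] / len(rainfall)
    print(f"within {k} SD ({low:.2f} to {high:.2f}): "
          f"{counts[k]} of {len(rainfall)} values ({share:.0%})")
```

For this data set, 7 of the 12 values (about 58%) lie within one standard deviation of the mean, and all 12 lie within two. This suggests the rainfall figures do not closely follow a normal distribution.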
This figure is large relative to the mean of 710.17, and even the band within one standard deviation covers a wide range of values. The data set therefore has a large spread, and the mean is not very representative.

Standard deviation is very useful for comparing data sets that have the same or similar means. Two data sets can share a mean but have very different standard deviations, which shows a very different spread of results. Standard deviation takes every value in a data set into account and gives more information about the data than the mean alone. However, because it is based on the mean, it is affected by extreme values.
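A quick way to see the effect of an extreme value is to add one to the rainfall data and recompute. In this Python sketch, the extra year of 3000 is purely hypothetical and is chosen only to act as an outlier.

```python
import statistics

rainfall = [389, 786, 990, 1195, 485, 1293, 531, 372, 421, 983, 384, 693]
# Hypothetical: the same record plus one invented extreme year of 3000.
with_extreme = rainfall + [3000]

results = {}
for label, data in (("original", rainfall), ("with extreme year", with_extreme)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)   # population standard deviation (divides by n)
    results[label] = (mean, sd)
    print(f"{label}: mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

Adding that single extreme year pulls the mean upwards and more than doubles the standard deviation.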
