Saturday, 21 January 2012

Measures of Dispersion

Measuring Dispersion

The dispersion of a data set is simply the spread of data within the results. Measures of dispersion provide information about the spread of data around the mean and median values, and are therefore used in conjunction with measures of central tendency.
  • Range
The range is simply the difference between the lowest and highest figures in a data set. It is therefore a crude indication of the spread of the data: it is very quick to calculate but is easily skewed by extreme values or anomalies.
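To see how a single anomaly can skew the range, here is a minimal Python sketch. The function name `data_range` and the example figures are made up for illustration and do not come from any real data set.

```python
def data_range(values):
    """Return the difference between the highest and lowest values."""
    return max(values) - min(values)


# Hypothetical results, invented for this example
results = [12, 14, 15, 15, 17, 18, 20]
with_anomaly = results + [95]  # the same results plus one extreme value

print(data_range(results))       # 8
print(data_range(with_anomaly))  # 83
```

One extreme value is enough to change the range from 8 to 83, even though the other seven results are exactly the same.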
  • Interquartile Range
This removes the top and bottom quarters of the results and shows the dispersion of the central 50% of results, so it is largely unaffected by extreme values. It is best used with the median: a higher interquartile range means that there is a greater spread of results about the median, and a lower one means a smaller spread. Because it is not skewed by extreme values it is often more representative of the dispersion of a data set than the range, but it doesn't take all values into account.
The interquartile range is calculated in a similar way to the median. The data must first be ranked and then the upper and lower quartiles calculated; where the median is the middle (half) value, the quartiles are the quarter values. As with the median, the formulas give the position of each quartile in the ranked data, not the actual values.

If data is ranked lowest to highest the formulas are as follows (N is the number of values):
LQ = (N + 1) / 4
UQ = ((N + 1) / 4) x 3

If data is ranked highest to lowest the formulas are as follows:

LQ = ((N + 1) / 4) x 3
UQ = (N + 1) / 4

Notice the formulas are just the opposite way round; it doesn't really matter which way the data is ranked as long as you remember that the upper quartile value should be bigger than the lower quartile value. Counting up from the lowest value, the lower quartile is the value 25% of the way through the data, the median is 50% and the upper quartile 75%.
The interquartile range (IQR) is the difference between the upper quartile value and the lower quartile value: IQR = UQ value - LQ value.
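The steps above (rank the data, find the quartile positions, read off the values, subtract) can be sketched in Python. This is only an illustration: the function names and figures are invented, and the handling of positions that are not whole numbers (linear interpolation between the two neighbouring values) is an assumption, as is the requirement of at least three values.

```python
def value_at_position(ranked, position):
    """Return the value at a 1-based position in ranked data.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring values.
    """
    whole = int(position)
    fraction = position - whole
    below = ranked[whole - 1]  # convert the 1-based position to a 0-based index
    if fraction == 0:
        return below
    above = ranked[whole]
    return below + fraction * (above - below)


def interquartile_range(values):
    """Return (LQ value, UQ value, IQR) using the (N + 1) / 4 positions."""
    if len(values) < 3:
        raise ValueError("this sketch assumes at least three values")
    ranked = sorted(values)  # rank lowest to highest
    n = len(ranked)
    lq = value_at_position(ranked, (n + 1) / 4)
    uq = value_at_position(ranked, 3 * (n + 1) / 4)
    return lq, uq, uq - lq


# Hypothetical results, invented for this example
results = [12, 14, 15, 15, 17, 18, 20]
print(interquartile_range(results))         # (14, 18, 4)
print(interquartile_range(results + [95]))  # (14.25, 19.5, 5.25)
```

With seven values the quartile positions are 2 and 6, giving 14 and 18. Adding the extreme value 95 moves the IQR only from 4 to 5.25, whereas the range of the same data would jump from 8 to 83.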

  • Standard deviation
This is also a measure of dispersion; it measures the spread of data about the mean rather than the median. It takes into account all results, but is affected by extreme values and anomalies.
See the post on Standard Deviation for further information.
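As a rough illustration of how standard deviation uses every result and its distance from the mean, here is a minimal Python sketch. It assumes the population form (dividing by N); some courses divide by N - 1 instead, and the figures are invented for the example.

```python
import math


def standard_deviation(values):
    """Population standard deviation: spread of the values about the mean.

    Assumption: divides by N (population form) rather than N - 1.
    """
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))


# Hypothetical results, invented for this example
results = [2, 4, 4, 4, 5, 5, 7, 9]
with_anomaly = results + [50]  # the same results plus one extreme value

print(standard_deviation(results))  # 2.0
# An extreme value pulls the mean and inflates the spread about it
print(standard_deviation(with_anomaly) > standard_deviation(results))  # True
```

Because every value feeds into the calculation, the single extreme value increases the result noticeably.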

