Showing posts with label mean. Show all posts

Thursday, September 16, 2010

Median

The median is the middle value of a dataset. An equal number of items lie below and above it.

The dataset must be ordered before the median can be determined.

The number of items (n) will determine where the median is located.

(n+1)/2 = median rank

For example:

1 , 1 , 2 , 4 , 6 , 2 , 9 , 3 , 7 , 5 , 2 , 5 , 9 , 6

Ordered:
1 , 1 , 2 , 2 , 2 , 3 , 4 , 5 , 5 , 6 , 6 , 7 , 9 , 9

Total count:
n=14

Median rank = (14+1)/2 = 7.5

Since this dataset has an even number of items (n=14), the median rank is not a whole number, so the median lies between the 7th and 8th positions. The 7th value is 4, and the 8th value is 5. The value in between is (4+5)/2 = 4.5. The median is 4.5. There are 7 items below this value and 7 items above it.
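The rank rule above can be sketched in Python. This is a minimal illustration, not part of the original post; the function name `median_by_rank` is made up for this example. It sorts the data, then averages the values at the rank rounded down and the rank rounded up, which are the same position when n is odd.

```python
def median_by_rank(values):
    """Return the median using the (n + 1) / 2 rank rule (hypothetical helper)."""
    ordered = sorted(values)        # the data must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    lower_pos = (n + 1) // 2        # median rank rounded down (1-based)
    upper_pos = (n + 2) // 2        # median rank rounded up (1-based)
    # Python lists are 0-indexed, so subtract 1 from each 1-based position.
    return (ordered[lower_pos - 1] + ordered[upper_pos - 1]) / 2


data = [1, 1, 2, 4, 6, 2, 9, 3, 7, 5, 2, 5, 9, 6]
print(median_by_rank(data))  # 4.5
```

For n=14 the two positions are 7 and 8, so the function averages 4 and 5, matching the hand calculation.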

For datasets with an odd number of items (n), the median rank is a whole number, so the median falls exactly on the median rank.

To illustrate:

1 , 1 , 2 , 4 , 6 , 2 , 9 , 3 , 7 , 5 , 2 , 5 , 9 , 6 , 4

Ordered:
1 , 1 , 2 , 2 , 2 , 3 , 4 , 4 , 5 , 5 , 6 , 6 , 7 , 9 , 9

Median rank = (15+1)/2 = 8

The value in the 8th position is 4, so the median is 4. There are 7 items on either side of it.
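As a quick check of the odd-n case, here is a short Python sketch (an illustration added here, not from the original post). It looks up the value at the median rank directly and compares it with `statistics.median` from the Python standard library.

```python
import statistics

data = [1, 1, 2, 4, 6, 2, 9, 3, 7, 5, 2, 5, 9, 6, 4]

ordered = sorted(data)
n = len(ordered)               # 15 items
rank = (n + 1) // 2            # 8; a whole number because n is odd
by_rank = ordered[rank - 1]    # lists are 0-indexed, so rank 8 is index 7

print(by_rank)                  # 4
print(statistics.median(data))  # 4
```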

The mean, median, and mode are all measures of central tendency. The skew can be determined by comparing these three measures.




Mean

The mean is the average value in the dataset.

It is calculated by adding up the data values (x), then dividing by the number of items (n).

The mean of a sample is traditionally labelled x-bar. The mean of a population is labelled µ (mu).

sum(x)/n = x-bar

For example, find the mean of the following sample dataset:

10
12
1
16
10
11
13
6
15
6

sum(x) = 10+12+1+16+10+11+13+6+15+6 = 100

n=10

x-bar = 100/10 = 10

The mean is 10.
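The same calculation can be written as a few lines of Python. This is a small sketch for illustration; the variable names are chosen here, and `statistics.mean` from the standard library is shown only as a cross-check.

```python
import statistics

sample = [10, 12, 1, 16, 10, 11, 13, 6, 15, 6]

total = sum(sample)    # sum(x) = 100
n = len(sample)        # n = 10
x_bar = total / n      # x-bar = 100 / 10

print(x_bar)           # 10.0
print(statistics.mean(sample) == x_bar)  # True
```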

It is also the "center" of the data, in the sense that the differences of each value from the mean sum to zero. This is because the positive differences exactly balance the negative ones.

Check this, using the above example:

10 - 10 = 0
12 - 10 = 2
1 - 10 = -9
16 - 10 = 6
10 - 10 = 0
11 - 10 = 1
13 - 10 = 3
6 - 10 = -4
15 - 10 = 5
6 - 10 = -4


0 + 2 + (-9) + 6 + 0 + 1 + 3 + (-4) + 5 + (-4) = 0
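The check above can also be done in Python. The sketch below is an added illustration: it computes each difference from the mean and sums them. With data whose mean is not a whole number, floating-point rounding may leave a tiny nonzero remainder, so comparing with `math.isclose(..., abs_tol=...)` would be the safer test in general.

```python
sample = [10, 12, 1, 16, 10, 11, 13, 6, 15, 6]

x_bar = sum(sample) / len(sample)          # 10.0
deviations = [x - x_bar for x in sample]   # each value minus the mean

print(deviations)
print(sum(deviations))  # 0.0
```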


Tuesday, September 14, 2010

Mode

The mode is the value in a dataset that appears most frequently.

For example:

In the following sample, the mode is 5:

1
1
2
5
6
2
9
5
7
5
2
5
9
5

Count the number of times each value appears. The value 5 appears five times, more than any other value, so it is the mode.

Some datasets have more than one mode.

If there is a single mode, the term 'unimodal' is used. The example above is unimodal: there are five 5's. Had there also been five 2's, the example would no longer be unimodal. Both 5 and 2 would then be called modes.
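Counting by hand gets tedious for larger samples, so here is a minimal Python sketch using `collections.Counter` from the standard library. The helper name `modes` is invented for this illustration. It returns every value tied for the highest count, so it covers datasets with more than one mode.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency (hypothetical helper)."""
    counts = Counter(values)           # maps each value to how often it appears
    if not counts:
        return []
    highest = max(counts.values())     # the largest frequency in the data
    return sorted(v for v, c in counts.items() if c == highest)


sample = [1, 1, 2, 5, 6, 2, 9, 5, 7, 5, 2, 5, 9, 5]
print(modes(sample))           # [5]

# The sample has three 2's; adding two more gives five 2's and five 5's.
print(modes(sample + [2, 2]))  # [2, 5]
```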


Thursday, July 22, 2010

Variance

Variance represents how spread out the data are. It is the average of the squared differences from the mean.

The differences from the mean are calculated by subtracting the mean from each x. These differences are squared and then averaged to arrive at the variance.

Because the differences are squared, the result is in squared units. For example, if the measurements are in miles, then the variance is in miles^2. As a result, the variance does not intuitively describe the data. To overcome this, the square root of the variance is taken. The square root of the variance is called the standard deviation.

Here is an example data set:

miles driven (x): 43, 70, 27, 36
n = 4
mean = 44 miles

differences
43 - 44 = -1
70 - 44 = 26
27 - 44 = -17
36 - 44 = -8

differences^2
(-1)^2 = 1
(26)^2 = 676
(-17)^2 = 289
(-8)^2 = 64

The average of the differences^2:
(1+676+289+64)/4 = 257.5

The variance is:
257.5 miles^2
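The steps above can be reproduced with a short Python sketch (added for illustration; the variable names are chosen here). It divides by n, as the example does, which corresponds to `statistics.pvariance` in the standard library. Note that `statistics.variance` divides by n-1 instead, so it would give a different result for the same data.

```python
import math
import statistics

miles = [43, 70, 27, 36]

mean = sum(miles) / len(miles)                     # 44.0 miles
squared_diffs = [(x - mean) ** 2 for x in miles]   # [1.0, 676.0, 289.0, 64.0]
variance = sum(squared_diffs) / len(miles)         # 257.5 miles^2
std_dev = math.sqrt(variance)                      # about 16.05 miles

print(variance)                      # 257.5
print(statistics.pvariance(miles))   # 257.5
print(round(std_dev, 2))             # 16.05
```

The standard deviation is back in the original units (miles), which is why it is usually easier to interpret than the variance.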