The size of a sample influences both the cost of a study and the usefulness of its results. A sample that is too small may miss important information; one that is too large is costly and cumbersome.
Often, researchers need to know the smallest sample they can take and still obtain accurate estimates.
Decision-makers first agree on the amount of error they will tolerate in the results. This is called the margin of error (E).
Along with the margin of error, researchers also choose a critical value (C.V.) based on the probability of extreme values in the population.
These two factors are combined with knowledge of the population's standard deviation (sigma) to reach a recommended sample size.
n = [(C.V. * sigma) / E]^2
To apply the Central Limit Theorem, the common rule of thumb is a minimum sample size of 30. However, if the population itself is bell-shaped, a smaller sample can suffice.
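The sample-size formula above can be checked with a short script. The values plugged in below (z = 1.96 for 95% confidence, sigma = 15, E = 5) are illustrative assumptions, not figures from the post:

```python
import math

def sample_size(critical_value, sigma, margin_of_error):
    """n = [(C.V. * sigma) / E]^2, rounded up to a whole subject."""
    n = ((critical_value * sigma) / margin_of_error) ** 2
    return math.ceil(n)  # always round up so the margin of error is still met

# Illustrative values (assumptions): 95% confidence (z = 1.96),
# sigma = 15, tolerated error E = 5
print(sample_size(1.96, 15, 5))  # 35 (rounded up from 34.57)
```

Rounding up rather than to the nearest whole number is deliberate: rounding down would give a sample slightly too small to guarantee the chosen margin of error.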
Tuesday, September 14, 2010
Friday, July 23, 2010
Margin of Error
Margin of Error (E) is the error that can be tolerated when estimating a value.
For confidence intervals, it is calculated as the critical value multiplied by the standard error:
E = Crit Val * Std Err
First, look up the critical value in the probability table (t or z), then calculate the standard error, and multiply the two together.
Margin of Error tells you how much 'cushion' to place on your estimated value.
This cushion will be larger or smaller depending on the critical value that the researcher has chosen.
However, to determine sample size (n), the margin of error is chosen, not calculated.
For example, a buyer wants to know the sample size needed to estimate the average cost of shoes. He needs the estimate to be within ten dollars of the true population mean.
In this case, you will use E=10 in the formula for solving sample size.
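The shoe-buyer example can be worked through in a few lines. The standard deviation (sigma = 40 dollars) and confidence level (z = 1.96, i.e. 95%) are assumed values for illustration; only E = 10 comes from the example:

```python
import math

# Margin of error for a confidence interval: E = critical value * standard error
def margin_of_error(critical_value, sigma, n):
    standard_error = sigma / math.sqrt(n)
    return critical_value * standard_error

# The shoe-buyer example: E = 10 is chosen, not calculated.
# sigma = 40 dollars and z = 1.96 (95% confidence) are illustrative assumptions.
n = math.ceil(((1.96 * 40) / 10) ** 2)
print(n)  # 62 pairs of shoes to sample
```

With that sample size, the resulting margin of error comes out just under the ten dollars the buyer will tolerate, which is the point of rounding n up.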
Thursday, July 22, 2010
Variance
Variance represents how spread out the data are. It is the average of the squared differences from the mean.
The distances from the mean are calculated by subtracting the mean from each x. These distances are squared and then averaged to arrive at the variance.
Because the differences are squared, the result is in squared units - for example, if the measurements are in miles, then the variance is in miles^2. Therefore, the variance value does not intuitively describe the data. To overcome this, the square root of the variance is taken. The square root of the variance is called the standard deviation.
Here is an example data set:
miles driven (x): 43, 70, 27, 36
n = 4
mean = 44 miles
differences
43 - 44 = -1
70 - 44 = 26
27 - 44 = -17
36 - 44 = -8
differences^2
(-1)^2 = 1
(26)^2 = 676
(-17)^2 = 289
(-8)^2 = 64
The average of the differences^2:
(1+676+289+64)/4 = 257.5
The variance is:
257.5 miles^2
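The worked example above can be recomputed with a short script. Note that the post divides by n (the population variance); many statistics libraries instead divide by n - 1 by default, which gives the sample variance:

```python
# Recomputing the worked example (population variance: divide by n)
miles = [43, 70, 27, 36]
n = len(miles)
mean = sum(miles) / n                      # 44 miles
diffs = [x - mean for x in miles]          # -1, 26, -17, -8
squared = [d ** 2 for d in diffs]          # 1, 676, 289, 64
variance = sum(squared) / n                # 1030 / 4
print(variance)  # 257.5 (miles^2)
# Note: the sample variance divides by n - 1 instead, giving 1030 / 3 = 343.3
```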
Coefficient of Variation
The coefficient of variation (c.v.) is a measure of dispersion that allows comparison between groups that are measured in different units. It is calculated as the ratio of the standard deviation (SD) to the mean and is always expressed as a percent.
c.v. = (SD/mean)*100%
The c.v. can only be used when the data collected are ratio variables.
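As a sketch of why the c.v. allows comparison across units, the data below (heights in centimeters, weights in kilograms, with assumed means and standard deviations) are hypothetical:

```python
def coefficient_of_variation(sd, mean):
    """c.v. = (SD / mean) * 100%"""
    return (sd / mean) * 100

# Hypothetical groups measured in different units:
heights_cv = coefficient_of_variation(10, 170)  # heights: SD 10 cm, mean 170 cm
weights_cv = coefficient_of_variation(9, 70)    # weights: SD 9 kg, mean 70 kg
print(round(heights_cv, 1), round(weights_cv, 1))  # 5.9 12.9
```

Even though centimeters and kilograms cannot be compared directly, the unitless percentages show that the weights are relatively more dispersed than the heights.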
Standard Deviation
The standard deviation is a measure of dispersion, or how spread out the data are. Each value in the data lies some distance from the sample mean (x minus x-bar). These distances are averaged to give a general sense of how the values tend to vary.
The sample mean is the center of the data: the positive and negative deviations balance out, so the sum of the differences equals zero. Therefore, the differences must be squared before they are averaged; squaring eliminates the negative signs. This average of the squared differences is called the variance.
The square root of the variance is then taken. This result is the standard deviation.
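Continuing the miles-driven example from the Variance post, the standard deviation is simply the square root of the variance, which brings the units back from miles^2 to miles:

```python
import math

# Standard deviation = square root of the variance
miles = [43, 70, 27, 36]
mean = sum(miles) / len(miles)
variance = sum((x - mean) ** 2 for x in miles) / len(miles)  # 257.5 miles^2
std_dev = math.sqrt(variance)  # back in miles, not miles^2
print(round(std_dev, 2))  # 16.05
```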