The Central Limit Theorem

A Review of Terminology

We begin our journey into inferential statistics.  Most of the time the population mean and population standard deviation are impossible or too expensive to determine exactly.  Two of the major tasks of a statistician are to approximate the mean and to analyze how accurate that approximation is.  The most common way of accomplishing these tasks is by sampling.  The researcher obtains a (hopefully random) sample from the population and uses it to make inferences about the population.  From the sample the statistician computes several numbers such as the sample size, the sample mean, and the sample standard deviation.  The numbers that are computed from the sample are called statistics.

How many cups of coffee do you drink each week?

If we asked this question of two different five-person groups, we would probably get two different sample means and two different sample standard deviations.  Choosing different samples from the same population will produce different statistics.

The distribution of a statistic (such as the sample mean) over all possible samples of a given size is called the sampling distribution.

The Five Dice Experiment:

Consider the distribution of rolling a single die.  It is uniform (flat) between 1 and 6.  If we roll five dice, we can compute the probability distribution of the mean of the five rolls.  We will see that this distribution looks much more like a normal distribution than the flat distribution of a single die.
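The five dice experiment is small enough to work out exactly.  The sketch below is one possible way to do it, assuming fair six-sided dice; the function name mean_of_dice_pmf and the text histogram are illustrative choices, not part of the original experiment.  It enumerates all 6^5 = 7776 outcomes and tabulates the distribution of the mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def mean_of_dice_pmf(num_dice=5, sides=6):
    """Exact distribution of the mean of `num_dice` fair dice (hypothetical helper)."""
    # Count how many of the sides**num_dice equally likely outcomes give each sum.
    counts = Counter(
        sum(roll) for roll in product(range(1, sides + 1), repeat=num_dice)
    )
    total = sides ** num_dice
    # Convert each sum to a mean and each count to a probability.
    return {
        Fraction(s, num_dice): Fraction(c, total)
        for s, c in sorted(counts.items())
    }


pmf = mean_of_dice_pmf()

# Print a rough text histogram: mean, probability, bar.
for mean, prob in pmf.items():
    bar = "#" * round(float(prob) * 300)
    print(f"{float(mean):4.1f}  {float(prob):.4f}  {bar}")
```

The printed bars rise to a peak near 3.5 and fall off symmetrically, which is the bell shape described above.  Calling mean_of_dice_pmf(1) gives the flat distribution of a single die for comparison.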

The Central Limit Theorem

Let x̄ denote the mean of a random sample of size n from a population having mean μ and standard deviation σ.  Let

μ_x̄  =  the mean value of x̄  and

σ_x̄  =  the standard deviation of x̄

then

1. μ_x̄  =  μ

2. σ_x̄  =  σ/√n

3. When the population distribution is normal, so is the distribution of x̄ for any n.

4. For large n, the distribution of x̄ is approximately normal regardless of the population distribution.

Rule of thumb:  n  >  30  is large.
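The first two statements of the theorem can be checked by simulation.  The following is a minimal sketch, assuming an exponential population with mean 1 and standard deviation 1 (chosen here only because it is clearly not normal) and a sample size of 36; the seed, the number of samples, and the variable names are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 1.0, 1.0   # mean and standard deviation of the exponential(1) population
n = 36                 # sample size, above the n > 30 rule of thumb
num_samples = 10_000   # number of samples drawn (an arbitrary choice)

# Draw many samples of size n and record the mean of each one.
sample_means = [
    fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)
predicted_sd = sigma / sqrt(n)

# For a normal distribution about 68% of values lie within one standard deviation.
within_one_sd = sum(
    abs(m - mu) <= predicted_sd for m in sample_means
) / num_samples

print(f"mean of sample means: {mean_of_means:.4f} (theorem predicts {mu})")
print(f"sd of sample means:   {sd_of_means:.4f} (theorem predicts {predicted_sd:.4f})")
print(f"fraction within one predicted sd of the mean: {within_one_sd:.3f}")
```

Even though the population is strongly skewed, the simulated sample means should center near the population mean with a spread close to σ/√n.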

Example:

Suppose that we play a slot machine such that you either double your bet or lose your bet.  If there is a 45% chance of winning, then the expected value for a dollar wager is

1(.45) + (-1)(.55)  =  -.1

We can compute the standard deviation:

 x      p(x)    (x - μ)²    p(x)(x - μ)²
 1      .45     1.21        .545
-1      .55      .81        .446
                Total       .991

So the standard deviation is σ = √.991 ≈ .995.  If we throw 100 silver dollars into the slot machine, then we expect to average a loss of ten cents per dollar with a standard deviation of σ_x̄ = .995/√100 = .0995.  Notice that this standard deviation is very small compared with that of a single play.  This is why the casinos can count on making money.

Now let us find the probability that the gambler does not lose any money, that is the mean is greater than or equal to 0. We first compute the z-score.  We have

z  =  (0 - (-.1)) / .0995  =  1.01

Now we go to the table to find the associated probability.  We get .8438.  Since we want the area to the right, we subtract from 1 to get

P(z > 1.01)  =  1 - P(z < 1.01)  =  1 - .8438  =  .1562

There is about a 16% chance that the gambler will not lose.
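The slot machine calculation can be reproduced without a table.  The sketch below assumes Python's standard library statistics.NormalDist for the normal curve; the variable names are illustrative.  Because it does not round the intermediate values, its answer differs slightly from the table-based .1562.

```python
from math import sqrt
from statistics import NormalDist

# Net result of a one dollar wager and its probability.
outcomes = {1: 0.45, -1: 0.55}

mu = sum(x * p for x, p in outcomes.items())                    # expected value
variance = sum(p * (x - mu) ** 2 for x, p in outcomes.items())
sigma = sqrt(variance)                                          # sd of a single play

n = 100                         # number of plays
sigma_mean = sigma / sqrt(n)    # sd of the mean of n plays

# Probability that the mean result is at least 0 (the gambler does not lose).
z = (0 - mu) / sigma_mean
p_no_loss = 1 - NormalDist().cdf(z)

print(f"mu = {mu:.2f}, sigma = {sigma:.4f}, sigma_mean = {sigma_mean:.4f}")
print(f"z = {z:.3f}, P(mean >= 0) = {p_no_loss:.4f}")
```

The result is about 0.157, which agrees with the 16% found above; the small gap comes from rounding z to 1.01 before using the table.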

Sampling Distributions for Proportions

The last example was a special case of proportions, that is, Boolean (success or failure) data.  From now on, we can use the following theorem.

The Central Limit Theorem for Proportions

Let p be the probability of success and q = 1 - p be the probability of failure.  The sampling distribution of the sample proportion p̂ for samples of size n is approximately normal with mean

μ_p̂  =  p

and standard deviation

σ_p̂  =  √(pq/n)

Example

The new Endeavor SUV has been recalled because 5% of the cars experience brake failure.  The Tahoe dealership has sold 200 of these cars.  What is the probability that fewer than 4% of the cars from Tahoe experience brake failure?

Solution

We have

p  =  .05        q  =  .95        n  = 200

Therefore the sampling distribution of the sample proportion p̂ has

μ_p̂  =  p  =  .05       σ_p̂  =  √((.05)(.95)/200)  =  .0154

Let x be the number of the 200 cars that experience brake failure.  Since 4% of 200 is 8, we want to find

P(x < 8)

Using the continuity correction, we find instead

P(x < 7.5)

In terms of the sample proportion p̂ = x/200, this is equivalent to

P(p̂ < 7.5/200)  =  P(p̂ < .0375)

We find the z-score

z  =  (.0375 - .05) / .0154  =  -.81

The table gives a probability of .2090.  We can conclude that there is about a 21% chance that fewer than 4% of the cars will experience brake failure.
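The brake failure example can be checked numerically.  This is a rough sketch, assuming the standard library statistics.NormalDist for the normal approximation; the exact binomial sum is included only as a cross-check that is not part of the original solution, and the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.05, 200
q = 1 - p

# Mean and standard deviation of the sample proportion.
mu_p_hat = p
sigma_p_hat = sqrt(p * q / n)

# Fewer than 8 failures, with the continuity correction: x < 7.5.
cutoff = 7.5 / n
z = (cutoff - mu_p_hat) / sigma_p_hat
approx = NormalDist().cdf(z)

# Cross-check (an addition for comparison): exact binomial P(x <= 7).
exact = sum(comb(n, k) * p**k * q ** (n - k) for k in range(8))

print(f"sigma_p_hat = {sigma_p_hat:.4f}, z = {z:.2f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The normal approximation comes out at roughly 0.209, in line with the 21% found from the table.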

Control Charts for Proportions

A while back we discussed how to construct a control chart.  For proportions, we can use the same tool, remembering that the Central Limit Theorem tells us how to find the mean and standard deviation.

Example

Heavenly Ski resort conducted a study of falls on its advanced run over twelve consecutive ten-minute periods.  During each ten-minute period there were 40 boarders on the run.  The number of falls r in each period is shown below:

Time   1     2     3      4    5      6     7     8    9      10   11     12
r      14    18    11     16   19     22    6     12   13     16   9      17
r/40   0.35  0.45  0.275  0.4  0.475  0.55  0.15  0.3  0.325  0.4  0.225  0.425

Make a P-Chart and list any out of control signals by type (I, II, III).

Solution

First we find p by dividing the total number of falls by the total number of skiers:

p  =  173 / (12(40))  =  .36

Now we compute the mean and standard deviation, using n = 40 boarders per period:

μ  =  p  =  .36        σ  =  √((.36)(.64)/40)  =  .08

The values two and three standard deviations above and below the mean are

.36 - (2)(.08)  =  .20        .36 - (3)(.08)  =  .12

.36 + (2)(.08)  =  .52        .36 + (3)(.08)  =  .60

Now we can use this data as before to construct a control chart and determine any out of control signals.  Notice that no nine consecutive points lie on one side of the center line at .36, no two of three consecutive points lie above 0.52 or below 0.20 on the same side, and no points lie below 0.12 or above 0.60.  Hence this data is in control.
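The control chart checks can also be automated.  The sketch below is one possible implementation, not a standard routine.  It assumes the signal numbering follows a common textbook convention (I: a point beyond three standard deviations; II: nine consecutive points on one side of the center line; III: at least two of three consecutive points beyond two standard deviations on the same side).  It uses unrounded values for the center line and standard deviation, so its limits differ slightly from hand calculations based on .36 and .08.

```python
from math import sqrt

falls = [14, 18, 11, 16, 19, 22, 6, 12, 13, 16, 9, 17]  # r for periods 1..12
n = 40                                                   # boarders per period

p_hat = [r / n for r in falls]
p_bar = sum(falls) / (n * len(falls))     # center line
sigma = sqrt(p_bar * (1 - p_bar) / n)     # sd of a sample proportion

# Signal I (assumed numbering): a point beyond three standard deviations.
signal_1 = [
    t for t, p in enumerate(p_hat, start=1) if abs(p - p_bar) > 3 * sigma
]

# Signal II: nine consecutive points on the same side of the center line.
signal_2 = []
run, prev_side = 0, 0
for t, p in enumerate(p_hat, start=1):
    side = (p > p_bar) - (p < p_bar)      # +1 above, -1 below, 0 on the line
    run = run + 1 if side != 0 and side == prev_side else int(side != 0)
    prev_side = side
    if run >= 9:
        signal_2.append(t)

# Signal III: two of three consecutive points beyond 2 sd on the same side.
signal_3 = []
for end in range(2, len(p_hat)):
    window = p_hat[end - 2 : end + 1]
    high = sum(p > p_bar + 2 * sigma for p in window)
    low = sum(p < p_bar - 2 * sigma for p in window)
    if high >= 2 or low >= 2:
        signal_3.append(end + 1)          # period at which the window ends

print(f"center line = {p_bar:.3f}, sigma = {sigma:.3f}")
print(f"2 sd limits: {p_bar - 2 * sigma:.3f} to {p_bar + 2 * sigma:.3f}")
print(f"3 sd limits: {p_bar - 3 * sigma:.3f} to {p_bar + 3 * sigma:.3f}")
print("signal I:", signal_1, " signal II:", signal_2, " signal III:", signal_3)
```

All three lists come back empty for this data, which matches the conclusion that the process is in control.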