Test for Homogeneity
Summary: This module describes how the chi-square distribution can be
used to test for homogeneity.
The Goodness of Fit test can be used to decide whether a population fits a
given distribution, but it cannot be used to decide whether two
populations follow the same unknown distribution. A different test, called
the Test for Homogeneity, is used to draw a conclusion about whether two
populations have the same distribution. To calculate the test statistic for
a test for homogeneity, follow the same procedure as with the χ² test
for independence.
Here is a summary of the Test for Homogeneity:
Hypotheses
H0: The distributions of the two
populations are the same.
Ha: The distributions of the two populations are not the
same.
Test Statistic
Uses a χ² statistic. It is computed in the same way as for the test for independence.
Requirements
All expected values in the table must be greater than or equal to 5.
Common Uses
Comparing two populations, for example men vs. women, before vs. after,
or east vs. west.
The variable is categorical with more than two
possible response values.
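The computation summarized above can be sketched in a few lines of Python. This is a minimal sketch, not part of the original module: the function name `homogeneity_statistic` and the small table of counts are made up for illustration. It builds each expected count from the row and column totals, sums (observed - expected)² / expected over every cell, and checks the requirement on the expected counts, which is the usual form of the condition.

```python
def homogeneity_statistic(table):
    """Return (chi2, df, expected) for a table of observed counts.

    `table` is a list of rows: one row per population and one
    column per response category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count = (row total * column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (observed - expected)^2 / expected over every cell
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Made-up counts (illustration only): two populations, three categories
illustrative_table = [[10, 20, 30], [20, 20, 20]]
chi2, df, expected = homogeneity_statistic(illustrative_table)

# Requirement check: every expected count should be at least 5
requirement_met = all(e >= 5 for row in expected for e in row)
print(round(chi2, 3), df, requirement_met)
```

With two populations the degrees of freedom reduce to the number of columns - 1, which is the form used in the examples that follow.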
EXAMPLE 1
Do male and female college students have the same distribution of
living conditions? Use a level of significance of 0.05.
Suppose that 250 randomly selected male college students and 300
randomly selected female college students were asked about their living
conditions: Dorm, Apartment, With Parents, Other. The
results are shown in the table below:
| | Dorm | Apartment | With Parents | Other |
|---|---|---|---|---|
| Male | 72 | 84 | 49 | 45 |
| Female | 91 | 86 | 88 | 35 |
Solution
The null and alternative hypotheses are:
H0: The distribution of living conditions for male college students is the same as the distribution of living conditions for female college students.
Ha: The distribution of living conditions for male college students is not the same as the distribution of living conditions for female college students.
To compute the test statistic, follow the same
process as with the test for independence. Here there are 2 rows
and 4 columns. Because the table has only 2 rows, the degrees of freedom for this test are
df = the number of columns - 1 = 3
According to the readout from a TI-84+, the χ² test statistic is about 10.13 and the p-value
is 0.0175. As with all hypothesis tests, reject the null
hypothesis if the p-value is less than the level of significance and
fail to reject the null hypothesis if the p-value is greater than the
level of significance. In this case,
p-value = 0.0175 < 0.05 = Level of Significance
Therefore, reject the null hypothesis and accept
the alternative hypothesis. You can conclude that the distributions
of living conditions for male and female college students are not the
same.
Notice that the conclusion is only that the
distributions are not the same. One cannot use the test for
homogeneity to make any conclusions about how they differ.
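The calculator values for Example 1 can be checked by hand. The following is a minimal sketch in Python using the counts from the table; the variable names are illustrative and not part of the original module. The p-value line uses a closed-form expression for the upper tail of the χ² distribution that holds only when df = 3, so it is an assumption tied to this table's shape rather than a general routine.

```python
import math

# Observed counts from Example 1 (Dorm, Apartment, With Parents, Other)
observed = [
    [72, 84, 49, 45],  # Male
    [91, 86, 88, 35],  # Female
]

row_totals = [sum(row) for row in observed]        # 250 and 300
col_totals = [sum(col) for col in zip(*observed)]  # one total per category
grand_total = sum(row_totals)                      # 550

# Add (observed - expected)^2 / expected for each of the 8 cells
chi2 = 0.0
for r, row in enumerate(observed):
    for c, obs in enumerate(row):
        exp = row_totals[r] * col_totals[c] / grand_total
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(col_totals) - 1)

# Upper-tail probability of a chi-square variable; valid for df = 3 only
p_value = (math.erfc(math.sqrt(chi2 / 2))
           + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

alpha = 0.05
reject_null = p_value < alpha
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

If SciPy is installed, `scipy.stats.chi2_contingency(observed)` should report the same statistic, p-value, and degrees of freedom without the hand-written tail formula.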
EXAMPLE 2
Both before and after a recent earthquake, surveys were conducted
asking voters which of three candidates they planned to vote for
in the upcoming city council election. Has there been a change
since the earthquake? Use a level of significance of 0.05.
The table below shows the results of the survey.
| | Perez | Chung | Stevens |
|---|---|---|---|
| Before | 167 | 128 | 135 |
| After | 214 | 197 | 225 |
Solution
The null and alternative hypotheses are:
H0: The distribution of voter preferences was the same before and after the earthquake.
Ha: The distribution of voter preferences was not the same before and after the earthquake.
This table has 2 rows
and 3 columns. The degrees of freedom for this test are
df = the number of columns - 1 = 2
According to the readout from a TI-84+, the χ² test statistic is
about 3.26 and the p-value is 0.196. In this case,
p-value = 0.196 > 0.05 = Level of Significance
Therefore, fail to reject the null hypothesis.
There is insufficient evidence to conclude that the
distribution of voter preferences differs before and after the
earthquake.
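The same check can be run for Example 2. This is a minimal sketch in Python with illustrative variable names, using the survey counts from the table. With df = 2 the upper tail of the χ² distribution simplifies to exp(-χ²/2), so the p-value here needs only `math.exp`; that shortcut is specific to df = 2.

```python
import math

# Observed counts from Example 2 (Perez, Chung, Stevens)
before = [167, 128, 135]
after = [214, 197, 225]

n_before, n_after = sum(before), sum(after)
n_total = n_before + n_after

chi2 = 0.0
for b, a in zip(before, after):
    col_total = b + a
    # Expected counts if preferences were the same in both surveys
    exp_before = n_before * col_total / n_total
    exp_after = n_after * col_total / n_total
    chi2 += (b - exp_before) ** 2 / exp_before
    chi2 += (a - exp_after) ** 2 / exp_after

df = len(before) - 1  # two rows, so df = number of columns - 1

# Upper-tail probability of a chi-square variable; valid for df = 2 only
p_value = math.exp(-chi2 / 2)

alpha = 0.05
reject_null = p_value < alpha
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```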
Summary of χ² Tests
You have seen the χ²
test statistic used in three different circumstances. Below is a summary that
will help you decide which χ²
test is the appropriate one to use.
- Goodness of Fit: Use the Goodness of Fit Test
when you want to decide whether a population with unknown distribution
"fits" a known distribution. In this case there will be a single
qualitative survey question or a single outcome of an experiment from a
single population. Goodness of fit is typically used to see if the
population is uniform (all outcomes occur with equal frequency), the
population is normal, or the population is the same as another population
with known distribution. The null and alternative hypotheses are:
  - H0: The population fits the given distribution.
  - Ha: The population does not fit the given distribution.
- Independence: Use the Test for Independence when
you want to decide whether two variables are independent or dependent.
In this case there will be two qualitative survey questions or experiments
and a contingency table will be constructed. The goal is to see if the
two variables are unrelated (independent) or related (dependent). The
null and alternative hypotheses are:
  - H0: The two variables are independent.
  - Ha: The two variables are dependent.
- Homogeneity: Use the Test for Homogeneity when
you want to decide if two populations with unknown distribution have the
same distribution as each other. In this case there will be a single
qualitative survey question or experiment given to two different
populations. The null and alternative hypotheses are:
  - H0: The two populations follow the same distribution.
  - Ha: The two populations have different distributions.