Compare Two Proportions
In this section, we look at how to compare two sets of categorical data. Let's take the color preferences from the class data as an example. Our daily experience suggests that men and women have different color preferences. For example, the men's section in a department store features many more blue outfits than the women's section. Since the question "what is your favorite color" was included in the survey, we can ask the following question: "do more men choose blue as their favorite color than women?"
Keep in mind that color preferences are categorical data: people can only choose from a set of labels, and blue is one of them. So the word "more" here doesn't translate to a comparison of means, since there is no way to compare the "average" color between the two populations. Instead, the parameters to be compared are the proportions of people who choose blue as their favorite color. Let's take a look at what the data say:
| | Sample size | How many choose blue | Proportion |
|---|---|---|---|
| Male | 40 | 16 | 0.400 |
| Female | 54 | 11 | 0.204 |
The sample proportions seem quite far apart. But is the difference significant? As in the discussion of comparing two means (which reduces the two samples to a t statistic), we need a way to reduce the two proportions to a single test statistic. As you have probably guessed, here we use the standard normal $z$, just as we did for testing one proportion:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}\,\bar{q}}{n_1} + \dfrac{\bar{p}\,\bar{q}}{n_2}}}$$
The new notation $\bar{p}$ is the overall proportion of people who choose blue (27/94), and $\bar{q} = 1 - \bar{p}$ is the proportion of those who do not choose blue. Computing this test statistic looks quite overwhelming, and I don't expect you to do it manually all the time either. Where does it come from?
The story goes back to the reason why $z$ is used as the test statistic for one proportion. As we discussed in previous notes, $z$ was chosen for its computational convenience: if $x$ is a binomial random variable that follows $B(n, p)$, then for a reasonably large $n$ the sample proportion $\hat{p} = x/n$ looks sufficiently similar to a normal distribution with mean $p$ and standard deviation $\sqrt{pq/n}$, where $q = 1 - p$. So using $z$ as the test statistic makes it easier to find the P-value.
But this is not the only way to find the P-value for one proportion. If you use other statistics software, chances are it also offers another test, called the "binomial test", that bases its P-value directly on the binomial distribution. With computing power now so cheap, even your phone can calculate binomial probabilities with ease.
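To see how close the two approaches can be, here is a minimal Python sketch that computes a left-tailed P-value for one proportion both ways. The counts (11 out of 54) are borrowed from the female sample above, but the null value `p0 = 0.3` and the helper names are made up purely for illustration; they are not part of the class example.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_left_tail(x, n, p0):
    """Exact P(X <= x) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x + 1))

def normal_left_tail(x, n, p0):
    """Normal-approximation P-value using z = (p_hat - p0) / sqrt(p0*q0/n)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return NormalDist().cdf(z)

# Hypothetical one-proportion test: H0: p = 0.3 vs. H1: p < 0.3
n, x, p0 = 54, 11, 0.3
exact = binomial_left_tail(x, n, p0)
approx = normal_left_tail(x, n, p0)
print(f"exact binomial P-value: {exact:.4f}")
print(f"normal approx. P-value: {approx:.4f}")
```

The two numbers will not match exactly, because the normal curve is only an approximation to the discrete binomial distribution, but they should be in the same neighborhood.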
However, when we move to two proportions, using the normal approximation really makes a difference: if two independent random variables each follow a normal distribution, probability theory tells us that their difference is also normal:
· If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ are independent, then $X - Y \sim N(\mu_1 - \mu_2,\ \sigma_1^2 + \sigma_2^2)$.
(You can verify this by generating two sets of normal random numbers, subtracting one from the other, and plotting the histogram. An actual proof of this fact requires some calculus.)
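The check suggested above can be sketched in a few lines of Python. The means and standard deviations below are arbitrary illustrative values, and a crude text histogram stands in for a real plot.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical parameters, chosen only for illustration
mu1, sigma1 = 5.0, 2.0
mu2, sigma2 = 3.0, 1.5
N = 100_000

# Draw two independent normal numbers and subtract them, N times
diffs = [random.gauss(mu1, sigma1) - random.gauss(mu2, sigma2) for _ in range(N)]

print("mean of differences:", round(mean(diffs), 2))   # theory: mu1 - mu2 = 2.0
print("sd of differences:  ", round(stdev(diffs), 2))  # theory: sqrt(2^2 + 1.5^2) = 2.5

# Crude text histogram with unit-wide bins; the bars should look bell-shaped
counts = {}
for d in diffs:
    b = math.floor(d)
    counts[b] = counts.get(b, 0) + 1
for b in sorted(counts):
    print(f"{b:>4} | " + "#" * (counts[b] * 300 // N))
```

Note that the standard deviations do not subtract: the variance of the difference is the sum of the two variances.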
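The usefulness of this property for testing two proportions can be seen from the sampling distribution used to construct the hypothesis test: $\hat{p}_1 - \hat{p}_2$ is the difference in sample proportions, and $p_1 - p_2$ is the difference in population proportions. The Central Limit Theorem has this to say about our sampling distribution:

$$\hat{p}_1 - \hat{p}_2 \;\approx\; N\!\left(p_1 - p_2,\ \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right)$$

Under the null hypothesis $p_1 = p_2$, both unknown proportions are estimated by the pooled proportion $\bar{p}$, which gives the standard error $\sqrt{\bar{p}\,\bar{q}/n_1 + \bar{p}\,\bar{q}/n_2}$. So the test statistic for two proportions is again based on a normal distribution.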
Now that you've seen the theory, let's return to our earlier example of color preferences.
If we use subscript 1 for female and 2 for male, then the question “do more men choose blue as their favorite color?” should be phrased as a left-tailed test (because we’ve used 1 for the female population):
· $H_0: p_1 = p_2$, $H_1: p_1 < p_2$
Other alternative notations include using $p_1 - p_2 = 0$, or $p_1 \ge p_2$, for the null hypothesis.
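On the other hand, our data can be summarized as follows:

$$n_1 = 54,\ \hat{p}_1 = \frac{11}{54} \approx 0.204; \qquad n_2 = 40,\ \hat{p}_2 = \frac{16}{40} = 0.400; \qquad \bar{p} = \frac{27}{94} \approx 0.287,\ \bar{q} \approx 0.713$$

Combining all of these, we have the value of the test statistic:

$$z = \frac{(0.204 - 0.400) - 0}{\sqrt{\dfrac{(0.287)(0.713)}{54} + \dfrac{(0.287)(0.713)}{40}}} \approx -2.08$$

And the P-value turns out to be $P(z < -2.08) \approx 0.019$, since we’ve used a left-tailed test.
Based on α = 0.05, the conclusion is as expected: there is significant evidence that the proportion of blue lovers is lower among women than among men.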
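Since the calculation is tedious by hand, here is a minimal Python sketch of the pooled two-proportion z test applied to the class data. The function name `two_proportion_z` is just an illustrative choice, and the sketch assumes the null hypothesis $p_1 = p_2$, so the hypothesized difference is 0.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)   # overall (pooled) proportion
    q_bar = 1 - p_bar
    se = sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    return (p_hat1 - p_hat2) / se

# Subscript 1 = female (11 of 54), subscript 2 = male (16 of 40)
z = two_proportion_z(11, 54, 16, 40)
p_value = NormalDist().cdf(z)       # left-tailed test: area to the left of z
print(f"z = {z:.2f}, P-value = {p_value:.3f}")  # z = -2.08, P-value = 0.019
```

For a right-tailed test the P-value would be `1 - NormalDist().cdf(z)`, and for a two-tailed test it would be twice the smaller tail area.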
The interpretation of the P-value in our example follows the same template: “if H0 is true, the probability that your data are at least as extreme as the current situation.” But we’ll also need to consider the sampling distribution when we are interpreting the P-value:
· If the proportions of people who choose blue as their favorite color are the same for men and women, then the chance of getting samples of the same sizes that show a difference in sample proportions smaller than $-0.196$ is about 2%.
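This interpretation can be checked by brute force. The sketch below repeatedly draws two samples of sizes 54 and 40 from populations that share the same proportion of blue lovers, and counts how often the difference in sample proportions comes out at or below the observed one. The shared proportion is unknown under H0; using the pooled estimate 27/94 for it is an assumption made here for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

n1, n2 = 54, 40                      # female and male sample sizes
p_common = 27 / 94                   # assumed shared proportion under H0
observed_diff = 11 / 54 - 16 / 40    # about -0.196

def sample_proportion(n, p):
    """Proportion of successes in n independent trials with success chance p."""
    return sum(random.random() < p for _ in range(n)) / n

trials = 20_000
hits = 0
for _ in range(trials):
    diff = sample_proportion(n1, p_common) - sample_proportion(n2, p_common)
    if diff <= observed_diff + 1e-12:   # small tolerance for floating-point ties
        hits += 1

fraction = hits / trials
print(f"fraction of simulated differences at or below the observed one: {fraction:.3f}")
```

The simulated fraction should land near the 2% figure, though not exactly on it, because the simulation uses the discrete binomial counts rather than the normal approximation.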
The P-value and its connection to the sampling distribution are illustrated in the graph below.

Hopefully you have seen how to approach any hypothesis test that you have not seen previously: if you know how to formulate the hypotheses and which test statistic to use, then you have enough information to complete the rest of the hypothesis test, including the P-value. This table summarizes the scenarios we have seen so far (assuming we are conducting a two-tailed test).
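| | Testing two proportions | Testing two means | Testing the Difference between Pairs |
|---|---|---|---|
| Null hypothesis | $H_0: p_1 = p_2$ | $H_0: \mu_1 = \mu_2$ | $H_0: \mu_d = 0$ |
| Test Statistic | $z$ | Student’s $t$ | Student’s $t$ |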