Compare Two Proportions

Example: Comparing Two Qualitative Data Sets

In this section, we look at how to compare two sets of categorical data. Let's take the color preferences from the class data as an example. Our daily experience suggests that men and women have different color preferences. For example, the men's section in a department store features many more blue outfits than the women's section. Since the question "what is your favorite color" was included in the survey, we can ask the following question: "do more men choose blue as their favorite color than women?"

Keep in mind that color preferences are categorical data: people can only choose from a set of labels, and blue is one of them. So the word "more" here doesn't translate to a comparison of means, since there is no way to compare the "average" color between the two populations. Instead, the parameters to be compared are the proportions of people who choose blue as their favorite color. Let's take a look at what the data say:

	Sample size	How many choose blue	Proportion
Male	40	16	0.400
Female	54	11	0.204

Sampling Distribution and Test Statistic

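The sample proportions seem quite far apart. But is the difference significant? As in the discussion of comparing two means (which reduces the two samples to a t statistic), we need a way to reduce the two proportions to a single test statistic. As you have probably guessed, here we use the standard normal $z$, just as we did for testing one proportion:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}\,\bar{q}}{n_1} + \dfrac{\bar{p}\,\bar{q}}{n_2}}}$$

Here $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions, $n_1$ and $n_2$ are the sample sizes, and $p_1 - p_2$ is the difference in population proportions claimed by the null hypothesis (zero when the null hypothesis says the two proportions are equal). The new notation $\bar{p}$ is the overall proportion of people who choose blue (27/94), and $\bar{q} = 1 - \bar{p}$ is the proportion of those who do not choose blue. Computing this test statistic looks quite overwhelming, and I don't expect you to do it manually all the time either. Where does it come from?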

The story goes back to the reason why $z$ is used as the test statistic for one proportion. As we discussed in previous notes, $z$ was chosen for its computational convenience: if $X$ is a binomial random variable that follows $B(n, p)$, then for large enough $n$ the proportion $\hat{p} = X/n$ looks sufficiently similar to a normal distribution with mean $p$ and standard deviation $\sqrt{pq/n}$, where $q = 1 - p$. So using $z$ as the test statistic makes it easier to find the P-value.

But this is not the only way to find the P-value for one proportion. If you use other statistics software, chances are it also offers a test called the "binomial test," which bases its P-value directly on the binomial distribution. With computing power as cheap as it is now, even your phone can calculate binomial probabilities with ease.
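
To make the two approaches concrete, here is a minimal sketch that computes a left-tailed P-value for one proportion both ways, using only Python's standard library. The data (16 of the 40 men choosing blue) come from the table above, but the null value `p0 = 0.5` is a made-up assumption for illustration only, and the function names are hypothetical, not taken from any statistics package.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_left_tail(x, n, p0):
    """Exact P(X <= x) when X follows Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x + 1))

def normal_left_tail(x, n, p0):
    """Normal-approximation P-value from z = (p_hat - p0) / sqrt(p0*q0/n)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return NormalDist().cdf(z)

x, n = 16, 40   # men who chose blue, from the class data
p0 = 0.5        # hypothetical null value, assumed for this demo only

exact = binomial_left_tail(x, n, p0)
approx = normal_left_tail(x, n, p0)
print(f"exact binomial P-value:      {exact:.4f}")
print(f"normal approximation P-value: {approx:.4f}")
```

The two numbers will not agree exactly, because the normal curve only approximates the binomial distribution; the gap shrinks as the sample size grows.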

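However, when we move to two proportions, using the normal approximation really makes a difference: if two independent random variables each follow a normal distribution, probability theory tells us that their difference will also be normal:

$$X_1 \sim N(\mu_1, \sigma_1^2),\quad X_2 \sim N(\mu_2, \sigma_2^2) \;\Longrightarrow\; X_1 - X_2 \sim N(\mu_1 - \mu_2,\ \sigma_1^2 + \sigma_2^2)$$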

(You can verify this by generating two sets of normal random numbers, subtracting one from the other, and plotting a histogram of the differences. An actual proof of this fact requires some calculus.)
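
The check suggested above can be sketched in a few lines of Python. The means and standard deviations below are hypothetical values picked only for the demonstration, and the text histogram is a stand-in for a real plot. For independent normal variables the means subtract and the variances add, so the differences should center near 2 with a standard deviation near 2.5.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical parameters, chosen only for illustration
mu1, sigma1 = 5.0, 2.0
mu2, sigma2 = 3.0, 1.5

# Draw independent pairs of normal random numbers and subtract them
diffs = [rng.gauss(mu1, sigma1) - rng.gauss(mu2, sigma2) for _ in range(100_000)]

theory_sd = math.sqrt(sigma1**2 + sigma2**2)
print(f"mean of differences: {mean(diffs):.3f} (theory: {mu1 - mu2:.3f})")
print(f"sd of differences:   {stdev(diffs):.3f} (theory: {theory_sd:.3f})")

# Crude text histogram: one row per bin of width 1, one '#' per 500 draws
counts = {}
for d in diffs:
    left_edge = math.floor(d)
    counts[left_edge] = counts.get(left_edge, 0) + 1
for left_edge in sorted(counts):
    print(f"{left_edge:>4} | {'#' * (counts[left_edge] // 500)}")
```

The rows of `#` marks should trace out the familiar bell shape, which is the point of the exercise.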

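The usefulness of this property for testing two proportions can be seen from the sampling distribution used to construct the hypothesis test: $\hat{p}_1 - \hat{p}_2$ is the difference in sample proportions, and $p_1 - p_2$ is the difference in population proportions. The Central Limit Theorem has this to say about our sampling distribution (for large samples, with the variance written as the second parameter):

$$\hat{p}_1 - \hat{p}_2 \;\approx\; N\!\left(p_1 - p_2,\ \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right)$$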

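So the test statistic for two proportions is again based on a normal distribution: standardizing the difference in sample proportions, with the overall proportion standing in for the common population proportion under the null hypothesis, gives a $z$ statistic.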

Example: Testing Claims about Two Proportions

Now that you've seen the theory, let's return to our earlier example of color preferences.

Suppose we use subscript 1 for females and 2 for males. Then the question “do more men choose blue as their favorite color” should be phrased as a left-tailed test (because we’ve used 1 for the female population):

$$H_0: p_1 = p_2 \qquad H_1: p_1 < p_2$$

Alternative notations for the null hypothesis include $H_0: p_1 - p_2 = 0$ or $H_0: p_1 \ge p_2$.

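Our data can be summarized as follows:

$$n_1 = 54,\quad \hat{p}_1 = \frac{11}{54} \approx 0.204; \qquad n_2 = 40,\quad \hat{p}_2 = \frac{16}{40} = 0.400; \qquad \bar{p} = \frac{11 + 16}{54 + 40} = \frac{27}{94} \approx 0.287,\quad \bar{q} = 1 - \bar{p} \approx 0.713$$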

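Combining all of these, we have the value of the test statistic:

$$z = \frac{(0.204 - 0.400) - 0}{\sqrt{\dfrac{(0.287)(0.713)}{54} + \dfrac{(0.287)(0.713)}{40}}} \approx -2.08$$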

And the P-value turns out to be $P(z < -2.08) \approx 0.019$, since we’ve used a left-tailed test.
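
Since nobody is expected to grind through this by hand every time, here is a short sketch of the same calculation in Python, using only the standard library. The function name `two_proportion_z` is a hypothetical helper written for these notes, not part of any package; it assumes the null hypothesis claims equal proportions, so the hypothesized difference is zero.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2, from the counts x1 of n1 and x2 of n2."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)   # overall (pooled) proportion
    q_bar = 1 - p_bar
    standard_error = sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    return (p_hat1 - p_hat2) / standard_error

# Subscript 1 = female (11 of 54), subscript 2 = male (16 of 40)
z = two_proportion_z(11, 54, 16, 40)

# Left-tailed test: the P-value is the area under the standard normal curve left of z
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.3f}")  # z = -2.08, P-value = 0.019
```

For a right-tailed test the P-value would instead be `1 - NormalDist().cdf(z)`, and for a two-tailed test it would be twice the smaller tail area.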

Based on α = 0.05, the conclusion is as expected: since the P-value is below α, we reject the null hypothesis. There is significant evidence that the proportion of blue lovers is lower among women than among men.

Interpretation of P-value

The interpretation of the P-value in our example follows the same template: “if H0 is true, the P-value is the probability of getting data at least as extreme as what we observed.” But we also need to consider the sampling distribution when interpreting the P-value:

· If the proportions of people who choose blue as their favorite color are the same for men and women, then the chance of getting samples of the same sizes that show a difference in sample proportions $\hat{p}_1 - \hat{p}_2$ smaller than $-0.196$ is about 2%.
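
This interpretation can be checked with a rough simulation. The sketch below is not part of the formal test. It assumes the common population proportion equals the overall sample proportion 27/94 (the null hypothesis only says the two proportions are equal, not what their value is), repeatedly draws samples of 54 and 40, and counts how often the difference in sample proportions is at least as far below zero as the one we observed.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

n1, n2 = 54, 40                # female and male sample sizes
p_common = 27 / 94             # assumed common proportion under H0 (pooled estimate)
observed = 11 / 54 - 16 / 40   # observed difference, about -0.196

def sample_proportion(n, p):
    """Proportion of 'blue' answers in one simulated sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n

reps = 20_000
hits = sum(
    sample_proportion(n1, p_common) - sample_proportion(n2, p_common) <= observed
    for _ in range(reps)
)
estimate = hits / reps
print(f"simulated chance of a difference at or below {observed:.3f}: {estimate:.3f}")
```

The simulated chance should land near the 2% quoted above, with some wobble from run to run because it is based on random draws.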

The P-value and its connection to the sampling distribution are illustrated in the graph below.

Summary: Comparing Two Populations

Hopefully you have seen how to approach a hypothesis test that you have not seen before: if you know how to formulate the hypotheses and which test statistic to use, then you have enough information to complete the rest of the hypothesis test, including the P-value. This table summarizes the scenarios we have seen so far (assuming we are conducting a two-tailed test):

	Testing two proportions	Testing two means	Testing the Difference between Pairs
Test Statistic	Standard normal $z$	Student’s $t$ (df as in the notes on two means)	Student’s $t$ (df = # of pairs - 1)