- Two-Proportions Z-Test in R
- What is two-proportions z-test?
- Research questions and statistical hypotheses
- Formula of the test statistic
- Case of large sample sizes
- Case of small sample sizes
- Compute two-proportions z-test in R
- R functions: prop.test()
- Compute two-proportions z-test
- Interpretation of the result
- Access to the values returned by the prop.test() function
- See also
- Infos
What is two-proportions z-test?
The two-proportions z-test is used to compare two observed proportions. This article describes the basics of the two-proportions z-test and provides practical examples using R software.
For example, we have two groups of individuals:
- Group A with lung cancer: n = 500
- Group B, healthy individuals: n = 500
The number of smokers in each group is as follows:
- Group A with lung cancer: n = 500, 490 smokers, \[p_A = 490/500 = 98\%\]
- Group B, healthy individuals: n = 500, 400 smokers, \[p_B = 400/500 = 80\%\]
In this setting:
- The overall proportion of smokers is \[p = \frac{490 + 400}{500 + 500} = 89\%\]
- The overall proportion of non-smokers is \[q = 1 - p = 11\%\]
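As a quick check, these proportions can be reproduced with plain arithmetic in R. The variable names (x_A, n_A, p_A and so on) are illustrative choices, not part of any package.

```r
# Counts from the example above
x_A <- 490; n_A <- 500   # group A (lung cancer): smokers, group size
x_B <- 400; n_B <- 500   # group B (healthy): smokers, group size

p_A <- x_A / n_A                   # proportion of smokers in group A (0.98)
p_B <- x_B / n_B                   # proportion of smokers in group B (0.80)
p   <- (x_A + x_B) / (n_A + n_B)   # overall (pooled) proportion of smokers (0.89)
q   <- 1 - p                       # overall proportion of non-smokers (0.11)

round(c(p_A = p_A, p_B = p_B, p = p, q = q), 2)
```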
We want to know whether the proportions of smokers are the same in the two groups of individuals.
Research questions and statistical hypotheses
Typical research questions are:
- whether the observed proportion of smokers in group A (\[p_A\]) is equal to the observed proportion of smokers in group B (\[p_B\])?
- whether the observed proportion of smokers in group A (\[p_A\]) is greater than the observed proportion of smokers in group B (\[p_B\])?
- whether the observed proportion of smokers in group A (\[p_A\]) is less than the observed proportion of smokers in group B (\[p_B\])?
In statistics, we can define the corresponding null hypothesis (\[H_0\]) as follows:
- \[H_0: p_A = p_B\]
- \[H_0: p_A \leq p_B\]
- \[H_0: p_A \geq p_B\]
The corresponding alternative hypotheses (\[H_a\]) are as follows:
- \[H_a: p_A \ne p_B\] (different)
- \[H_a: p_A > p_B\] (greater)
- \[H_a: p_A < p_B\] (less)
Note that:
- Hypothesis 1) is called a two-tailed test
- Hypotheses 2) and 3) are called one-tailed tests
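The choice between a two-tailed and a one-tailed test changes how the p-value is obtained from the z-statistic. Below is a minimal sketch, assuming a standard normal reference distribution. The helper z_p_value() is hypothetical. Its alternative names simply mirror the values accepted by the alternative argument of prop.test().

```r
# Hypothetical helper: p-value of a z statistic for each form of H_a
z_p_value <- function(z, alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         two.sided = 2 * pnorm(-abs(z)),           # H_a: p_A != p_B (both tails)
         greater   = pnorm(z, lower.tail = FALSE), # H_a: p_A > p_B (upper tail)
         less      = pnorm(z))                     # H_a: p_A < p_B (lower tail)
}

z_p_value(1.96, "two.sided")  # about 0.05
z_p_value(1.96, "greater")    # about 0.025
```

For the same |z|, the one-tailed p-value in the hypothesized direction is half the two-tailed one.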
Formula of the test statistic
Case of large sample sizes
The test statistic (also known as the z-statistic) can be calculated as follows:
\[ z = \frac{p_A-p_B}{\sqrt{pq/n_A+pq/n_B}} \]
where,
- \[p_A\] is the proportion observed in group A with size \[n_A\]
- \[p_B\] is the proportion observed in group B with size \[n_B\]
- \[p\] and \[q = 1 - p\] are the overall (pooled) proportions, computed from both groups combined
- if \[|z| < 1.96\], then the difference is not significant at the 5% level (two-tailed test)
- if \[|z| \geq 1.96\], then the difference is significant at the 5% level (two-tailed test)
- The p-value corresponding to the z-statistic can be read from a z-table (standard normal table). We'll see how to compute it in R.
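The formula can be applied directly to the smoker data. This is a sketch of the uncorrected large-sample test (no continuity correction), with the two-tailed p-value taken from the standard normal distribution via pnorm().

```r
x <- c(490, 400)   # smokers in groups A and B
n <- c(500, 500)   # sizes of groups A and B

p_hat <- x / n                 # observed proportions p_A and p_B
p <- sum(x) / sum(n)           # overall (pooled) proportion
q <- 1 - p

# z = (p_A - p_B) / sqrt(pq/n_A + pq/n_B)
z <- (p_hat[1] - p_hat[2]) / sqrt(p * q / n[1] + p * q / n[2])
p_value <- 2 * pnorm(-abs(z))  # two-tailed p-value

z        # about 9.1
p_value
```

Here |z| is about 9.1, far above 1.96, so the difference between the two groups is significant at the 5% level.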
Note that the z-statistic formula is valid only when the sample sizes are large enough: \[n_Ap\], \[n_Aq\], \[n_Bp\] and \[n_Bq\] should all be \[\geq 5\].
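The sample-size condition can be checked in the same way. This small sketch uses the counts from the example. The name expected is only illustrative.

```r
n <- c(A = 500, B = 500)       # group sizes
p <- (490 + 400) / sum(n)      # overall proportion of smokers
q <- 1 - p                     # overall proportion of non-smokers

expected <- c(n * p, n * q)    # n_A p, n_B p, n_A q, n_B q
expected
all(expected >= 5)             # TRUE here, so the large-sample z-test applies
```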
Case of small sample sizes
Fisher's exact test is a non-parametric technique that is well suited to comparing proportions when the two independent samples are small.
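In R, Fisher's exact test is available as fisher.test(), which accepts a 2 x 2 contingency table. The counts below are hypothetical small samples, chosen only to show the call. They do not come from the example above.

```r
# Hypothetical small samples: 8 of 10 successes in group A, 3 of 10 in group B
tab <- matrix(c(8, 2,
                3, 7),
              nrow = 2, byrow = TRUE,
              dimnames = list(group   = c("A", "B"),
                              outcome = c("success", "failure")))

res_fisher <- fisher.test(tab)   # two-sided by default
res_fisher$p.value
```

Each row holds the counts of successes and failures for one group. This is the same information that prop.test() receives through x and n.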
Compute two-proportions z-test in R
R functions: prop.test()
The R function prop.test() can be used as follows:
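prop.test(x, n, p = NULL, alternative = "two.sided", correct = TRUE)
- x: a vector of counts of successes
- n: a vector of counts of trials
- p: a vector of probabilities of success; with the default NULL, the null hypothesis is that the proportions in the groups are equal
- alternative: a character string specifying the alternative hypothesis; one of "two.sided" (default), "greater" or "less"
- correct: a logical indicating whether Yates' continuity correction should be applied where possible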
Note that, by default, prop.test() applies the Yates continuity correction, which is particularly important when either the expected number of successes or failures is < 5. If you don't want the correction, add the argument correct = FALSE to prop.test(). (This option must be set to FALSE to make the test mathematically equivalent to the uncorrected z-test.)
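One way to see this equivalence is to compare the uncorrected prop.test() output with the z-statistic computed by hand from the formula above. For two groups, the reported X-squared statistic should equal z squared, and the two-sided p-values should agree. Here is a sketch using the smoker data:

```r
x <- c(490, 400)   # smokers in groups A and B
n <- c(500, 500)   # group sizes

# Uncorrected test: no Yates continuity correction
res_nc <- prop.test(x, n, correct = FALSE)

# z-statistic from the large-sample formula, with pooled p and q
p <- sum(x) / sum(n); q <- 1 - p
z <- (x[1] / n[1] - x[2] / n[2]) / sqrt(p * q / n[1] + p * q / n[2])

all.equal(unname(res_nc$statistic), z^2)        # TRUE: X-squared equals z^2
all.equal(res_nc$p.value, 2 * pnorm(-abs(z)))   # TRUE: same two-sided p-value
```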
Compute two-proportions z-test
We want to know whether the proportions of smokers are the same in the two groups of individuals.
res <- prop.test(x = c(490, 400), n = c(500, 500))