Class 27
Two or more levels
Proportion of one of the levels (success)
Proportions of all the levels (distribution)
Comparison of proportions or distributions between groups (two categorical variables)
Advanced melanoma is an aggressive form of skin cancer that until recently was almost uniformly fatal.
Research is being conducted on therapies that might trigger an immune response to the cancer and cause the melanoma to stop progressing or disappear entirely.
In a study where 52 patients were treated concurrently with two new therapies, nivolumab and ipilimumab, 21 had an immune response. (Wolchok et al., NEJM (2013) 369(2): 122-33.)
Questions that can be addressed with inference…
What is the estimated population probability of immune response following concurrent therapy with nivolumab and ipilimumab?
What is the 95% confidence interval for the estimated population probability of immune response following concurrent therapy with nivolumab and ipilimumab?
In previous studies, the proportion of patients responding to one of these agents was 30% or less. Do these results suggest that the probability of response to concurrent therapy is better than 0.30?
The melanoma data are binomial data, with success defined as experiencing an immune response.
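Inference is made about the population parameter $p$, the probability of success.
The estimate of $p$ is the sample proportion $\hat{p} = x/n$.
The test statistic can be $X$, the number of successes in the sample.
We can also use $\hat{p}$, the proportion of successes.
We could use the binomial distribution as the sampling distribution.
Under certain assumptions, the binomial distribution can be approximated by a normal distribution.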
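Number of successes: $X$ is approximately normal with mean $np$ and standard deviation $\sqrt{np(1-p)}$.
Proportion of successes: $\hat{p} = X/n$ is approximately normal with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.
The sampling distribution of $\hat{p}$ is approximately normal when
The sample observations are independent, and
At least 10 successes and 10 failures are expected in the sample: $np \geq 10$ and $n(1-p) \geq 10$.
Under these conditions, $\hat{p}$ is approximately normally distributed with mean $p$ and standard error $\sqrt{p(1-p)/n}$.
When computing an interval estimate, the true value of $p$ is unknown.
In the context of calculating CIs, substitute $\hat{p}$ for $p$ in the conditions and in the standard error.
An approximate two-sided $(1-\alpha)100\%$ confidence interval for $p$ is $\hat{p} \pm z^{*} \sqrt{\hat{p}(1-\hat{p})/n}$, where $z^{*}$ is the $1-\alpha/2$ quantile of the standard normal distribution.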
Example:
prop.test(x = 21, n = 52, conf.level = 0.95)$conf.int
[1] 0.2731269 0.5487141
attr(,"conf.level")
[1] 0.95
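The normal-approximation (Wald) interval, $\hat{p} \pm z^{*} \sqrt{\hat{p}(1-\hat{p})/n}$, can also be computed by hand. The following is a minimal sketch with illustrative variable names that are not from the original slides. Note that `prop.test()` reports a Wilson score interval with continuity correction, so its limits differ slightly from the hand-computed ones.

```r
# Wald (normal-approximation) 95% CI for the melanoma data.
# Variable names are illustrative assumptions.
x <- 21                               # patients with an immune response
n <- 52                               # patients treated
p_hat <- x / n                        # sample proportion
se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error, with p_hat in place of p
z_star <- qnorm(0.975)                # critical value for 95% confidence
wald_ci <- p_hat + c(-1, 1) * z_star * se
wald_ci
```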
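In the context of hypothesis testing, substitute the null value $p_0$ for $p$ in the conditions and in the standard error.
The test statistic for $H_0: p = p_0$ is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$.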
Example:
prop.test(x = 21, n = 52, p = 0.30, alternative = "greater")
1-sample proportions test with continuity correction
data: 21 out of 52
X-squared = 2.1987, df = 1, p-value = 0.06906
alternative hypothesis: true p is greater than 0.3
95 percent confidence interval:
0.2906582 1.0000000
sample estimates:
p
0.4038462
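To connect this output to the z statistic $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$, the calculation can be sketched by hand. `prop.test()` reports `X-squared`, which is the square of a z statistic; with the default continuity correction, 0.5 is subtracted from $|x - np_0|$ before squaring. Variable names below are illustrative.

```r
# One-sample z-test for H0: p = 0.30 vs HA: p > 0.30 (melanoma data).
x <- 21; n <- 52; p0 <- 0.30
p_hat <- x / n
se0 <- sqrt(p0 * (1 - p0) / n)             # standard error under H0
z <- (p_hat - p0) / se0                    # uncorrected z statistic
p_value <- pnorm(z, lower.tail = FALSE)    # one-sided p-value

# Continuity-corrected version, as used by prop.test() by default
z_cc <- (abs(x - n * p0) - 0.5) / sqrt(n * p0 * (1 - p0))
p_value_cc <- pnorm(z_cc, lower.tail = FALSE)

c(z = z, p_value = p_value, z_cc_squared = z_cc^2, p_value_cc = p_value_cc)
```

The squared corrected statistic and its p-value should agree with the `X-squared` and `p-value` lines printed by `prop.test()`.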
Using the binomial distribution:
The p-value is $P(X \geq 21)$ for $X \sim \text{Binomial}(52, 0.30)$:
pbinom(20, size = 52, prob = 0.30, lower.tail = FALSE)
[1] 0.07167176
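The same upper-tail probability can be obtained by summing the binomial probabilities of 21, 22, ..., 52 successes directly. This is a small sketch to show why `pbinom()` is called with 20 rather than 21; the variable names are illustrative.

```r
# P(X >= 21) for X ~ Binomial(52, 0.30), summed term by term.
n <- 52; p0 <- 0.30
upper_tail <- sum(dbinom(21:n, size = n, prob = p0))
upper_tail

# pbinom(q, lower.tail = FALSE) gives P(X > q), so q = 20 yields P(X >= 21).
all.equal(upper_tail, pbinom(20, size = n, prob = p0, lower.tail = FALSE))
```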
The exact binomial test:
binom.test(x = 21, n = 52, p = 0.30, alternative = "greater")
data: 21 out of 52
number of successes = 21, number of trials = 52, p-value = 0.07167
alternative hypothesis: true probability of success is greater than 0.3
95 percent confidence interval:
0.2889045 1.0000000
sample estimates:
probability of success
0.4038462
Using simulation (mosaic package):
do(10000) * rflip(52, prob = 0.30) -> sims
prop(~(heads >= 21), data = sims)
prop_TRUE
    0.068
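A comparable simulation can be sketched in base R, without the mosaic package, by drawing the number of successes directly with `rbinom()`. The seed and variable names are arbitrary choices for illustration, and the simulated proportion will vary slightly from run to run and from the value shown above.

```r
# Simulate 10000 samples of size 52 under H0: p = 0.30 and
# estimate P(X >= 21) by the share of simulated counts at or above 21.
set.seed(27)                                   # arbitrary seed, for reproducibility
sims <- rbinom(10000, size = 52, prob = 0.30)  # simulated numbers of successes
p_sim <- mean(sims >= 21)                      # simulation-based p-value
p_sim
```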
The difference of two independent normally distributed variables is normally distributed.
The mean is the difference of the two means.
The variance is the sum of the two variances.
The standard deviation is the square root of the sum of the two variances.
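These properties can be checked with a quick simulation. The means and standard deviations below are arbitrary values chosen for illustration; they are not from the lecture.

```r
# Difference of two independent normal variables.
# Theory: mean = 10 - 4 = 6, sd = sqrt(3^2 + 4^2) = 5.
set.seed(27)                        # arbitrary seed
m <- 100000                         # number of simulated pairs
x <- rnorm(m, mean = 10, sd = 3)
y <- rnorm(m, mean = 4, sd = 4)
d <- x - y
c(mean = mean(d), sd = sd(d))       # should be close to the theoretical values
```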
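The normal model can be applied to $\hat{p}_1 - \hat{p}_2$ when
The two samples are independent, the observations in each sample are independent, and
At least 10 successes and 10 failures are expected in each sample.
The standard error of the difference in sample proportions is $\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$.
In hypothesis testing with $H_0: p_1 = p_2$, the following estimate of the common proportion $p$ is used: the pooled proportion $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$.
Use this for both $p_1$ and $p_2$ when checking the success-failure condition and computing the standard error.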
Researchers (Saraux et al., 2011) wanted to know whether metal bands used for tagging penguins are harmful. They selected a random sample of 100 penguins, tagged them with RFID chips, and tagged 50 of them with metal bands. After about 4 years, they checked how many penguins in each group survived.
          group
survived   band control
   TRUE      16      31
   FALSE     34      19
The same with proportions:
          group
survived   band control
   TRUE    0.32    0.62
   FALSE   0.68    0.38
What does it mean?
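One way to approach this question is a two-sample z-test of $H_0: p_{band} = p_{control}$ using the pooled proportion. The sketch below uses the survival counts from the table; the variable names are illustrative, and the choice of a two-sided alternative is an assumption rather than something stated in the study description.

```r
# Two-sample z-test for a difference in survival proportions (penguin data).
x <- c(band = 16, control = 31)     # survivors in each group
n <- c(band = 50, control = 50)     # group sizes
p_hat <- x / n                      # sample proportions: 0.32 and 0.62
p_pool <- sum(x) / sum(n)           # pooled proportion under H0

# Standard error of the difference, with p_pool used for both groups
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n[["band"]] + 1 / n[["control"]]))
z <- (p_hat[["band"]] - p_hat[["control"]]) / se_pool
p_value <- 2 * pnorm(-abs(z))       # two-sided p-value (assumed alternative)
c(z = z, p_value = p_value)

# Cross-check: without continuity correction, X-squared equals z^2
prop.test(x, n, correct = FALSE)$statistic
```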