Math 132B

Class 21

Comparing two population means

Two-sample data can be paired or unpaired (independent).

  • Paired measurements for each ‘participant’ or study unit

    • each observation can be logically matched to one other observation in the data

    • e.g., scores on a standardized test before taking a prep course versus scores after the prep course

  • Two independent sets of measurements

    • observations cannot be matched on a one-to-one basis

    • e.g., scores on a standardized test of students who did take a prep course versus scores of students who did not

The nature of the data dictates which two-sample testing procedure is appropriate: the two-sample test for paired data, or the two-sample test for independent group data.

Other terminology

  • Paired data:

    • Matched pairs design
    • Within-subject design
    • Within-group design
    • Multiple measurements design
  • Independent groups

    • Between subjects design
    • Between groups design

Example: number of species of electric fish

Does the presence of a tributary affect the number of species of electric fish in the Amazon basin?

  • Researchers counted the number of species of electric fish upstream and downstream from 12 tributaries of the Amazon.

  • They asked whether the mean number of species differs between the populations upstream and downstream from a tributary.

Idea behind the paired \(t\)-test

  • The numbers are paired by the tributary.

  • For each tributary, we can calculate the difference between the number upstream and the number downstream.

  • Then we do a regular single-population \(t\)-test or estimate on the differences.

  • \(H_0: \mu_d = 0\)

  • The formulas are the same as for the one-population test and estimates, except we use \(d\) instead of \(x\).
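The steps above can be sketched in base R. The counts below are made-up numbers for 12 hypothetical tributaries, not the study data, and the variable names are chosen only for illustration.

```r
# Hypothetical species counts at 12 tributaries (made-up, not the study data)
upstream   <- c(11, 12, 10, 19, 14, 10, 16, 17, 12, 15,  9, 21)
downstream <- c(13, 12, 14, 20, 15,  9, 19, 18, 11, 17, 12, 23)

# Step 1: one difference per tributary
d <- downstream - upstream
n <- length(d)

# Step 2: one-sample t-test on the differences, H0: mu_d = 0
t_stat  <- (mean(d) - 0) / (sd(d) / sqrt(n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# The built-in paired test should agree with the by-hand version
res <- t.test(downstream, upstream, paired = TRUE)
```

Comparing `t_stat` and `p_value` with `res$statistic` and `res$p.value` is a quick check that the paired test is just a one-sample test on `d`.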

FAMuSS data set

Functional SNPs Associated with Muscle Size and Strength

“One goal of the study is to examine the association of demographic, physiological and genetic characteristics with muscle strength. Strength was measured in both dominant and non-dominant arms before and after resistance training. The particular gene of interest here is ACTN3, the sports gene.”

The data set has 9 variables; we are interested in two:

  • ndrm.ch: the percent change in strength in a participant’s non-dominant arm, from before training to after.
  • sex: A factor with levels Female and Male.

FAMuSS data set: comparing ndrm.ch by sex

Does change in non-dominant arm strength after resistance training differ between men and women?

Rows: 595
Columns: 9
$ ndrm.ch     <dbl> 40.0, 25.0, 40.0, 125.0, 40.0, 75.0, 100.0, 57.1…
$ drm.ch      <dbl> 40.0, 0.0, 0.0, 0.0, 20.0, 0.0, 0.0, -14.3, 0.0,…
$ sex         <fct> Female, Male, Female, Female, Female, Female, Fe…
$ age         <int> 27, 36, 24, 40, 32, 24, 30, 28, 27, 30, 20, 23, …
$ race        <fct> Caucasian, Caucasian, Caucasian, Caucasian, Cauc…
$ height      <dbl> 65.0, 71.7, 65.0, 68.0, 61.0, 62.2, 65.0, 68.0, …
$ weight      <dbl> 199, 189, 134, 171, 118, 120, 134, 162, 189, 120…
$ actn3.r577x <fct> CC, CT, CT, CT, CC, CT, TT, CT, CC, CT, CT, CT, …
$ bmi         <dbl> 33.112, 25.845, 22.296, 25.998, 22.293, 21.805, …

FAMuSS: comparing ndrm.ch by sex…

favstats(ndrm.ch ~ sex, data = famuss)
     sex min   Q1 median   Q3 max     mean       sd   n missing
1 Female   0 37.5   57.1 83.3 250 62.92720 36.51909 353       0
2   Male   0 25.0   36.4 50.0 150 39.23512 20.60331 242       0

Difference of means:

diffmean(ndrm.ch ~ sex, data = famuss)
 diffmean 
-23.69207 

Alternatively:

diff(mean(ndrm.ch ~ sex, data = famuss))
     Male 
-23.69207 

The independent two-group \(t\)-test

The null and alternative hypotheses are

  • \(H_0: \mu_F = \mu_M\), the population mean change in arm strength for women is the same as the population mean change in arm strength for men

    • Equivalently, \(H_0: \Delta = \mu_F - \mu_M = 0\)
  • \(H_A: \mu_F \neq \mu_M\), the population mean change in arm strength for women is different from the population mean change in arm strength for men

In general, the hypotheses are written in terms of \(\mu_1\) and \(\mu_2\).

  • The parameter of interest is \(\mu_1 - \mu_2\).

  • The point estimate is \(\overline{x}_1 - \overline{x}_2\).

The independent two-group \(t\)-test…

The \(t\)-statistic is:

\[t =\dfrac{ (\overline{x}_{1} - \overline{x}_{2})- (\mu_1 - \mu_2)} {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]

The \(p\)-value is calculated as usual, but the degrees of freedom for the \(t\) distribution are not the same as in the paired data setting…

Degrees of freedom for the independent two-group \(t\)-test

When doing the test by hand, use the following approximation: \[df = \text{min}(n_1 - 1, n_2 - 1) \]

R uses a better approximation, known as the Satterthwaite approximation:

\[df = \dfrac{\left[(s_1^2/n_1) + (s_2^2/n_2)\right]^2}{\left[(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)\right]}\]
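Both approximations are easy to compute from summary statistics. The sketch below uses two helper functions, `welch_df` and `conservative_df`, whose names are made up here for illustration; the inputs are the FAMuSS standard deviations and sample sizes reported by `favstats`.

```r
# Satterthwaite approximation to the degrees of freedom
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1   # estimated variance of the first sample mean
  v2 <- s2^2 / n2   # estimated variance of the second sample mean
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# By-hand approximation: the smaller of the two one-sample df values
conservative_df <- function(n1, n2) min(n1 - 1, n2 - 1)

# FAMuSS: Female (s = 36.51909, n = 353), Male (s = 20.60331, n = 242)
df_welch <- welch_df(36.51909, 353, 20.60331, 242)
df_hand  <- conservative_df(353, 242)
```

The by-hand value is never larger than the Satterthwaite value, so it gives somewhat wider intervals and larger \(p\)-values.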

FAMuSS Example

\(\overline{x}_1 - \overline{x}_2\):

diffmean(ndrm.ch ~ sex, data = famuss)
 diffmean 
-23.69207 

Standard deviations:

sd(ndrm.ch ~ sex, data = famuss)
  Female     Male 
36.51909 20.60331 

Sample sizes:

sum(!is.na(ndrm.ch) ~ sex, data = famuss)
Female   Male 
   353    242 

\(\displaystyle SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}\)

\(\displaystyle t =\dfrac{ (\overline{x}_{1} - \overline{x}_{2})- (\mu_1 - \mu_2)}{SE}\)

\(SE = 2.352051\)

\(t = -10.0729411\)

Two-tailed test p-value:

2 * pt(-10.07, df = 241)
[1] 3.904446e-20

At the 5% significance level, we have sufficient evidence to reject the null hypothesis that the mean change in non-dominant arm strength is the same for males and females.
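The hand calculation can be reproduced from the summary statistics alone. This is a sketch in base R; the variable names are illustrative, and the difference is taken as Male minus Female to match the `diffmean` output.

```r
# Summary statistics from favstats(ndrm.ch ~ sex, data = famuss)
xbar_f <- 62.92720; s_f <- 36.51909; n_f <- 353
xbar_m <- 39.23512; s_m <- 20.60331; n_m <- 242

# Point estimate (Male minus Female) and its standard error
diff_means <- xbar_m - xbar_f
se <- sqrt(s_f^2 / n_f + s_m^2 / n_m)

# Test statistic under H0: mu_1 - mu_2 = 0
t_stat <- (diff_means - 0) / se

# Two-tailed p-value with the by-hand degrees of freedom
df_hand <- min(n_f - 1, n_m - 1)
p_value <- 2 * pt(-abs(t_stat), df = df_hand)
```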

Confidence intervals for independent two-group data

The 95% confidence interval for the difference in population means has the form \[( \overline{x}_{1} - \overline{x}_{2}) \pm \left( t^{\star} \times \sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}} \right), \]

where \(t^{\star}\) is the point on a \(t\) distribution that has area 0.025 to the right, with the same \(df\) as used for calculating the \(p\)-value of the associated test.

FAMuSS Example

\(\overline{x}_1 - \overline{x}_2\):

diffmean(ndrm.ch ~ sex, data = famuss)
 diffmean 
-23.69207 

Standard deviations:

sd(ndrm.ch ~ sex, data = famuss)
  Female     Male 
36.51909 20.60331 

Sample sizes:

sum(!is.na(ndrm.ch) ~ sex, data = famuss)
Female   Male 
   353    242 

\(\displaystyle SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = 2.352051\)

Critical \(t\)-score:

qt(0.975, df = 241)
[1] 1.969856

Margin of error:

\(\displaystyle m = 2.35\cdot 1.9698562 = 4.6291621\)

Interval: \(\left( -23.69207 - 4.6291621, -23.69207 + 4.6291621 \right)\)

\({}=\left( -28.3212321, -19.0629079 \right)\)

We are 95% confident that the interval \(\left( -28.321, -19.063 \right)\) contains the difference in population mean change in non-dominant arm strength between males and females (male minus female).
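The interval can also be computed in a few lines of base R. This sketch carries the unrounded standard error through the calculation, so the endpoints may differ in the later decimals from figures obtained with a rounded SE; the variable names are illustrative.

```r
# Point estimate (Male minus Female) and standard error from the summaries
diff_means <- -23.69207
se <- 2.352051

# Critical value: area 0.025 to the right on a t distribution with df = 241
t_star <- qt(0.975, df = 241)

# Margin of error and the 95% confidence interval
margin <- t_star * se
ci <- c(lower = diff_means - margin, upper = diff_means + margin)
```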

Letting R do the work

t.test(ndrm.ch ~ sex, data = famuss, mu = 0,
       alternative = "two.sided")

    Welch Two Sample t-test

data:  ndrm.ch by sex
t = 10.073, df = 574.01, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 19.07240 28.31175
sample estimates:
mean in group Female   mean in group Male 
            62.92720             39.23512 

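Hand calculation produced (using df = 241, and taking the difference as Male minus Female, so the signs are opposite to those in the R output, which uses Female minus Male):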

\(t = -10.0729411\)

\(P\)-value: \(3.9044459\times 10^{-20}\)

Interval:

\(\left( -28.321, -19.063 \right)\)

Example (Egg volume)

In a study examining 131 collared flycatcher eggs, researchers measured various characteristics in order to study their relationship to egg size (assayed as egg volume, in \(\text{mm}^3\)). These characteristics included nestling sex and survival. A single pair of collared flycatchers generally lays around 6 eggs per breeding season; laying order of the eggs was also recorded.

Is there evidence at the \(\alpha = 0.10\) significance level to suggest that egg size differs between male and female chicks?

  • For male chicks, \(\overline{x} = 1619.95\), \(s = 127.54\), and \(n = 80\).
  • For female chicks, \(\overline{x} = 1584.20\), \(s = 102.51\), and \(n = 48\).

Sex was only recorded for eggs that hatched.

T-Table

Example (Egg volume)

Construct a 95% confidence interval for the difference in egg size between chicks that successfully fledged (developed capacity to fly) and chicks that died in the nest. From the interval, is there evidence of a size difference in eggs between these two groups?

  • For chicks that fledged, \(\overline{x} = 1605.87\), \(s = 126.32\), and \(n = 89\).
  • For chicks that died in the nest, \(\overline{x} = 1606.91\), \(s = 103.46\), \(n = 42\).
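Both egg-volume questions can be worked from the summary statistics. The helper below, `two_sample_summary`, is a hypothetical function written for this sketch (it is not part of base R or mosaic); it uses the by-hand degrees of freedom \(\text{min}(n_1 - 1, n_2 - 1)\).

```r
# Independent two-group t procedures from summary statistics (illustrative helper)
two_sample_summary <- function(x1, s1, n1, x2, s2, n2, conf_level = 0.95) {
  est    <- x1 - x2                       # point estimate of mu_1 - mu_2
  se     <- sqrt(s1^2 / n1 + s2^2 / n2)   # standard error of the difference
  df     <- min(n1 - 1, n2 - 1)           # by-hand degrees of freedom
  t_stat <- est / se                      # test statistic for H0: mu_1 - mu_2 = 0
  t_star <- qt(1 - (1 - conf_level) / 2, df)
  list(estimate = est, se = se, df = df, t = t_stat,
       p_value = 2 * pt(-abs(t_stat), df),
       ci = est + c(-1, 1) * t_star * se)
}

# Male vs. female chicks: compare p_value with alpha = 0.10
sex_test <- two_sample_summary(1619.95, 127.54, 80, 1584.20, 102.51, 48)

# Fledged vs. died in the nest: check whether the 95% interval contains 0
fledge_ci <- two_sample_summary(1605.87, 126.32, 89, 1606.91, 103.46, 42)
```

Inspecting `sex_test$p_value` and `fledge_ci$ci` answers the two questions; R's `t.test` on the raw data would use the Satterthwaite degrees of freedom instead, so its results would differ slightly.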

Comparison of the two designs

  • A within-subjects design is generally more powerful (it is easier to reject a false null hypothesis, and confidence intervals have smaller margins of error).

  • A within-subjects design is not always possible, for example when comparing two treatments where the first treatment may affect the response to the second (poisoning the well).
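The power advantage can be seen by analyzing the same numbers both ways. The data below are made up for illustration: 12 hypothetical subjects whose baselines vary a lot, with a small, fairly consistent change from before to after.

```r
# Hypothetical before/after measurements on the same 12 subjects (made-up data)
baseline <- seq(10, 120, by = 10)
before   <- baseline
after    <- baseline + c(1, 3, 2, 4, 1, 2, 3, 2, 4, 1, 2, 3)

# Paired analysis: the standard error is based on the within-subject differences
d <- after - before
se_paired <- sd(d) / sqrt(length(d))

# Independent-groups analysis of the same numbers: subject-to-subject
# variation stays in the standard error
se_indep <- sqrt(var(before) / length(before) + var(after) / length(after))

p_paired <- t.test(after, before, paired = TRUE)$p.value
p_indep  <- t.test(after, before)$p.value
```

Because pairing removes the large between-subject variation, `se_paired` is much smaller than `se_indep` here, and only the paired analysis detects the change. Treating paired data as independent is not a valid analysis; the comparison is only meant to show where the extra power comes from.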