Math 132B

Class 20

Experimenting with errors

We drew 1000 samples of size \(n\) from the height variable in the yrbss data set. For each sample, we tested the null hypothesis \(H_0: \mu = \mu_0\) against the two-sided alternative:

library(mosaic)     # provides do() and tally()
library(openintro)  # provides the yrbss data set
library(dplyr)

# n, mu0, and alpha are defined below, before this block is run.
do(1000) * {
    yrbss |>
        select(height) |>
        na.omit() |>
        sample_n(n) |>
        summarize(
            xbar = mean(height),
            s = sd(height),
            SE = s / sqrt(n)
        ) |>
        mutate(
            t_stat = (xbar - mu0) / SE,
            pval = 2 * pt(-abs(t_stat), df = n - 1),
            reject = pval < alpha
        )
} -> test_results

We ran the tests with sample size \(n = 9\) and significance level \(\alpha = 0.05\):

n <- 9
alpha <- 0.05

Case 1: \(H_0\) is true

First we tested with true null hypothesis:

mu0 <- mean(~height, data = yrbss, na.rm = TRUE)

In other words, \(H_0: \mu = 1.691241\).

We got results similar to this:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
   52   948 

Since \(H_0\) was true, each rejection is a case of Type I error.

We should therefore see a rejection count close to 5% of 1000, that is, about 50.
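As a sanity check, under \(H_0\) the number of rejections among 1000 independent tests follows a binomial distribution. A quick base-R computation of its expected count and spread (assuming \(\alpha = 0.05\)):

```r
# Under H0, the number of rejections is Binomial(1000, alpha).
alpha <- 0.05
c(expected = 1000 * alpha,                    # 50
  sd = sqrt(1000 * alpha * (1 - alpha)))      # about 6.9
```

A count like 52 is well within one standard deviation of 50, so it is entirely consistent with \(H_0\) being true.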

Case 2: \(H_0\) is false (\(H_0: \mu = 1.8\))

mu0 <- 1.8

We got results similar to this:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
  783   217 

We got a Type II error for 217 samples out of 1000, which is 21.7%.

When we increased the sample size to 25:

n <- 25

we got results similar to this:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
  996     4 

This time we got a Type II error for 4 samples out of 1000, which is 0.4%.
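The simulated rejection rates can be compared with the theoretical power of the two-sided one-sample t-test, using base R's power.t.test(). This is only a sketch: the value sd = 0.105 is an assumed stand-in for the actual sd(~height, data = yrbss, na.rm = TRUE).

```r
# Theoretical power against mu0 = 1.8 when the true mean is about 1.6912,
# for the two sample sizes used above (sd = 0.105 is an assumed value):
sapply(c(9, 25), function(n)
    power.t.test(n = n, delta = 1.8 - 1.6912, sd = 0.105,
                 sig.level = 0.05, type = "one.sample",
                 alternative = "two.sided")$power)
```

The Type II error probability is one minus each of these power values, so the results should roughly match the simulated rates above.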

“Almost True” \(H_0: \mu = 1.7\)

Note that \(n\) is still 25.

mu0 <- 1.7
tally(~reject, data = test_results)
reject
 TRUE FALSE 
   74   926 

With \(n = 100\) we get:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
  132   868 

With \(n = 900\) we get:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
  732   268 
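The same kind of power calculation shows why the rejection rate climbs with \(n\) even for this tiny discrepancy between \(\mu_0\) and the true mean; again, sd = 0.105 is an assumed value standing in for the actual sample standard deviation.

```r
# Power against the "almost true" null mu0 = 1.7 (true mean about 1.6912),
# for the three sample sizes tried above (sd = 0.105 is assumed):
sapply(c(25, 100, 900), function(n)
    power.t.test(n = n, delta = 1.7 - 1.6912, sd = 0.105,
                 sig.level = 0.05, type = "one.sample")$power)
```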

Summary of the Experiment

  • If \(H_0\) is really true, and the samples are simple random samples, the probability of Type I error is \(\alpha\).

  • When \(H_0\) is false, the probability of Type II error decreases with increasing sample size.

    It also seems to depend on “how false” the null hypothesis is.

  • If \(H_0\) is “almost true”, so that it can be considered true for practical purposes, the probability of Type I error increases with sample size.

  • With very large samples, the test becomes ridiculously sensitive, leading to rejection of perfectly reasonable models.

  • If you need the precision that comes with large samples, you should not be doing hypothesis tests. It is better to use an interval estimate instead.
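For instance, a confidence interval reports both the estimate and its precision in one object. A sketch (it assumes yrbss is loaded, and uses base R's t.test() to extract the interval):

```r
# With n = 900, the 95% confidence interval for the mean height is
# narrow -- far more informative than a bare reject / fail-to-reject.
heights <- na.omit(yrbss$height)
samp <- sample(heights, size = 900)
t.test(samp)$conf.int
```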

Probability of Type I Error

is, in practice, always (possibly a lot) larger than \(\alpha\), because:

  • \(H_0\) is never really true!
    • Do not use very large samples!
  • Samples are almost never simple random!
  • “Researcher degrees of freedom”
    • Problem especially with small samples
  • “Publication bias”

Another Example

Researchers studying the number of electric fish species living in various parts of the Amazon basin were interested in whether the presence of tributaries affected the local number of electric fish species in the main rivers (Fernandes et al. 2004). They counted the number of electric fish species above and below the entrance point of a major tributary at 12 different river locations. Here’s what they found:

Tributary     Upstream number of species   Downstream number of species
Içá                       14                            19
Jutaí                     11                            18
Japurá                     8                             8
Coari                      5                             7
Purus                     10                            16
Manacapuru                 5                             6
Negro                     23                            24
Madeira                   29                            30
Trombetas                 19                            16
Tapajós                   16                            20
Xingu                     25                            21
Tocantins                 10                            12

Another Example (cont.)

They wanted to know if the presence of a tributary affects the number of species.

In other words, they wanted to know if there is a significant difference between the number of species found downstream of tributaries and the number of species found upstream of tributaries.

Matched Pairs Design

  • This is called a “matched pairs design”
  • This is also a case of a so-called “within-subjects” or “within-groups” design.
  • Sometimes this is referred to as “repeated measures”, or “multiple measures” design.
  • Each subject is measured multiple times (for example, before treatment and after treatment).
  • The study is looking at the differences between the individual measurements for each subject.
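As a preview (a sketch, with variable names of our own choosing), the analysis that goes with this design reduces to a one-sample t-test on the within-pair differences:

```r
# Species counts from the table above, one entry per river location:
upstream   <- c(14, 11, 8, 5, 10, 5, 23, 29, 19, 16, 25, 10)
downstream <- c(19, 18, 8, 7, 16, 6, 24, 30, 16, 20, 21, 12)

# A matched-pairs t-test is a one-sample t-test on the differences:
t.test(downstream - upstream)
```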

The Data

  • The data was given as a table with three columns:

    Tributary, Upstream number of species, Downstream number of species

  • This is not a tidy data set! (Why?)

  • One of the first things we will learn is how to transform data sets like this.
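One common way to tidy a wide table like this is tidyr's pivot_longer(). A sketch, with column names of our own choosing:

```r
library(tidyverse)

# The electric fish data in the wide form shown above:
fish <- tribble(
    ~tributary,   ~upstream, ~downstream,
    "Içá",        14, 19,
    "Jutaí",      11, 18,
    "Japurá",      8,  8,
    "Coari",       5,  7,
    "Purus",      10, 16,
    "Manacapuru",  5,  6,
    "Negro",      23, 24,
    "Madeira",    29, 30,
    "Trombetas",  19, 16,
    "Tapajós",    16, 20,
    "Xingu",      25, 21,
    "Tocantins",  10, 12
)

# Tidy form: one row per measurement, with location as a variable.
fish_long <- fish |>
    pivot_longer(cols = c(upstream, downstream),
                 names_to = "location",
                 values_to = "n_species")
```

In the tidy form, each row is a single observation (one species count at one location), which is what functions like t.test() and the plotting tools expect.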