Math 132B

Class 20

Experimenting with errors

We drew 1000 samples of size \(n\) from the height variable in the yrbss data set. For each sample, we tested the null hypothesis \(H_0: \mu = \mu_0\) against the two-sided alternative:

library(mosaic)     # provides do() and tally()
library(openintro)  # provides the yrbss data set
library(dplyr)

# n, mu0, and alpha are defined below, before this block is run.
do(1000) * {
    yrbss |>
        select(height) |>
        na.omit() |>
        sample_n(n) |>
        summarize(
            xbar = mean(height),
            s = sd(height),
            SE = s / sqrt(n)
        ) |>
        mutate(
            t_stat = (xbar - mu0) / SE,
            pval = 2 * pt(-abs(t_stat), df = n - 1),
            reject = pval < alpha
        )
} -> test_results

We ran the tests with sample size \(n = 9\) and significance level \(\alpha = 0.05\):

n <- 9
alpha <- 0.05

Case 1: \(H_0\) is true

First we tested with true null hypothesis:

mu0 <- mean(~height, data = yrbss, na.rm = TRUE)

In other words, \(H_0: \mu = 1.691241\).

We got results similar to this:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
   52   948 

Since \(H_0\) was true, each rejection is a case of Type I error.

We should therefore see a rejection count close to 5% of 1000, that is, about 50.
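As a sanity check, under \(H_0\) the number of rejections among 1000 independent tests follows a binomial distribution. A quick base-R computation of its expected count and spread (assuming \(\alpha = 0.05\)):

```r
# Under H0, the number of rejections is Binomial(1000, alpha).
alpha <- 0.05
c(expected = 1000 * alpha,                    # 50
  sd = sqrt(1000 * alpha * (1 - alpha)))      # about 6.9
```

A count like 52 is well within one standard deviation of 50, so it is entirely consistent with \(H_0\) being true.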

Case 2: \(H_0\) is false (\(H_0: \mu = 1.8\))

mu0 <- 1.8

We got results similar to this:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
  783   217 

We got a Type II error for 217 samples out of 1000, which is 21.7%.

When we increased the sample size to 25:

n <- 25

we got results similar to this:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
  996     4 

This time we got a Type II error for 4 samples out of 1000, which is 0.4%.
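The simulated rejection rates can be compared with the theoretical power of the two-sided one-sample t-test, using base R's power.t.test(). This is only a sketch: the value sd = 0.105 is an assumed stand-in for the actual sd(~height, data = yrbss, na.rm = TRUE).

```r
# Theoretical power against mu0 = 1.8 when the true mean is about 1.6912,
# for the two sample sizes used above (sd = 0.105 is an assumed value):
sapply(c(9, 25), function(n)
    power.t.test(n = n, delta = 1.8 - 1.6912, sd = 0.105,
                 sig.level = 0.05, type = "one.sample",
                 alternative = "two.sided")$power)
```

The Type II error probability is one minus each of these power values, so the results should roughly match the simulated rates above.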

“Almost True” \(H_0: \mu = 1.7\)

Note that \(n\) is still 25.

mu0 <- 1.7
tally(~reject, data = test_results)
reject
 TRUE FALSE 
   74   926 

With \(n = 100\) we get:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
  132   868 

With \(n = 900\) we get:

tally(~reject, data = test_results)
reject
 TRUE FALSE 
  732   268 
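The same kind of power calculation shows why the rejection rate climbs with \(n\) even for this tiny discrepancy between \(\mu_0\) and the true mean; again, sd = 0.105 is an assumed value standing in for the actual sample standard deviation.

```r
# Power against the "almost true" null mu0 = 1.7 (true mean about 1.6912),
# for the three sample sizes tried above (sd = 0.105 is assumed):
sapply(c(25, 100, 900), function(n)
    power.t.test(n = n, delta = 1.7 - 1.6912, sd = 0.105,
                 sig.level = 0.05, type = "one.sample")$power)
```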

Summary of the Experiment

  • If \(H_0\) is really true, and the samples are simple random samples, the probability of Type I error is \(\alpha\).

  • When \(H_0\) is false, the probability of Type II error decreases with increasing sample size.

    It also seems to depend on “how false” the null hypothesis is.

  • If \(H_0\) is “almost true”, so that it can be considered true for practical purposes, the probability of Type I error increases with sample size.

  • With very large samples, the test becomes ridiculously sensitive, leading to rejection of perfectly reasonable models.

  • If you need the precision that comes with large samples, you should not be doing hypothesis tests. It is better to use an interval estimate instead.
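For instance, a confidence interval reports both the estimate and its precision in one object. A sketch (it assumes yrbss is loaded, and uses base R's t.test() to extract the interval):

```r
# With n = 900, the 95% confidence interval for the mean height is
# narrow -- far more informative than a bare reject / fail-to-reject.
heights <- na.omit(yrbss$height)
samp <- sample(heights, size = 900)
t.test(samp)$conf.int
```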

Probability of Type I Error

is, in practice, always (possibly a lot) larger than \(\alpha\), because:

  • \(H_0\) is never really true!
    • Do not use very large samples!
  • Samples are almost never simple random!
  • “Researcher degrees of freedom”
    • Problem especially with small samples
  • “Publication bias”

Another Example

Researchers studying the number of electric fish species living in various parts of the Amazon basin were interested in whether the presence of tributaries affected the local number of electric fish species in the main rivers (Fernandes et al. 2004). They counted the number of electric fish species above and below the entrance point of a major tributary at 12 different river locations. Here’s what they found:

Tributary     Upstream number of species   Downstream number of species
Içá                       14                            19
Jutaí                     11                            18
Japurá                     8                             8
Coari                      5                             7
Purus                     10                            16
Manacapuru                 5                             6
Negro                     23                            24
Madeira                   29                            30
Trombetas                 19                            16
Tapajós                   16                            20
Xingu                     25                            21
Tocantins                 10                            12

Another Example (cont.)

They wanted to know if the presence of a tributary affects the number of species.

In other words, they wanted to know if there is a significant difference between the number of species found downstream of tributaries and the number of species found upstream of tributaries.

Matched Pairs Design

  • This is called a “matched pairs design”
  • This is also a case of a so-called “within-subjects” or “within-groups” design.
  • Sometimes this is referred to as “repeated measures”, or “multiple measures” design.
  • Each subject is measured multiple times (for example, before treatment and after treatment).
  • The study is looking at the differences between the individual measurements for each subject.
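As a preview (a sketch, with variable names of our own choosing), the analysis that goes with this design reduces to a one-sample t-test on the within-pair differences:

```r
# Species counts from the table above, one entry per river location:
upstream   <- c(14, 11, 8, 5, 10, 5, 23, 29, 19, 16, 25, 10)
downstream <- c(19, 18, 8, 7, 16, 6, 24, 30, 16, 20, 21, 12)

# A matched-pairs t-test is a one-sample t-test on the differences:
t.test(downstream - upstream)
```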

The Data

  • The data was given as a table with three columns:

    Tributary, Upstream number of species, Downstream number of species

  • This is not a tidy data set! (Why?)

  • One of the first things we will learn is how to transform data sets like this.
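One common way to tidy a wide table like this is tidyr's pivot_longer(). A sketch, with column names of our own choosing:

```r
library(tidyverse)

# The electric fish data in the wide form shown above:
fish <- tribble(
    ~tributary,   ~upstream, ~downstream,
    "Içá",        14, 19,
    "Jutaí",      11, 18,
    "Japurá",      8,  8,
    "Coari",       5,  7,
    "Purus",      10, 16,
    "Manacapuru",  5,  6,
    "Negro",      23, 24,
    "Madeira",    29, 30,
    "Trombetas",  19, 16,
    "Tapajós",    16, 20,
    "Xingu",      25, 21,
    "Tocantins",  10, 12
)

# Tidy form: one row per measurement, with location as a variable.
fish_long <- fish |>
    pivot_longer(cols = c(upstream, downstream),
                 names_to = "location",
                 values_to = "n_species")
```

In the tidy form, each row is a single observation (one species count at one location), which is what functions like t.test() and the plotting tools expect.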