Math 132B

Class 14

Goal of Statistics

To tell stories about the real world using data.

  • make predictions
  • figure out ways to improve things
  • come up with some understanding or explanation of how things work

Role of mathematical models

One way to do all that is by using mathematical models.

  • “All models are wrong …
  • … but some are useful.”
  • How do we construct useful models,
  • or how do we verify whether a model is useful,
  • in a world that is full of uncertainty and noise?

Statistics cannot create certainty
out of thin air!

It can quantify the uncertainty and help you understand it.

What is statistical inference?

Goal: Create a mathematical model for some random variable(s),

or “evaluate” an existing model for some random variable(s),

using a sample of its (their) values.


Model for a random variable: A distribution with some parameters.

Population

  • The random variable is tied to some “population”:

    • All patients in some hospital who have a certain disease
    • All patients in the world who currently have a certain disease
    • All potential patients in the world who currently have a certain disease, had the disease in the past, or will have the disease in the future
    • All plants of a certain species in a given forest
    • All plants of a certain species, in the past, now, and in the future
    • All adults currently living in the US
    • All current, past, and future adults in the US
  • We then talk about the population distribution, the population parameter, and a model of the population.

Unreasonably optimistic goal

Figure out exactly what model to use.

That means a complete description of the distribution, including the exact values of all the parameters.

We only have a sample of the values; that is not going to be enough.

Realistic goals

  • Figure out something about one of the parameters, or

  • Figure out something about the way the variable is distributed.

Perhaps we already have some idea about the type of the distribution. Can we learn something about its parameter(s)?

What can we learn from a sample?

  • Suppose I have already decided that a normal distribution would be an appropriate model for my population.

  • I want to know what I should use as the mean and standard deviation.

  • I collect a sample and calculate its mean and standard deviation.

  • What does it tell me?

  • It may be useful to know something about the relationship between the sample mean and the population mean…

Let’s Experiment

  • The YRBSS data set contains 13,583 observations from surveys conducted from 1991 to 2013.

  • It has 13 variables; we will look at height.

  • The data set is available as a part of the openintro package.
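
The experiment can be sketched as repeated sampling from a fixed population. The openintro package is an R package, and the real height values are not reproduced here. The Python sketch below instead builds a synthetic stand-in population of the same size (13,583 simulated heights, with an assumed mean of 1.69 m and standard deviation of 0.10 m), so everything it prints is illustrative and is not a YRBSS result. The helper name `sample_means` is also just an illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Synthetic stand-in for the height column (assumption: NOT the real YRBSS data).
POPULATION_SIZE = 13_583
population = [random.gauss(1.69, 0.10) for _ in range(POPULATION_SIZE)]
population_mean = statistics.fmean(population)


def sample_means(values, sample_size, repetitions):
    """Draw `repetitions` random samples (without replacement) and return their means."""
    return [
        statistics.fmean(random.sample(values, sample_size))
        for _ in range(repetitions)
    ]


means_10 = sample_means(population, 10, 1000)
means_100 = sample_means(population, 100, 1000)

print(f"population mean: {population_mean:.4f}")
for label, means in (("n = 10", means_10), ("n = 100", means_100)):
    print(
        f"{label}: sample means range from {min(means):.4f} to {max(means):.4f}, "
        f"standard deviation {statistics.stdev(means):.4f}"
    )
```

Each sample gives a different mean, but the means cluster around the population mean, and they cluster more tightly for the larger sample size.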

Central Limit Theorem (part 1)

When sampling from a population that is normally distributed with mean \(\mu\) and standard deviation \(\sigma\):

  • The sample means are also normally distributed…
  • … with the same mean \(\mu\)
  • … and standard deviation \[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\] where \(n\) is the sample size.
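
A quick simulation can make the three claims above concrete. This is a minimal sketch, not a proof: the population parameters (mean 50, standard deviation 12), the sample size (25), and the number of repetitions are all hypothetical values chosen for illustration.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 12.0   # hypothetical population mean and standard deviation
N = 25                   # sample size
REPETITIONS = 20_000     # number of samples drawn

# Draw many samples of size N from a normal population and record each sample mean.
means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPETITIONS)
]

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)
predicted_sd = SIGMA / math.sqrt(N)  # sigma / sqrt(n) = 12 / 5 = 2.4

print(f"mean of sample means: {observed_mean:.3f} (population mean {MU})")
print(f"sd of sample means:   {observed_sd:.3f} (predicted {predicted_sd:.3f})")
```

The observed mean and standard deviation of the sample means should land close to \(\mu\) and \(\sigma/\sqrt{n}\), with small differences due to simulation noise.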

Central Limit Theorem (part 2)

When sampling from any random variable \(X\) with mean (expected value) \(\mu\) and finite standard deviation \(\sigma\), for large enough samples:

  • The sample means are approximately normally distributed…
  • … with the same mean \(\mu\)
  • … and standard deviation \[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\] where \(n\) is the sample size.
  • The approximation gets better as the sample size increases.
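
The same kind of simulation can be run on a population that is clearly not normal. The sketch below assumes an exponential population with rate 1 (so \(\mu = \sigma = 1\)), which is strongly right-skewed, and uses skewness as a rough measure of how far the sample means are from a symmetric, normal-looking shape. The choice of distribution, the sample sizes, and the `skewness` helper are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

RATE = 1.0               # exponential rate; population mean and sd are both 1 / RATE
MU = SIGMA = 1.0 / RATE
REPETITIONS = 5_000      # number of samples drawn for each sample size


def skewness(values):
    """Average cubed z-score: near 0 for a symmetric shape, positive for a right skew."""
    centre = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean(((v - centre) / spread) ** 3 for v in values)


results = {}
for n in (2, 30, 200):
    means = [
        statistics.fmean(random.expovariate(RATE) for _ in range(n))
        for _ in range(REPETITIONS)
    ]
    results[n] = {
        "mean": statistics.fmean(means),
        "sd": statistics.stdev(means),
        "predicted_sd": SIGMA / math.sqrt(n),
        "skewness": skewness(means),
    }

for n, r in results.items():
    print(
        f"n = {n:3d}: mean {r['mean']:.3f}, sd {r['sd']:.3f} "
        f"(predicted {r['predicted_sd']:.3f}), skewness {r['skewness']:.3f}"
    )
```

As the sample size grows, the skewness of the sample means shrinks toward 0, while their mean stays near \(\mu\) and their standard deviation tracks \(\sigma/\sqrt{n}\).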

What does this mean?

  • With a large sample, the sample mean is likely to be close to the population mean, because the standard deviation of the sample means, \(\sigma/\sqrt{n}\), shrinks as \(n\) grows.

  • Even if we know nothing about the population distribution, we do know (approximately) the distribution of the sample means, so we can calculate probabilities!
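
As an example of such a probability calculation, the sketch below estimates the chance that a sample mean falls within a given distance of the population mean, using the normal approximation from the Central Limit Theorem. The numbers (\(\mu = 100\), \(\sigma = 15\), \(n = 36\), distance 3) are hypothetical, and the function name `prob_sample_mean_within` is made up for this illustration. It also assumes \(\sigma\) is known, which in practice it usually is not.

```python
import math
from statistics import NormalDist


def prob_sample_mean_within(mu, sigma, n, distance):
    """Approximate P(|sample mean - mu| < distance) using the normal approximation."""
    standard_error = sigma / math.sqrt(n)          # sd of the sample means
    sampling_dist = NormalDist(mu, standard_error)  # approximate distribution of the sample mean
    return sampling_dist.cdf(mu + distance) - sampling_dist.cdf(mu - distance)


# Hypothetical values: standard error is 15 / sqrt(36) = 2.5, so 3 is 1.2 standard errors.
p = prob_sample_mean_within(mu=100.0, sigma=15.0, n=36, distance=3.0)
print(f"P(|sample mean - mu| < 3) is approximately {p:.4f}")
```

Note that the result depends only on the distance measured in standard errors (here 1.2), not on the shape of the population distribution, which is what makes the approximation usable when that shape is unknown.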