Class 15
Goal: Create a mathematical model for some random variable(s),
or “evaluate” an existing model for some random variable(s),
using a sample of its (their) values.
Model for a random variable: A distribution with some parameters.
The random variable is tied to some “population”:
We then talk about the population distribution, the population parameter, and a model of the population.
Figure out exactly what model to use.
That means a complete description of the distribution, including the exact values of all the parameters.
We only have a sample of the values; that is not going to be enough.
Figure out something about one of the parameters, or
Figure out something about the way the variable is distributed.
Perhaps we already have some idea about the type of the distribution. Can we learn something about its parameter(s)?
Suppose I already decided that a normal distribution would be an appropriate model for my population.
I want to know what I should use as mean and standard deviation.
I collect a sample and calculate its mean and standard deviation.
What does it tell me?
It may be useful to know something about the relationship between the sample mean and the population mean…
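To make the question concrete, here is a minimal sketch in Python that draws one sample from a normal population and computes its mean and standard deviation. The population parameters (mean 170, standard deviation 10) and the sample size are hypothetical values chosen only for illustration; they are not taken from any data set mentioned in these notes.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
POP_MEAN = 170.0
POP_SD = 10.0
SAMPLE_SIZE = 100

# Fixed seed so the sketch is reproducible.
rng = random.Random(15)

# Draw one sample from the assumed normal population.
sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]

# Sample statistics: these estimate, but do not equal, the population parameters.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # uses the n - 1 denominator

print(f"sample mean = {sample_mean:.2f} (population mean = {POP_MEAN})")
print(f"sample sd   = {sample_sd:.2f} (population sd   = {POP_SD})")
```

Changing the seed gives a different sample and therefore different sample statistics, which is exactly why the relationship between the sample mean and the population mean needs to be understood.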
The YRBSS data set contains 13,583 observations from surveys conducted from 1991 to 2013.
It has 13 variables; we will look at height.
The data set is available as a part of the oibiostat package.
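When sampling from a population that is normally distributed with mean \(\mu\) and standard deviation \(\sigma\): the mean \(\bar{X}\) of a sample of size \(n\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).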
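When sampling from any random variable \(X\) with mean (expected value) \(\mu\) and finite standard deviation \(\sigma\), then for large enough samples of size \(n\): the sample mean \(\bar{X}\) is approximately normally distributed with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\) (the Central Limit Theorem).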
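A small simulation can illustrate the large-sample claim. The sketch below assumes a clearly non-normal population (an exponential distribution with mean 1 and standard deviation 1, a hypothetical choice), draws many samples of size 50, and compares the spread of the sample means with the value \(\sigma/\sqrt{n}\) that the Central Limit Theorem predicts. The sample size, number of repetitions, and seed are all arbitrary.

```python
import math
import random
import statistics

# Hypothetical, deliberately non-normal population: exponential with rate 1,
# so the population mean and standard deviation are both 1.
RATE = 1.0
POP_MEAN = 1.0 / RATE
POP_SD = 1.0 / RATE

N = 50        # size of each sample
REPS = 2000   # number of samples drawn

rng = random.Random(2013)  # fixed seed for reproducibility

def one_sample_mean(n):
    """Draw one sample of size n and return its mean."""
    return statistics.fmean([rng.expovariate(RATE) for _ in range(n)])

means = [one_sample_mean(N) for _ in range(REPS)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_sd = POP_SD / math.sqrt(N)  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"sd of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.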
With a large sample, the sample mean is likely to be a pretty good estimate of the population mean.
Even if we know nothing about the population distribution, we do know (approximately) the distribution of the sample means, so we can calculate probabilities!
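As a sketch of such a probability calculation, suppose (hypothetically) that the population mean is 170 and the population standard deviation is 10, and that we take a sample of size 100. Treating the sample mean as approximately normal with standard deviation \(\sigma/\sqrt{n}\), the chance that the sample mean exceeds 172 can be computed with Python's standard library. All numbers here are illustrative assumptions.

```python
import math
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
MU = 170.0         # assumed population mean
SIGMA = 10.0       # assumed population standard deviation
N = 100            # sample size
THRESHOLD = 172.0  # value the sample mean is compared against

# Standard deviation of the sample mean: sigma / sqrt(n).
standard_error = SIGMA / math.sqrt(N)

# Approximate distribution of the sample mean.
sampling_dist = NormalDist(mu=MU, sigma=standard_error)

# P(sample mean > THRESHOLD) under the normal approximation.
p_above = 1.0 - sampling_dist.cdf(THRESHOLD)

print(f"P(sample mean > {THRESHOLD}) is approximately {p_above:.4f}")
```

Here the threshold sits two standard errors above the mean, so the probability is about 0.023. Nothing about the shape of the population distribution was used, only its mean and standard deviation.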