Math 132B

Class 2

Population and Sample

Population: the part of the real world we want to study.
Sample: a subset of the population that we collect data from.

How to select a sample

Sample should be representative of the population.
- Question: Can we generalize the information we got from the sample to the whole population?
Sample should satisfy the assumptions of the mathematical methods we use to analyze the data.
- Question: What kind of method do we need to use to analyze the data?

Biased Sample

Biased Sample?

Some Examples

When studying the study habits of SVSU students, we go to the library on Friday afternoon and survey 20 students who happen to be there.
When studying the mathematical aptitude of SVSU students, we randomly pick 3 classrooms in Science East, and survey the students there.
When studying the height of SVSU students, we randomly pick 3 classrooms in Science East, and survey the students there.

Sampling example

Another Example

When studying political opinions of adult Americans, we send questionnaires to 20000 randomly selected residential addresses in the USA, and use the 700 questionnaires that were filled in and returned to us.
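The concern with this example can be illustrated with a small simulation. This is only a sketch: the two response probabilities and the 50/50 split of opinions are made-up assumptions, chosen so that roughly 700 of the 20000 questionnaires come back. It shows that the returned questionnaires can look very different from the addresses that were randomly selected.

```python
import random

rng = random.Random(4)  # fixed seed (an assumption) so the run is repeatable

# Hypothetical population of opinions: True = "yes", False = "no", about half each.
mailed = [rng.random() < 0.5 for _ in range(20000)]

# Assumed response probabilities: "yes" holders answer more often than "no" holders.
P_RESPOND_YES = 0.05
P_RESPOND_NO = 0.02

# Keep only the questionnaires that are filled in and returned.
returned = [
    opinion
    for opinion in mailed
    if rng.random() < (P_RESPOND_YES if opinion else P_RESPOND_NO)
]

share_mailed = sum(mailed) / len(mailed)
share_returned = sum(returned) / len(returned)

print("questionnaires returned:", len(returned))
print("share of 'yes' among all mailed addresses:", round(share_mailed, 3))
print("share of 'yes' among returned questionnaires:", round(share_returned, 3))
```

Under these assumptions the addresses were chosen at random, but the returned questionnaires were not, so the second share overstates the first.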

Another Example

When comparing the difficulty of two academic programs, we randomly select 200 graduates of Program A and 200 graduates of Program B, and survey them.

Mathematical concerns

Sample should satisfy the assumptions of the mathematical methods we use to analyze the data.

Random sampling!

Mathematicians spent centuries developing powerful theories to analyze and understand randomness.

Sample Size

How large should a sample be?

To some extent, larger samples are better.
Larger samples are harder to collect, more expensive, …
Some statistical methods do not work well with very large samples.
Sometimes increasing the sample size does not bring any new information.

Example 1

Example 2

Example 3

Convenience Samples

The idea: just select the \(n\) subjects that are the easiest to get.

Suppose we have a large, well-mixed pile of gravel, and we want to study the distribution of the particle sizes.

Sample 1: With your hands, randomly collect 200 pieces of gravel and use that.
Sample 2: Scoop a bucket full of gravel from the pile and use that.

Systematic sample
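No definition is given at this point, so the sketch below assumes the usual textbook meaning of a systematic sample: put the population in a list, pick a random starting position, and then take every \(k\)-th subject. The function name `systematic_sample` and the numbered population are hypothetical.

```python
import random

def systematic_sample(population, n, rng):
    """Take every k-th subject after a random start, with k = len(population) // n."""
    k = len(population) // n
    if k < 1:
        raise ValueError("sample size n is larger than the population")
    start = rng.randrange(k)          # random start among the first k positions
    return population[start::k][:n]   # every k-th subject, trimmed to n subjects

rng = random.Random(2)                # fixed seed (an assumption) for repeatability
population = list(range(1000))        # hypothetical population, numbered 0..999
picked = systematic_sample(population, 50, rng)
print(picked[:5])
```

Only the starting position is random here; once it is chosen, the rest of the sample is determined.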

On the other hand

Random Sampling

Random sample

Every individual in the population has the same chance of being selected

Simple random sample

Given size \(n\)
Every group of size \(n\) has the same chance of being selected
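A short sketch of this definition, using Python's standard `random.sample`, which draws \(n\) distinct subjects without replacement. The four-subject population and the number of repetitions are hypothetical. Repeating the draw many times gives a rough check that each possible group of size 2 comes up about equally often.

```python
import random
from collections import Counter
from itertools import combinations

def simple_random_sample(population, n, rng):
    """Draw n distinct subjects so that every group of size n is equally likely."""
    return rng.sample(population, n)

rng = random.Random(0)             # fixed seed (an assumption) for repeatability
population = ["A", "B", "C", "D"]  # hypothetical tiny population

# Repeat the sampling many times and count how often each group appears.
counts = Counter()
for _ in range(6000):
    group = frozenset(simple_random_sample(population, 2, rng))
    counts[group] += 1

# There are 6 possible groups of size 2; each should appear roughly 1000 times.
for group in combinations(population, 2):
    print(group, counts[frozenset(group)])
```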

Simple Random Sample

More complicated example

Student representatives in a large high school want to survey the student body in order to determine what percentage of the students would be willing to pay an extra one-time fee to raise money for re-paving the school parking lot.

More complicated example (cont.)

Stratified Sampling

Divide the population into several groups (layers, strata), and randomly select some number of subjects in each stratum.
Retain some degree of control over some aspects of the sample.
Compromise between randomness and control.
Requires more advanced methods to analyze the results.
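The procedure above can be sketched as follows. The strata (four grade levels), their sizes, and the choice of 25 subjects per stratum are hypothetical; in practice the number taken from each stratum is a design decision.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Randomly select per_stratum[name] subjects from each stratum.

    strata:      dict mapping stratum name -> list of subjects
    per_stratum: dict mapping stratum name -> number of subjects to select
    """
    return {
        name: rng.sample(members, per_stratum[name])
        for name, members in strata.items()
    }

rng = random.Random(1)  # fixed seed (an assumption) for repeatability

# Hypothetical strata: students grouped by grade, with different group sizes.
sizes = {"grade 9": 320, "grade 10": 300, "grade 11": 280, "grade 12": 260}
strata = {
    name: [f"{name} student {i}" for i in range(size)]
    for name, size in sizes.items()
}

per_stratum = {name: 25 for name in strata}  # the part we control
chosen = stratified_sample(strata, per_stratum, rng)

for name, subjects in chosen.items():
    print(name, len(subjects))
```

Which students are picked within each grade is random, but the number from each grade is fixed in advance. That is the control the method retains.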

Survey of a forest

Goal: survey of trees in the forest.

Divide population into clusters.

Randomly select some clusters.

Survey all subjects in selected clusters.

Cluster Sampling

Divide the population into several groups (clusters), randomly select some of the clusters and survey all subjects in each of the selected clusters.
Often much easier to do than a simple random sample.
Compromise between randomness and convenience.
Requires more advanced methods to analyze the results.
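A minimal sketch of the procedure, loosely modeled on the forest survey: the clusters are hypothetical plots of land, and every tree in a selected plot is surveyed. The plot names, tree counts, and the choice of 3 plots are made up for illustration.

```python
import random

def cluster_sample(clusters, k, rng):
    """Randomly select k clusters and return all subjects in those clusters.

    clusters: dict mapping cluster name -> list of subjects
    """
    chosen_ids = rng.sample(sorted(clusters), k)
    subjects = [s for cid in chosen_ids for s in clusters[cid]]
    return chosen_ids, subjects

rng = random.Random(5)  # fixed seed (an assumption) for repeatability

# Hypothetical forest: 12 plots with different numbers of trees.
clusters = {
    f"plot{p}": [f"plot{p}-tree{t}" for t in range(10 + p)]
    for p in range(12)
}

chosen_ids, subjects = cluster_sample(clusters, 3, rng)
print(chosen_ids, len(subjects))
```

The randomness is only in which plots are chosen, so the number of trees surveyed depends on the plots that happen to be selected.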

Surveying people in Michigan

We want to personally interview people in Michigan, but we want a lot of different parts of Michigan to be represented.

Multistage Sampling

For example:

Randomly select 10 US states.
From each state, randomly select 30 counties.
From each county, randomly select 200 residents.

Again, a compromise between randomness and convenience.
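The staged selection can be sketched on a toy data set. The nested dictionary below (5 states, 4 counties each, 20 residents per county) is hypothetical and much smaller than the example above. If a unit has fewer members than requested, this sketch simply takes all of them, which is an assumption rather than a fixed rule.

```python
import random

def multistage_sample(nested, sizes, rng):
    """Select states, then counties within them, then residents within those.

    nested: dict state -> dict county -> list of residents
    sizes:  (number of states, counties per state, residents per county)
    """
    n_states, n_counties, n_residents = sizes
    result = []
    for state in rng.sample(sorted(nested), min(n_states, len(nested))):
        counties = nested[state]
        for county in rng.sample(sorted(counties), min(n_counties, len(counties))):
            residents = counties[county]
            picked = rng.sample(residents, min(n_residents, len(residents)))
            result.extend((state, county, r) for r in picked)
    return result

rng = random.Random(3)  # fixed seed (an assumption) for repeatability

# Hypothetical toy data: 5 states, 4 counties each, 20 residents per county.
nested = {
    f"state{s}": {
        f"county{c}": [f"s{s}c{c}r{r}" for r in range(20)]
        for c in range(4)
    }
    for s in range(5)
}

sample = multistage_sample(nested, (2, 3, 5), rng)
print(len(sample))  # 2 states * 3 counties * 5 residents = 30
```

With the numbers from the example above (10 states, 30 counties, 200 residents), the same procedure would give at most 10 × 30 × 200 = 60000 subjects.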

More complicated sampling methods

Stratified
- Divide the population into several groups (strata), and randomly select some number of subjects in each stratum.
- Retain some degree of control over some aspects of the sample.
Cluster
- Divide the population into several groups (clusters), randomly select some of the clusters and survey all subjects in each of the selected clusters.
- Often much easier to do than a simple random sample.
Multistage
- Often even easier to carry out than cluster sampling.

Summary

Sampling methods:

Convenience
Systematic
Simple random
Stratified
Cluster
Multistage
…