Class 2
Population: the part of the real world we want to study.
Sample: a subset of the population that we collect data from.
Sample should be representative of the population.
Sample should satisfy the assumptions of the mathematical methods we use to analyze the data.
When studying the study habits of SVSU students, we go to the library on Friday afternoon and survey 20 students who happen to be there.
When studying the mathematical aptitude of SVSU students, we randomly pick 3 classrooms in Science East, and survey the students there.
When studying the height of SVSU students, we randomly pick 3 classrooms in Science East, and survey the students there.
When studying political opinions of adult Americans, we send questionnaires to 20,000 randomly selected residential addresses in the USA, and use the 700 questionnaires that were filled in and returned to us.
When comparing the difficulty of two academic programs, we randomly select 200 graduates of Program A and 200 graduates of Program B, and survey them.
Sample should satisfy the assumptions of the mathematical methods we use to analyze the data.
Random sampling!
Mathematicians spent centuries developing powerful theories to analyze and understand randomness.
How large should a sample be?
Convenience sample: just select the \(n\) subjects that are the easiest to get.
Suppose we have a large, well-mixed pile of gravel, and we want to study the distribution of the particle sizes.
Random sample
Simple random sample
Student representatives in a large high school want to survey the student body in order to determine what percentage of the students would be willing to pay an extra one-time fee to raise money for repaving the school parking lot.
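For illustration, here is a minimal sketch of how a simple random sample could be drawn for a survey like this one. The roster of 1200 student IDs, the sample size of 50, and the function name `simple_random_sample` are all assumptions made for the example, not part of the original problem.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct subjects; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded generator so the draw is reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical roster: student IDs 1..1200 (the school size is an assumption).
roster = list(range(1, 1201))
surveyed = simple_random_sample(roster, 50, seed=1)
print(len(surveyed), "students selected, all distinct:", len(set(surveyed)) == 50)
```

The key point is that the selection is left entirely to the random number generator, so no student is favored because they are easy to reach.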
Stratified sample: divide the population into several groups (layers, strata), and randomly select some number of subjects in each stratum.
Retain some degree of control over some aspects of the sample.
Compromise between randomness and control.
Requires more advanced methods to analyze the results.
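The stratified procedure described above can be sketched as follows. The strata (grade levels), their sizes, and the number of subjects taken from each stratum are hypothetical choices made only to have something to run.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Randomly select per_stratum[name] subjects from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, per_stratum[name])  # independent draw in each stratum
        for name, members in strata.items()
    }

# Hypothetical strata: student IDs grouped by grade level.
strata = {
    "freshmen":   [f"F{i}" for i in range(400)],
    "sophomores": [f"S{i}" for i in range(350)],
    "juniors":    [f"J{i}" for i in range(300)],
    "seniors":    [f"R{i}" for i in range(250)],
}
per_stratum = {"freshmen": 16, "sophomores": 14, "juniors": 12, "seniors": 10}

chosen = stratified_sample(strata, per_stratum, seed=2)
for name, subjects in chosen.items():
    print(name, len(subjects))
```

Here the sample sizes are roughly proportional to the stratum sizes, which is one common choice; the person designing the study controls how many subjects come from each stratum, while chance decides which ones.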
Goal: survey of trees in the forest.
Divide population into clusters.
Randomly select some clusters.
Survey all subjects in selected clusters.
Cluster sample: divide the population into several groups (clusters), randomly select some of the clusters, and survey all subjects in each of the selected clusters.
Often much easier to do than a simple random sample.
Compromise between randomness and convenience.
Requires more advanced methods to analyze the results.
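A minimal sketch of the cluster procedure, using the forest survey as the setting. The division of the forest into 20 plots, the tree labels, and the choice of 3 plots are assumptions for illustration only.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select n_clusters clusters and keep every subject in them."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(clusters), n_clusters)     # random choice of clusters
    return {name: list(clusters[name]) for name in selected}  # survey ALL subjects

# Hypothetical forest: 20 plots, each containing a different number of trees.
forest = {
    f"plot{p:02d}": [f"tree{p:02d}-{t}" for t in range(5 + p % 4)]
    for p in range(20)
}

surveyed_plots = cluster_sample(forest, 3, seed=3)
for plot, trees in surveyed_plots.items():
    print(plot, len(trees), "trees surveyed")
```

Note the contrast with stratified sampling: here the randomness is in which groups are chosen, and within a chosen group nothing is left out.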
We want to personally interview people in Michigan, but we want a lot of different parts of Michigan to be represented.
For example:
Randomly select 10 US states.
From each state, randomly select 30 counties.
From each county, randomly select 200 residents.
Again, a compromise between randomness and convenience.
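The staged selection in the example above can be sketched like this. The sampling frame below is a small made-up one (4 states, 5 counties each, 20 residents each), and the function name and the stage sizes are hypothetical. The sketch also assumes that if a state has fewer counties than requested, all of its counties are taken.

```python
import random

def multistage_sample(frame, n_states, n_counties, n_residents, seed=None):
    """Select states, then counties within each state, then residents within each county."""
    rng = random.Random(seed)
    sample = {}
    for state in rng.sample(sorted(frame), n_states):            # stage 1: states
        counties = frame[state]
        k = min(n_counties, len(counties))                       # assumption: take all if too few
        for county in rng.sample(sorted(counties), k):           # stage 2: counties
            residents = counties[county]
            m = min(n_residents, len(residents))
            sample[(state, county)] = rng.sample(residents, m)   # stage 3: residents
    return sample

# Hypothetical frame: state -> county -> list of resident labels.
frame = {
    f"state{s}": {
        f"county{c}": [f"s{s}c{c}r{r}" for r in range(20)]
        for c in range(5)
    }
    for s in range(4)
}

picked = multistage_sample(frame, n_states=2, n_counties=3, n_residents=4, seed=0)
print(len(picked), "counties selected,", sum(len(v) for v in picked.values()), "residents in total")
```

With the numbers from the example (10 states, 30 counties, 200 residents), the same function would be called with `n_states=10, n_counties=30, n_residents=200` on a real sampling frame.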
Sampling methods: