slides – Math 132B

Binomial Random Variables Recap

$X$ is a binomial random variable if it represents the number of successes in $n$ replications of an experiment where

Each replicate is independent of the other replicates.
Each replicate has two possible outcomes: either success or failure.
The probability of success, $p$, is the same in each replicate.

We write $X \sim \text{Bin}(n, p)$.

Possible values: $0, 1, \dots, n$.

Binomial probability: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$

Mean and SD

If $X \sim \text{Bin}(n, p)$, then

Expected Value (mean) is $\mu = np$
Standard Deviation is $\sigma = \sqrt{np(1 - p)}$
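
As a quick check, the shortcut formulas $np$ and $\sqrt{np(1-p)}$ can be compared with the mean and SD computed directly from the binomial probabilities. This is only a sketch: the values `n = 16` and `p = 1/2` are chosen for illustration, and `dbinom()` is R's built-in function for binomial probabilities.

```r
# Illustrative parameters (an assumption, not tied to a specific problem)
n <- 16
p <- 1/2

k <- 0:n                                   # all possible values of X
probs <- dbinom(k, size = n, prob = p)     # P(X = k) for each k

# Mean and SD from the shortcut formulas
mu_formula <- n * p
sd_formula <- sqrt(n * p * (1 - p))

# Mean and SD computed directly from the distribution
mu_direct <- sum(k * probs)
sd_direct <- sqrt(sum((k - mu_direct)^2 * probs))

c(mu_formula, mu_direct)   # both equal 8 (up to rounding error)
c(sd_formula, sd_direct)   # both equal 2 (up to rounding error)
```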

Binomial Probabilities in `R`

The function dbinom() is used to calculate $P(X = k)$.

dbinom(k, size=n, prob=p)

For example, if $X \sim \text{Bin}(3, 1/6)$, then $P(X = 2)$ is

dbinom(2, size=3, prob=1/6)

[1] 0.06944444

The d in dbinom stands for distribution or density.
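
To see what `dbinom()` is doing, the same probability can be computed by hand from the binomial formula $\binom{n}{k} p^k (1-p)^{n-k}$. The sketch below reuses the values from the example above; the variable name `by_hand` is only for illustration.

```r
n <- 3
p <- 1/6
k <- 2

# Binomial formula: choose(n, k) * p^k * (1 - p)^(n - k)
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)

by_hand                          # 0.06944444
dbinom(k, size = n, prob = p)    # same value
```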

Binomial Probabilities in `R` …

The function pbinom() is used to calculate $P(X \le k)$ or $P(X > k)$.

$P(X \le k)$:

pbinom(k, size=n, prob=p)

$P(X > k)$:

pbinom(k, size=n, prob=p, lower.tail = FALSE)

The p stands for probability.

`pbinom` examples:

If $X \sim \text{Bin}(16, 1/2)$, then $P(X \le 13)$ is

pbinom(13, size=16, prob=1/2)

[1] 0.9979095

while $P(X > 13)$ is

pbinom(13, size=16, prob=1/2, lower.tail=FALSE)

[1] 0.002090454

or, equivalently:

1 - pbinom(13, size=16, prob=1/2)

[1] 0.002090454
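
One way to understand `pbinom()` is as a running sum of `dbinom()` values. The sketch below rebuilds both tails for the same example by adding up individual probabilities; the names `lower` and `upper` are illustrative.

```r
n <- 16
p <- 1/2

# P(X <= 13): add up P(X = 0), ..., P(X = 13)
lower <- sum(dbinom(0:13, size = n, prob = p))

# P(X > 13) is the same as P(X >= 14): add up P(X = 14), P(X = 15), P(X = 16)
upper <- sum(dbinom(14:16, size = n, prob = p))

all.equal(lower, pbinom(13, size = n, prob = p))                       # TRUE
all.equal(upper, pbinom(13, size = n, prob = p, lower.tail = FALSE))   # TRUE
lower + upper                                                          # 1
```

Note that for a discrete variable $P(X > 13)$ and $P(X \ge 13)$ are different: the second one would be `pbinom(12, ..., lower.tail = FALSE)`.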

Continuous random variables

A discrete random variable takes on a finite (or countably infinite) number of distinct values.

Number of heads in a set of coin tosses
Number of people who have had chicken pox in a random sample

A continuous random variable can take on any real value in an interval.

Height in a population
Blood pressure in a population

A general distinction to keep in mind: discrete random variables are counted, but continuous random variables are measured.

Describing a continuous variable

Possible values form one or more intervals.
There are so many possible values that the probability of each single one is 0.
We cannot describe the distribution using a probability table or a probability mass function.

Instead, we use a density function, also called a density curve.

Density curves

The total area under the density curve is 1.
The probability that a variable has a value within a specified interval is the area under the curve over that interval.

Probabilities for continuous distributions

When working with continuous random variables, probability is found for intervals of values rather than individual values.
The probability that a continuous r.v. takes on any single individual value is 0. That is, $P(X = x) = 0$.
Thus, $P(a < X < b)$ is equal to $P(a \le X \le b)$.
The probability is given by the area under the density curve!
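
To make "area under the density curve" concrete, here is a minimal sketch using a made-up density, $f(x) = 2x$ on the interval $[0, 1]$. This density is an assumption chosen only because its areas are easy to check by hand; `integrate()` is R's built-in numerical integration function.

```r
# Hypothetical density: f(x) = 2x for 0 <= x <= 1 (and 0 elsewhere)
f <- function(x) 2 * x

# The total area under a density curve must be 1
total_area <- integrate(f, lower = 0, upper = 1)$value
total_area    # 1

# P(0.2 < X < 0.5) is the area under the curve between 0.2 and 0.5
prob <- integrate(f, lower = 0.2, upper = 0.5)$value
prob          # 0.21, which matches 0.5^2 - 0.2^2
```

Because a single point has no width, including or excluding the endpoints 0.2 and 0.5 does not change this area.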

The probability $P(a \le X \le b)$ is the area under the density curve between $a$ and $b$.

The normal distribution

Special shape of the density curve (kind of like a bell)
Symmetric, centered around a specific number (called the mean, denoted $\mu$)
The “width” of the “bell” depends on the spread of the distribution (the amount of uncertainty), and is usually determined by the standard deviation, denoted by $\sigma$.
A variable having a normal distribution with given $\mu$ and $\sigma$ is denoted $X \sim N(\mu, \sigma)$.

The normal distribution

gf_dist("norm", params=c(mean = 4, sd = 2))

The normal distribution

gf_dist("norm", params=c(mean = 4, sd = 2)) |>
    gf_dist("norm", mean = 4, sd = 3, color="red")

Meaning of Standard Deviation

Empirical Rule for normal distribution:

approximately 68% of the values are within 1 SD of the mean
approximately 95% of the values are within 2 SDs of the mean
approximately 99.7% of the values are within 3 SDs of the mean

68-95-99.7
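
The empirical rule can be checked with `pnorm()`, which returns the area to the left of a value under a normal curve. The sketch below first uses the standard normal curve, then repeats the check for a normal distribution with mean 4 and SD 2 (the values from the plot earlier) to show that the percentages do not depend on the particular mean and SD.

```r
k <- 1:3   # number of SDs away from the mean

# Area within k SDs of the mean for the standard normal curve
within_std <- pnorm(k) - pnorm(-k)
round(within_std, 4)   # 0.6827 0.9545 0.9973

# Same calculation for a normal distribution with mean 4 and SD 2
mu <- 4
sigma <- 2
within_other <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
  pnorm(mu - k * sigma, mean = mu, sd = sigma)

all.equal(within_std, within_other)   # TRUE
```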

A Normal Example

The distributions of test scores on the SAT and the ACT are both nearly normal.

Suppose that one student scores an 1800 on the SAT (Student A) and another student scores a 24 on the ACT (Student B). Which student performed better?

Standard Normal Distribution

The standard normal distribution is defined as a normal distribution with mean 0 and variance 1. It is often denoted as $N(0, 1)$.

Any normal random variable $X$ can be transformed into a standard normal random variable $Z$: $Z = \frac{X - \mu}{\sigma}$.

A Normal Example…

SAT scores are $N(1500, 300)$. ACT scores are $N(21, 5)$.
$x_A$ represents the score of Student A; $x_B$ represents the score of Student B.
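
A sketch of the comparison in R: convert each score to a $z$-score, so both are measured in SDs above their own mean. The SAT mean of 1500 and SD of 300 are the values used in the calculations that follow; the ACT mean of 21 and SD of 5 should be treated as assumed values for illustration. The variable names are illustrative.

```r
# Student A: SAT score of 1800, SAT scores with mean 1500 and SD 300
z_A <- (1800 - 1500) / 300
z_A   # 1

# Student B: ACT score of 24, ACT scores with (assumed) mean 21 and SD 5
z_B <- (24 - 21) / 5
z_B   # 0.6

# The larger z-score is further above its own mean
z_A > z_B   # TRUE

# Percentile ranks (area to the left of each z-score)
pnorm(c(z_A, z_B))
```

Under these assumptions Student A is 1 SD above the SAT mean while Student B is 0.6 SDs above the ACT mean, so Student A performed better relative to other test takers.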

Calculating Normal Probabilities (I)

What is the percentile rank for a student who scores an 1800 on the SAT for a year in which the scores are $N(1500, 300)$?

Calculate a $z$-score. If $X$ is a normal random variable with mean $\mu$ and standard deviation $\sigma$, $Z = \frac{X - \mu}{\sigma}$ is a standard normal random variable ($\mu = 0$, $\sigma = 1$).
```
(1800 - 1500)/300
```
```
[1] 1
```
Find the normal probability in one of the tables, or let R do the work:

pnorm(z) calculates the area (i.e., probability) to the left of $z$
```
pnorm(1)
```
```
[1] 0.8413447
```

Alternatively, let `R` do all the work …

What is the percentile rank for a student who scores an 1800 on the SAT for a year in which the scores are $N(1500, 300)$?

pnorm(1800, mean = 1500, sd = 300)

[1] 0.8413447
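
The two approaches give the same answer because `pnorm()` with `mean` and `sd` arguments standardizes internally. A short sketch confirming this, plus a vectorized call for several scores at once (the extra scores 1200 and 1500 are illustrative):

```r
# Standardize by hand, then use the standard normal curve
z <- (1800 - 1500) / 300
by_z_score <- pnorm(z)

# Let R standardize
direct <- pnorm(1800, mean = 1500, sd = 300)

all.equal(by_z_score, direct)   # TRUE

# pnorm() accepts a vector of scores
scores <- c(1200, 1500, 1800)
pnorm(scores, mean = 1500, sd = 300)   # 0.1586553 0.5000000 0.8413447
```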

Calculating Normal Probabilities (II)

Which score on the SAT would put a student in the 99th percentile?

Identify the $z$-value from the table or using R: qnorm(p) calculates the value $z$ such that for a standard normal variable $Z$, $P(Z \le z) = p$.

qnorm(0.99) gives us 2.326348, or approximately 2.33.
Calculate the score, $x$. If $Z$ is distributed standard Normal, then $X = \mu + \sigma Z$ is Normal with mean $\mu$ and standard deviation $\sigma$. Here $x = 1500 + 300(2.326) \approx 2198$.

Alternatively, let `R` do the work …

Which score on the SAT would put a student in the 99th percentile?

qnorm(0.99, mean = 1500, sd = 300)

[1] 2197.904

The q in qnorm stands for quantile.
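
The two routes to the 99th percentile can be compared directly, and `pnorm()` can be used to confirm that the resulting score really has 99% of the area to its left. This is a sketch using the SAT values above; the variable names are illustrative.

```r
# Route 1: find the z-value, then convert back to the SAT scale
z <- qnorm(0.99)
x <- 1500 + 300 * z
x   # 2197.904

# Route 2: let qnorm() work on the SAT scale directly
all.equal(x, qnorm(0.99, mean = 1500, sd = 300))   # TRUE

# qnorm() undoes pnorm(): the area to the left of x is 0.99
pnorm(x, mean = 1500, sd = 300)   # 0.99
```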

Another example

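Find the probability that $X > 17.5$ if $X$ is normal with the given mean $\mu$ and standard deviation $\sigma$.

Calculate the $z$-score: $z = \frac{17.5 - \mu}{\sigma} = 1.8$.
Now we need to find the area to the right of 1.8 under the standard normal curve.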

The original curve

Area to the right of 17.5.

The standard normal curve

Area to the right of 1.8.

Wrong area

Both R and the tables give you area to the left of a given $z$-score!

Like this:

Subtraction

The total area under the normal curve is always 1.
All we need to do is subtract:

area to the right = 1 - area to the left
Finally, we can find the area to the right of 1.8.

1 - pnorm(1.8)

[1] 0.03593032

pnorm(1.8, lower.tail=FALSE)

[1] 0.03593032
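
A third way to get the same right-tail area uses the symmetry of the normal curve: the area to the right of 1.8 equals the area to the left of -1.8. The sketch below checks all three versions. It also repeats the calculation on the original scale using hypothetical parameters, mean 13 and SD 2.5, chosen only because they turn 17.5 into a $z$-score of 1.8; they are an assumption, not values from the example.

```r
z <- 1.8

right_tail <- pnorm(z, lower.tail = FALSE)

all.equal(right_tail, 1 - pnorm(z))   # TRUE: subtraction
all.equal(right_tail, pnorm(-z))      # TRUE: symmetry

# Hypothetical original scale: (17.5 - 13) / 2.5 = 1.8
on_original_scale <- pnorm(17.5, mean = 13, sd = 2.5, lower.tail = FALSE)
all.equal(right_tail, on_original_scale)   # TRUE
```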