Math 132B

Class 12

Binomial Random Variables

\(X\) is a binomial random variable if it represents the number of successes in \(n\) replications of an experiment where

  • Each replicate is independent of the other replicates.
  • Each replicate has two possible outcomes: either success or failure.
  • The probability of success \(p\) in each replicate is constant.

\(X \sim \operatorname{Binom}(n, p)\).

Possible values: \(0, 1, 2, \dots, n\).

The Binomial Coefficient

The binomial coefficient \(\binom{n}{x}\) is the number of ways to choose \(x\) items from a set of size \(n\), where the order of the choices is ignored.

Mathematically,

\[\binom{n} {x} = \frac{\overbrace{n\cdot(n-1)\cdot(n-2)\cdots(n-x+1)}^{x \text{ factors}}} {\underbrace{x\cdot(x-1)\cdot(x-2)\cdots 1}_{x \text{ factors}}}\]

Examples

  • \(\binom{7}{3} = \frac{7\cdot \cancel{6}\cdot 5}{\cancel{3}\cdot \cancel{2}\cdot 1} = 35\)
  • \(\binom{7}{6} = \frac{7\cdot\cancel{6}\cdot\cancel{5}\cdot\cancel{4}\cdot\cancel{3}\cdot\cancel{2}}{\cancel{6}\cdot\cancel{5}\cdot\cancel{4}\cdot\cancel{3}\cdot\cancel{2}\cdot 1} = 7\)
  • \(\binom{10}{4} = \frac{10\cdot\cancelto{3}{9}\cdot\bcancel{8}\cdot{7}}{\bcancel{4}\cdot\cancel{3}\cdot\bcancel{2}\cdot 1} = 210\)
  • \(\binom{10}{6} = \frac{10\cdot{9}\cdot{8}\cdot{7}\cdot\cancel{6}\cdot\cancel{5}}{\cancel{6}\cdot\cancel{5}\cdot{4}\cdot{3}\cdot{2}\cdot 1} = 210\)

Formula for the binomial distribution

Let \(X\) = number of successes in \(n\) trials, and \(0 \le x \le n\).

\[P(X = x)=\binom{n}{x} p^x (1-p)^{n-x}\]

Parameters of the distribution:

  • \(n\) = number of trials

  • \(p\) = probability of success

Examples

  • Let \(n = 3\) and \(p = 1/2\).

    Calculate \(P(X = 1)\).

  • Let \(n = 3\) and \(p = 1/6\).

    Calculate \(P(X = 2)\).

  • Let \(n = 16\) and \(p = 1/2\).

    Calculate \(P(X = 14)\).

    (Hint: \(\binom{16}{14} = \binom{16}{2}\))

Mean and SD for a binomial random variable

For a binomial distribution with parameters \(n\) and \(p\), it can be shown that:

  • Mean = \(np\)

  • Standard Deviation = \(\sqrt{np(1-p)}\)

The derivation is not shown here nor in the text; it will not be asked for on a problem set or exam.

Binomial Probabilities in R

The function dbinom() is used to calculate \(P(X = k)\).

dbinom(k, size=n, prob=p): \(P(X = k)\)

For example, if \(X \sim \operatorname{Binom}(3, 1/6)\), the \(P(X = 2)\) is

dbinom(2, size=3, prob=1/6)
[1] 0.06944444

The d in dbinom stands for distribution or density.

Binomial Probabilities in R

The function pbinom() is used to calculate \(P(X \leq k)\) or \(P(X > k)\).

  • \(P(X \leq k)\):

    pbinom(k, size=n, prob=p)

  • \(P(X > k)\):

    pbinom(k, size=n, prob=p, lower.tail = FALSE)

The p stands for probability.

pbinom examples:

if \(X \sim \operatorname{Binom}(16, 1/2)\), then \(P(X \le 13)\) is

pbinom(13, size=16, prob=1/2)
[1] 0.9979095

while \(P(X \ge 14) = P(X > 13)\) is

pbinom(13, size=16, prob=1/2, lower.tail=FALSE)
[1] 0.002090454

or, equivalently:

1 - pbinom(13, size=16, prob=1/2)
[1] 0.002090454

Continuous random variables

A discrete random variable takes on a finite number of values.

  • Number of heads in a set of coin tosses

  • Number of people who have had chicken pox in a random sample

A continuous random variable can take on any real value in an interval.

  • Height in a population

  • Blood pressure in a population

A general distinction to keep in mind: discrete random variables are counted, but continuous random variables are measured.

Describing continuous variable

  • Possible values form one or more intervals.
  • So many possible values that probability of each single one is 0.
  • We cannot describe the distribution using a probability table or distribution function.

We use so-called density function or density curve.

Density curves

  • The total area under the density curve is 1.

  • The probability that a variable has a value within a specified interval is the area under the curve over that interval.

Probabilities for continuous distributions

  • When working with continuous random variables, probability is found for intervals of values rather than individual values.

  • The probability that a continuous r.v. \(X\) takes on any single individual value is 0. That is, \(P(X = x) = 0\).

  • Thus, \(P(a < X < b)\) is equal to \(P(a \leq X \leq b)\).

  • The probability \(P(a \le X \le b)\) is given by the area under the density curve!

The probability \(P(8 \le X \le 12)\)

The normal distribution

  • Special shape of the density curve (kind of like a bell)
  • Symmetric, centered around a specific number (called the mean, denoted \(\mu\))
  • The “width” of the “bell” depends on the spread of the distribution (the amount of uncertainty), and is usually determined by the standard deviation, denoted by \(\sigma\).
  • Variable \(X\) having a normal distribution with given \(\mu\) and \(\sigma\) is denoted at \(X \sim N(\mu, \sigma)\).

The normal distribution

gf_dist("norm", params=c(mean = 4, sd = 2))

The normal distribution

gf_dist("norm", params=c(mean = 4, sd = 2)) |>
    gf_dist("norm", mean = 4, sd = 3, color="red")

Meaning of Standard Deviation

Empirical Rule for normal distribution:

  • approximately 68% of the values are within 1 SD of the mean

  • approximately 95% of the values are within 2 SDs of the mean

  • approximately 99.7% of the values are within 3 SDs of the mean

68-95-99.7

A Normal Example

The distribution of test scores on the SAT and the ACT are both nearly normal.

Suppose that one student scores an 1800 on the SAT (Student A) and another student scores a 24 on the ACT (Student B). Which student performed better?

A Normal Example

Standard Normal Distribution

The standard normal distribution is defined as a normal distribution with mean 0 and variance 1. It is often denoted as \(Z \sim N(0, 1)\).

Any normal random variable \(X\) can be transformed into a standard normal random variable \(Z\).

\[Z = \dfrac{X - \mu}{\sigma} \qquad X = \mu + Z\sigma\]

A Normal Example…

  • SAT scores are \(N(1500, 300)\). ACT scores are \(N(21,5)\).

  • \(x_A\) represents the score of Student A; \(x_B\) represents the score of Student B.

\[z_{A} = \frac{x_{A} - \mu_{SAT}}{\sigma_{SAT}} = \frac{1800-1500}{300} = 1 \]

\[z_{B} = \frac{x_{B} - \mu_{ACT}}{\sigma_{ACT}} = \frac{24 - 21}{5} = 0.6\]

Calculating Normal Probabilities (I)

What is the percentile rank for a student who scores an 1800 on the SAT for a year in which the scores are \(N(1500, 300)\)?

  1. Calculate a \(Z\)-score. If \(X \sim N(\mu, \sigma)\), then \[Z = \frac{X - \mu}{\sigma},\] is a standard normal random variable, or, in symbols, \(Z \sim N(0,1)\).

    (1800 - 1500)/300
    [1] 1
  2. Find the normal probability in one of the tables, or let R do the work:

    pnorm(z) calculates the area (i.e., probability) to the left of \(z\)

    [1] 0.8413447

Alternatively, let R do all the work …

What is the percentile rank for a student who scores an 1800 on the SAT for a year in which the scores are \(N(1500, 300)\)?

pnorm(1800, mean = 1500, sd = 300)
[1] 0.8413447

Calculating Normal Probabilities (II)

Which score on the SAT would put a student in the 99\(^{th}\) percentile?

  1. Identify the \(Z\)-value from the table or using R: qnorm(p) calculates the value \(z\) such that for a standard normal variable \(Z\), \(p = P(Z \leq z)\).

    qnorm(0.99) gives us 2.326348, or approximately 2.33.

  2. Calculate the score, \(X\). If \(Z \sim N(0,1)\), then \[X = \sigma Z + \mu\] is Normal with mean \(\mu\) and standard deviation \(\sigma\), in symbols \(X \sim N(\mu, \sigma)\).

    \[X = \sigma Z + \mu = 300(2.33) + 1500 = 2199\]

Alternatively, let R do the work …

Which score on the SAT would put a student in the 99\(^{th}\) percentile?

qnorm(0.99, mean = 1500, sd = 300)
[1] 2197.904

The q in qnorm stands for quantile.

Another example

Find the probability that \(X \ge 17.5\) if \(X \sim N(13, 2.5)\)

  • Calculate the \(z\)-score:

    \[z = \frac{17.5 - 13}{2.5} = 1.8\]

  • Now we need to find the area to the right of 1.8 under the standard normal curve.

The original curve

Area to the right of 17.5.

The standard normal curve

Area to the right of 1.8.

Wrong area

  • Both R and the tables give you area to the left of a given \(z\)-score!

Like this:

Subtraction

  • The total area under the normal curve is always 1.

  • All we need to do is subtract:

    area to the right = 1 - area to the left

  • Finally, we can find the area to the right of 1.8.

1 - pnorm(1.8)
[1] 0.03593032
pnorm(1.8, lower.tail=FALSE)
[1] 0.03593032