The Hypergeometric Distribution

We have seen that the binomial distribution assumes an infinite population, a condition that can be reproduced in practice by sampling from a finite population with replacement.

If that is not the case, meaning we are sampling from a finite population without replacement, we must use the hypergeometric distribution. (In practice, when the population size N is large relative to the sample size n, the hypergeometric probability mass function approaches the binomial.)
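To see this convergence in numbers, here is a minimal sketch in R. The values are assumptions chosen for illustration (a sample of 5, 2 successes, and a population in which one element in five is a success); only the population size N changes.

```r
# Illustrative values (assumptions, not taken from the text)
n <- 5                               # sample size
x <- 2                               # number of successes we ask about
N_values <- c(30, 300, 3000, 30000)  # growing population sizes

# Hypergeometric probability for each N, keeping the share of successes at 1/5
hyper <- sapply(N_values, function(N) dhyper(x, N / 5, N - N / 5, n))

# Binomial probability with constant success probability 1/5
binom <- dbinom(x, n, 1 / 5)

print(data.frame(N = N_values, hypergeometric = hyper, binomial = binom))
```

As N grows, the hypergeometric column should get closer and closer to the binomial value.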

The hypergeometric distribution is used to calculate the probability of obtaining a certain number of successes in a series of binary trials (yes or no) that are dependent on one another: because nothing is put back, the probability of success changes from one draw to the next.
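A small sketch of what "variable probability of success" means. The population of 30 elements with 6 successes is an assumption made for illustration, and the variable names are hypothetical.

```r
# Illustrative population (assumption): 30 elements, 6 of them successes
N <- 30
M <- 6

p_first <- M / N                     # probability of success on the first draw
p_after_success <- (M - 1) / (N - 1) # second draw, if the first was a success
p_after_failure <- M / (N - 1)       # second draw, if the first was a failure

c(first = p_first,
  after_success = p_after_success,
  after_failure = p_after_failure)
```

With replacement, all three values would be the same, which is exactly the binomial setting.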

The hypergeometric distribution allows us to answer questions like:

If I draw a sample of size n from a population of size N, in which M elements meet certain requirements, what is the probability that exactly x of the elements I draw meet those requirements?

Let’s start with the formula

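The probability of finding exactly x successes in the sample is given by:

\( f(x \mid N,M,n)=\frac{C^{N-M}_{n-x}\times C^M_x}{C^N_n} \)

where \( C^a_b \) is the binomial coefficient, that is, the number of ways of choosing b elements out of a.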
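The formula translates almost literally into R using the base function choose(). In this sketch, hyper_pmf is a hypothetical helper name (it is not part of R), and the numbers in the call are illustrative assumptions; the built-in dhyper function is shown next to it as a check.

```r
# hyper_pmf is a hypothetical helper written for illustration
# x: successes in the sample, N: population size,
# M: successes in the population, n: sample size
hyper_pmf <- function(x, N, M, n) {
  choose(M, x) * choose(N - M, n - x) / choose(N, n)
}

# Illustrative values (assumptions): population of 20, 8 successes, sample of 5
manual <- hyper_pmf(2, N = 20, M = 8, n = 5)

# Built-in equivalent: dhyper(x, successes, failures, sample size)
builtin <- dhyper(2, 8, 20 - 8, 5)

c(manual = manual, builtin = builtin)
```

The two values should agree, which also shows how the arguments of dhyper map onto N, M and n.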

The hypergeometric distribution explained with examples

We know that a batch of 30 pieces contains 6 malfunctioning pieces.
If I take a sample of 5 pieces, what is the probability of finding exactly 2 defective pieces?

First, I’ll write down the data:

  • N=30 (the total number of pieces in my batch)
  • M=6 (the total malfunctioning pieces present in the batch)
  • x=2 (I want to know the probability of finding 2 defective pieces)
  • n=5 (the size of my sample)
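Substituting these values into the formula, the calculation by hand looks like this (here \( C^a_b \) stands for the number of ways of choosing b elements out of a):

\( f(2 \mid 30,6,5)=\frac{C^{24}_{3}\times C^{6}_{2}}{C^{30}_{5}}=\frac{2024\times 15}{142506}=\frac{30360}{142506}\approx 0.2130 \)

So there is roughly a 21.3% chance of finding exactly 2 defective pieces in the sample.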

Let’s see how to solve the same problem in R:

# Definition of the hypergeometric distribution parameters
x <- 2 # I want to know the probability of finding 2 defective pieces
n <- 5 # the size of my sample
M <- 6 # the total malfunctioning pieces present in the batch
N <- 30 # the total number of pieces in my batch

# Probability calculation with the dhyper function
prob <- dhyper(x, M, N - M, n)
prob

and I get the output:

[1] 0.2130437
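Since dhyper is vectorized, the same call can return the probability of every possible number of defective pieces in the sample, not just 2. A minimal sketch on the same batch (the variable names x_values and probs are my own):

```r
# Same batch as above: 30 pieces, 6 defective, sample of 5
N <- 30
M <- 6
n <- 5

x_values <- 0:n                          # possible numbers of defective pieces
probs <- dhyper(x_values, M, N - M, n)   # one probability per value of x
names(probs) <- x_values

round(probs, 4)
sum(probs)   # the probabilities of all possible outcomes add up to 1
```

The entry for x = 2 is the value computed above.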

No introduction to probability would be complete without an example involving an urn and some balls.

Hypergeometric distribution: drawing white or black balls from an urn.

Let’s look at another example: let’s calculate the probability that, drawing 4 balls without replacement from an urn with 10 white balls and 5 black ones, we get 3 white and 1 black. So:

  • x=3 Number of white balls drawn
  • n=4 Number of balls drawn
  • M=10 Number of white balls in the urn (the "successes" counted by x)
  • N = 15 Total number of balls

As we have seen, in R we can use the dhyper function to calculate the probability of drawing 3 white balls and 1 black ball from the urn described above.

Here’s the R code:

# Definition of the hypergeometric distribution parameters
x <- 3 # Number of white balls drawn
n <- 4 # Number of balls drawn
M <- 10 # Number of white balls in the urn (the successes counted by x)
N <- 15 # Total number of balls

# Probability calculation with the dhyper function
prob <- dhyper(x, M, N - M, n)
prob

The probability of drawing 3 white balls and 1 black ball is therefore 0.4395604, or about 43.96%.
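As a cross-check, the urn problem can also be worked out directly from binomial coefficients. The sketch below uses variable names of my own (white, black, drawn) and highlights a detail that is easy to get wrong: the second argument of dhyper must be the number of balls of the colour that x counts. Passing the black balls there answers a different question, namely the probability of 3 black and 1 white.

```r
white <- 10   # white balls in the urn
black <- 5    # black balls in the urn
drawn <- 4    # balls drawn without replacement

# 3 white and 1 black, from binomial coefficients
p_manual <- choose(white, 3) * choose(black, 1) / choose(white + black, drawn)

# Same event with dhyper: x counts white balls, so white goes second
p_3_white <- dhyper(3, white, black, drawn)

# Roles swapped: x now counts black balls (3 black and 1 white)
p_3_black <- dhyper(3, black, white, drawn)

c(manual = p_manual, three_white = p_3_white, three_black = p_3_black)
```

The first two values coincide at about 0.4396, while the swapped call gives about 0.0733.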

