
The Hypergeometric Distribution

We have seen that the binomial distribution assumes independent trials with a constant probability of success. This corresponds to an infinite population or, in practice, to sampling from a finite population with replacement.

If this is not the case, that is, if we sample from a finite population without replacement, we must use the hypergeometric distribution. (In practice, when the population size N is large relative to the sample size n, the hypergeometric probability mass function approaches the binomial one.)
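The convergence towards the binomial can be checked numerically. The sketch below is only an illustration: the share of successes (20%), the sample size (5), and the population sizes are arbitrary values chosen here, not taken from the text. It keeps the share of successes fixed while the population grows, and compares dhyper with dbinom.

```r
# Illustrative values (assumptions): sample of 5, interest in exactly 2 successes
n <- 5
x <- 2
p <- 0.2  # share of successes in the population

# Population sizes that grow while the share of successes stays at 20%
N <- c(30, 300, 3000, 30000)
M <- N / 5  # number of successes in each population

comparison <- data.frame(
  N = N,
  hypergeometric = dhyper(x, M, N - M, n),  # sampling without replacement
  binomial = dbinom(x, n, p)                # sampling with replacement
)
comparison$abs_diff <- abs(comparison$hypergeometric - comparison$binomial)
comparison
```

The abs_diff column shrinks as N grows, which is why the binomial is often used as an approximation when the sample is a small fraction of the population.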

The hypergeometric distribution is used to calculate the probability of obtaining a certain number of successes in a series of binary (yes or no) trials that are dependent: each draw changes what remains in the population, so the probability of success varies from one trial to the next.
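To see why the trials are dependent, it may help to look at how the probability of success changes after the first draw. This is a minimal sketch with an illustrative batch of 30 pieces, 6 of them defective; the variable names are hypothetical.

```r
# Illustrative batch (assumption): 6 defective pieces out of 30
defective <- 6
total <- 30

# Probability that the first piece drawn is defective
p_first <- defective / total

# Probability that the second piece is defective, given the first one was
p_second_given_defective <- (defective - 1) / (total - 1)

# Probability that the second piece is defective, given the first one was not
p_second_given_good <- defective / (total - 1)

c(p_first, p_second_given_defective, p_second_given_good)
```

With replacement, all three values would be the same, which is exactly the binomial setting.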

The hypergeometric distribution allows us to answer questions like:

If a population of N elements contains M elements that meet a certain requirement, and I draw a sample of size n without replacement, what is the probability that exactly x elements of the sample meet that requirement?

What we will discuss

Let's start with the formula
The hypergeometric distribution explained with examples
Can an example with an urn and balls be missing?
Further Examination of the Hypergeometric Distribution

Let’s start with the formula

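The distribution can be written as a formula:

\( f(x \mid N, M, n) = \frac{C^{N-M}_{n-x} \times C^{M}_{x}}{C^{N}_{n}} \)

Here N is the population size, M is the number of successes in the population, n is the sample size, x is the number of successes in the sample, and \( C^{a}_{b} \) is the number of ways to choose b elements out of a (the binomial coefficient).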
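The formula translates almost literally into R with the choose function. The helper below is a hypothetical name introduced only for illustration (it is not part of base R), and the numbers passed to it are arbitrary; it is a sketch for checking the formula against the built-in dhyper, not a replacement for it.

```r
# Hypothetical helper: the formula above written with binomial coefficients
hyper_pmf <- function(x, N, M, n) {
  choose(N - M, n - x) * choose(M, x) / choose(N, n)
}

# Arbitrary illustrative values: population of 10 with 4 successes, sample of 3
by_formula <- hyper_pmf(x = 1, N = 10, M = 4, n = 3)

# Built-in equivalent; its arguments are
# (x, successes in population, failures in population, sample size)
built_in <- dhyper(1, 4, 10 - 4, 3)

c(by_formula = by_formula, built_in = built_in)
```

Note that dhyper takes the number of failures (N - M) as its third argument rather than the population size N.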

The hypergeometric distribution explained with examples

We know that a batch of 30 pieces contains 6 malfunctioning pieces.
If I take a sample of 5 pieces, what is the probability of finding exactly 2 defective pieces?

I’ll immediately write down the data:

N=30 (the total number of pieces in my batch)
M=6 (the total malfunctioning pieces present in the batch)
x=2 (I want to know the probability of finding 2 defective pieces)
n=5 (the size of my sample)
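
As a check by hand, these values can be substituted into the formula, with \( C^{6}_{2} = 15 \), \( C^{24}_{3} = 2024 \) and \( C^{30}_{5} = 142506 \):

\( f(2 \mid 30, 6, 5) = \frac{C^{24}_{3} \times C^{6}_{2}}{C^{30}_{5}} = \frac{2024 \times 15}{142506} = \frac{30360}{142506} \approx 0.2130 \)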

Let’s see how to solve the same problem in R:

# Definition of the hypergeometric distribution parameters
x <- 2 # I want to know the probability of finding 2 defective pieces
n <- 5 # the size of my sample
M <- 6 # the total malfunctioning pieces present in the batch
N <- 30 # the total number of pieces in my batch

# Probability calculation with the dhyper function
prob <- dhyper(x, M, N - M, n)
prob

and I get the output:

[1] 0.2130437
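
The same function also accepts a vector of values, so the whole distribution for this batch can be obtained in one call. The sketch below additionally uses phyper, the cumulative counterpart of dhyper in base R, to answer a related question (at least 2 defective pieces); that question is an extra illustration, not part of the original example.

```r
# Same batch as in the example: 30 pieces, 6 defective, sample of 5
M <- 6
N <- 30
n <- 5

# Probability of each possible number of defective pieces in the sample
x_values <- 0:n
probs <- dhyper(x_values, M, N - M, n)
round(setNames(probs, x_values), 4)

# The probabilities over all possible outcomes sum to 1
sum(probs)

# Probability of at least 2 defective pieces: P(X >= 2) = P(X > 1)
p_at_least_2 <- phyper(1, M, N - M, n, lower.tail = FALSE)
p_at_least_2
```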

Can an example with an urn and balls be missing?

Let’s work through another example: an urn contains 10 white balls and 5 black ones, and we draw 4 balls without replacement. What is the probability of getting 3 white balls and 1 black ball? Since we are counting white balls, the white balls are the "successes". So:

x=3 Number of white balls drawn
n=4 Number of balls drawn
M=10 Number of white balls in the urn
N=15 Total number of balls

As we have seen, in R the dhyper function calculates this probability directly.

Here’s the R code:

# Definition of the hypergeometric distribution parameters
x <- 3 # Number of white balls drawn
n <- 4 # Number of balls drawn
M <- 10 # Number of white balls in the urn
N <- 15 # Total number of balls

# Probability calculation with the dhyper function
# Arguments: x, successes in the urn, failures in the urn, number of draws
prob <- dhyper(x, M, N - M, n)
prob

The probability of drawing 3 white balls and 1 black ball is therefore 0.4395604, or about 43.96%.
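
A quick simulation is one way to sanity-check a result like this. The sketch below builds the urn described in the example (10 white and 5 black balls), draws 4 balls without replacement many times with sample, and compares the relative frequency of "3 white and 1 black" with the exact value from dhyper, counting white balls as successes. The seed and the number of repetitions are arbitrary choices.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# The urn from the example: 10 white and 5 black balls
urn <- c(rep("white", 10), rep("black", 5))

# Draw 4 balls without replacement many times and count the white ones
n_sims <- 100000  # number of simulated draws (an arbitrary choice)
white_drawn <- replicate(n_sims, sum(sample(urn, 4) == "white"))

# Relative frequency of exactly 3 white balls (and therefore 1 black)
simulated <- mean(white_drawn == 3)

# Exact value: 10 successes (white), 5 failures (black), 4 draws
exact <- dhyper(3, 10, 5, 4)

c(simulated = simulated, exact = exact)
```

The two values should agree to roughly two decimal places; the simulated one varies slightly with the seed.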