
The Hypergeometric Distribution

We have seen that the binomial distribution assumes that the probability of success is the same at every trial, as happens when drawing from an infinite population. With a finite population of N elements, this condition can be realized in practice by sampling with replacement.

If instead we sample from a finite population without replacement, we must use the hypergeometric distribution. (When N is large compared with the sample size, the hypergeometric probability mass function approaches the binomial one, with the probability of success equal to the proportion of successes in the population.)
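A quick numerical check of this limit, as a sketch: the values below (a population of 100,000 items, 20% of which are successes, and a sample of 5) are arbitrary choices for illustration, not taken from the examples that follow. With the success proportion M/N fixed, the hypergeometric probability from dhyper should be very close to the binomial probability from dbinom.

```r
# Illustrative values (assumed): a large population with 20% successes
N <- 100000   # population size
M <- 20000    # successes in the population
n <- 5        # sample size
x <- 2        # number of successes of interest

p_hyper <- dhyper(x, M, N - M, n)   # sampling without replacement
p_binom <- dbinom(x, n, M / N)      # sampling with replacement, p = M/N

c(hypergeometric = p_hyper, binomial = p_binom)
```

Shrinking N (for example to 30 with M = 6) makes the two values drift apart, because removing each drawn item then changes the proportion of successes noticeably.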

The hypergeometric distribution gives the probability of obtaining a certain number of successes in a series of binary (yes/no) trials that depend on each other, so the probability of success changes from one trial to the next.
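To make the dependence concrete, consider a batch of 30 pieces, 6 of them defective, from which we draw 5 without replacement (the same numbers used in the first example). The probability of one specific sequence, say defective, defective, good, good, good, is a product of conditional probabilities that change at every draw: 6/30, then 5/29, and so on. Multiplying by the number of possible orders, choose(5, 2), should reproduce the hypergeometric probability. The variable names are only illustrative.

```r
# Batch of 30 pieces, 6 defective; draw 5 without replacement.
# Probability of the specific sequence D, D, G, G, G:
# each factor is conditional on the pieces already drawn
p_sequence <- (6 / 30) * (5 / 29) * (24 / 28) * (23 / 27) * (22 / 26)

# Every arrangement of 2 defective and 3 good pieces has the same probability,
# and there are choose(5, 2) = 10 such arrangements
p_two_defective <- choose(5, 2) * p_sequence

p_two_defective        # about 0.2130437
dhyper(2, 6, 24, 5)    # same value from R's built-in function
```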

The hypergeometric distribution allows us to answer questions like:

If I draw a sample of size n, without replacement, from a population of size N in which M elements meet certain requirements, what is the probability that exactly x of the drawn elements meet those requirements?

Let’s start with the formula

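In formula form, the probability of drawing exactly x successes is:

\( f(x \mid N, M, n) = \frac{C^{N-M}_{n-x} \times C^{M}_{x}}{C^{N}_{n}} \)

where N is the population size, M is the number of successes in the population, n is the sample size, x is the number of successes in the sample, and \( C^{a}_{b} \) is the number of combinations of a elements taken b at a time. The numerator counts the samples containing exactly x successes and n - x failures; the denominator counts all possible samples of size n.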
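The formula translates directly into R with choose(), which computes binomial coefficients. The function name hyper_pmf is hypothetical, introduced only for this sketch; in practice R's built-in dhyper does the same job. Impossible values of x get probability 0, because choose() returns 0 when asked to pick more elements than are available (or a negative number of them).

```r
# Hypergeometric probability straight from the formula (hypothetical helper)
# N = population size, M = successes in the population,
# n = sample size, x = successes in the sample
hyper_pmf <- function(x, N, M, n) {
  choose(N - M, n - x) * choose(M, x) / choose(N, n)
}

# Compare with R's built-in dhyper(x, m, n, k):
# m = successes, n = failures, k = sample size
hyper_pmf(2, N = 30, M = 6, n = 5)
dhyper(2, 6, 30 - 6, 5)
```

Because choose() is vectorized, hyper_pmf(0:5, 30, 6, 5) returns the whole distribution at once, and its values add up to 1.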

The hypergeometric distribution explained with examples

We know that a batch of 30 pieces contains 6 malfunctioning pieces.
If I draw a sample of 5 pieces without replacement, what is the probability of finding exactly 2 defective pieces?

I’ll immediately write down the data:

  • N=30 (the total number of pieces in my batch)
  • M=6 (the total malfunctioning pieces present in the batch)
  • x=2 (I want to know the probability of finding 2 defective pieces)
  • n=5 (the size of my sample)
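Substituting these values into the formula, the probability can also be worked out by hand:

\( f(2 \mid 30, 6, 5) = \frac{C^{24}_{3} \times C^{6}_{2}}{C^{30}_{5}} = \frac{2024 \times 15}{142506} = \frac{30360}{142506} \approx 0.2130 \)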

Let’s see how to solve this problem in R:

# Definition of the hypergeometric distribution parameters
x <- 2 # I want to know the probability of finding 2 defective pieces
n <- 5 # the size of my sample
M <- 6 # the total malfunctioning pieces present in the batch
N <- 30 # the total number of pieces in my batch

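# Probability calculation with the dhyper function
# dhyper(x, m, n, k): m = defective pieces, n = non-defective pieces, k = sample size
# (note: dhyper's own "n" argument is the number of failures, not the sample size)
prob <- dhyper(x, M, N - M, n)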
prob

and I get the output:

[1] 0.2130437
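As a sanity check, the same probability can be estimated by simulation: repeatedly draw 5 pieces without replacement from a batch coded as 6 defective (1) and 24 good (0) pieces, and count how often exactly 2 defective pieces come up. The seed and the number of repetitions are arbitrary choices for this sketch; the estimate should land close to 0.213.

```r
set.seed(42)  # arbitrary seed, for reproducibility

# Batch: 6 defective pieces (1) and 24 good pieces (0)
batch <- c(rep(1, 6), rep(0, 24))
n_sim <- 100000

# Each repetition draws 5 pieces without replacement
# and counts how many of them are defective
defective_counts <- replicate(n_sim, sum(sample(batch, 5, replace = FALSE)))

# Share of repetitions with exactly 2 defective pieces
estimate <- mean(defective_counts == 2)
estimate
```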

And how could we skip the classic urn-and-balls example?

Let’s now look at another example: let’s compute the probability that, drawing 4 balls without replacement from an urn with 10 white balls and 5 black ones, we get 3 white and 1 black. So:

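  • x=3 Number of white balls drawn
  • n=4 Number of balls drawn
  • M=10 Number of white balls in the urn
  • N=15 Total number of balls

Since x counts white balls, the "successes" are the white balls, so M must be the number of white balls; the 5 black balls are the N-M failures.

As in the first example, we can use R’s dhyper function to calculate the probability of drawing 3 white balls and 1 black ball from this urn.

Here’s the R code:

# Definition of the hypergeometric distribution parameters
x <- 3 # Number of white balls drawn
n <- 4 # Number of balls drawn
M <- 10 # Number of white balls in the urn (the successes)
N <- 15 # Total number of balls

# Probability calculation with the dhyper function
# dhyper(x, m, n, k): m = white balls, n = black balls, k = balls drawn
prob <- dhyper(x, M, N - M, n)
prob

The probability of drawing 3 white balls and 1 black ball is therefore 0.4395604, or about 43.96%. (Setting M to the 5 black balls instead would give 0.07326007, which is the probability of drawing 3 black balls and 1 white ball.)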
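To see the full picture, the sketch below evaluates dhyper for every possible number of white balls drawn (0 to 4) from the same urn; the variable names white, black and drawn are only illustrative. The probabilities sum to 1, and the value for 3 white balls is 600/1365, about 0.4396. The last line checks a useful symmetry: drawing 3 white balls out of 4 is the same event as drawing exactly 1 black ball, so treating the black balls as the successes and asking for 1 of them gives the same probability.

```r
# Urn: 10 white balls (successes) and 5 black balls; draw 4 without replacement
white <- 10
black <- 5
drawn <- 4

# Probability of every possible number of white balls drawn (0 to 4)
probs <- dhyper(0:drawn, white, black, drawn)
names(probs) <- 0:drawn
probs

sum(probs)    # the probabilities add up to 1

# 3 white balls out of 4 means exactly 1 black ball,
# so counting black balls as successes gives the same probability
dhyper(1, black, white, drawn)
```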

Further Examination of the Hypergeometric Distribution

paolo
