We have seen that the binomial distribution assumes independent trials with a constant probability of success, as happens when drawing from an infinite population. With a finite population of size N, this condition can be met in practice by sampling with replacement.
If this is not the case, that is, if we sample without replacement from a finite population, we must use the hypergeometric distribution. (In practice, when N is large compared with the sample size, the hypergeometric probability mass function tends towards the binomial one.)
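The convergence towards the binomial can be checked numerically. The sketch below assumes a population in which 20% of the elements are successes (the same proportion as the 6 defective pieces out of 30 used in the example further on), fixes the sample size at 5 and x at 2, and compares dhyper with dbinom for growing population sizes. The names N_values, hyper, binom and gap are illustrative only.

```r
# Illustrative comparison: hypergeometric vs binomial for growing N
# (assumed setup: 20% successes in the population, samples of size 5, x = 2)
x <- 2
n <- 5
p <- 0.2
N_values <- c(30, 300, 3000, 30000)

# Hypergeometric: M = p * N successes and N - M failures, n draws
hyper <- sapply(N_values, function(N) {
  M <- round(p * N)
  dhyper(x, M, N - M, n)
})

# Binomial: n independent trials with constant success probability p
binom <- dbinom(x, n, p)

gap <- abs(hyper - binom)
print(data.frame(N = N_values, hypergeometric = hyper,
                 binomial = binom, gap = gap))
```

The gap shrinks as N grows, because removing a single element changes the composition of a large population less and less.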
The hypergeometric distribution gives the probability of obtaining a certain number of successes in a series of binary (yes/no) trials that are dependent: because every draw removes an element from the population, the probability of success changes from one draw to the next.
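To make the dependence concrete, take a batch of 30 pieces with 6 defective ones (the numbers of the example further on). The sketch below simply evaluates the probability that a drawn piece is defective on the first draw and on the second draw, depending on what the first draw removed; the variable names are illustrative.

```r
# Batch of N = 30 pieces, M = 6 defective, drawn without replacement
N <- 30
M <- 6

p_first <- M / N                              # first draw: 6/30
p_second_if_defective <- (M - 1) / (N - 1)    # a defective piece was removed: 5/29
p_second_if_good <- M / (N - 1)               # a good piece was removed: 6/29

round(c(p_first, p_second_if_defective, p_second_if_good), 4)
```

With replacement, all three values would stay at 6/30 = 0.2, which is exactly the independence the binomial relies on.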
The hypergeometric distribution allows us to answer questions like:
From a population of size N, in which M elements meet certain requirements, I draw a sample of size n without replacement: what is the probability that exactly x of the drawn elements meet those requirements?
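The probability mass function is:

\( f(x \mid N, M, n) = \frac{C^{N-M}_{n-x} \times C^{M}_{x}}{C^{N}_{n}} \)

where \( C^{a}_{b} = \frac{a!}{b!\,(a-b)!} \) is the number of combinations of \(a\) elements taken \(b\) at a time. The numerator counts the ways of choosing \(n - x\) elements among the \(N - M\) that do not meet the requirements and \(x\) among the \(M\) that do; the denominator counts all possible samples of size \(n\).

For example, we know that a batch of 30 pieces contains 6 malfunctioning pieces.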
If I take a sample of 5 pieces, what is the probability of finding exactly 2 defective pieces?
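I’ll immediately write down the data: \(N = 30\) (pieces in the batch), \(M = 6\) (defective pieces), \(n = 5\) (sample size) and \(x = 2\) (defective pieces we want to find).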
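Before turning to R, the formula can be evaluated by hand:

\[ f(2 \mid 30, 6, 5) = \frac{C^{24}_{3} \times C^{6}_{2}}{C^{30}_{5}} = \frac{2024 \times 15}{142506} = \frac{30360}{142506} \approx 0.2130 \]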
Let’s see how to solve the same problem in R:
```r
# Definition of the hypergeometric distribution parameters
x <- 2   # I want to know the probability of finding 2 defective pieces
n <- 5   # the size of my sample
M <- 6   # the total malfunctioning pieces present in the batch
N <- 30  # the total number of pieces in my batch

# Probability calculation with the dhyper function
prob <- dhyper(x, M, N - M, n)
prob
```
and I get the output:
[1] 0.2130437
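As a cross-check, the formula can be translated directly into R with the base function choose(), which computes binomial coefficients. Here hyper_pmf is a hypothetical helper written for this illustration, not a function from any package; it should agree with dhyper up to floating-point rounding.

```r
# Hypothetical helper: hypergeometric PMF written straight from the formula
# x = successes drawn, N = population size, M = successes in the population,
# n = sample size (same symbols as in the text)
hyper_pmf <- function(x, N, M, n) {
  choose(N - M, n - x) * choose(M, x) / choose(N, n)
}

manual <- hyper_pmf(2, N = 30, M = 6, n = 5)
builtin <- dhyper(2, 6, 30 - 6, 5)

manual                       # [1] 0.2130437
all.equal(manual, builtin)   # [1] TRUE
```

Since choose() returns 0 when asked to choose more elements than are available, the helper also returns 0 for impossible outcomes, such as more defective pieces than the batch contains.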
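Let’s now work through another example: an urn contains 10 white balls and 5 black ones, and we draw 4 balls without replacement. What is the probability of getting 3 white balls and 1 black one? Treating the white balls as successes, the data are \(N = 15\), \(M = 10\), \(n = 4\) and \(x = 3\).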
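By hand, the formula gives:

\[ f(3 \mid 15, 10, 4) = \frac{C^{5}_{1} \times C^{10}_{3}}{C^{15}_{4}} = \frac{5 \times 120}{1365} = \frac{600}{1365} \approx 0.4396 \]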
As we have seen, in R we can use the dhyper function to calculate the probability of drawing 3 white balls and 1 black ball from this urn. Since the white balls are the successes here, M must be the number of white balls.

Here’s the R code:

```r
# Definition of the hypergeometric distribution parameters
x <- 3   # number of white balls drawn
n <- 4   # number of balls drawn
M <- 10  # number of white balls in the urn (the "successes")
N <- 15  # total number of balls in the urn

# Probability calculation with the dhyper function:
# dhyper(x, successes in the urn, failures in the urn, balls drawn)
prob <- dhyper(x, M, N - M, n)
prob
```

The probability of drawing 3 white balls and 1 black ball is therefore 0.4395604 (exactly 600/1365), or about 43.96%.
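Which category counts as the "success" matters: the second argument of dhyper is the number of success-type items in the population and the third is the number of the others. The sketch below tabulates the whole distribution of the number of white balls drawn and, for comparison, computes the probability of drawing 3 black balls, which is a different event. The variable names are illustrative.

```r
# Urn: 10 white and 5 black balls, 4 draws without replacement
white <- 10
black <- 5
draws <- 4

# Distribution of the number of white balls drawn (0 to 4)
dist_white <- dhyper(0:draws, white, black, draws)
names(dist_white) <- 0:draws
round(dist_white, 4)
sum(dist_white)   # the probabilities sum to 1

# Swapping the roles: probability of 3 black balls (and therefore 1 white)
p_three_black <- dhyper(3, black, white, draws)
p_three_black
```

Drawing 3 black balls out of 4 is the same event as drawing exactly 1 white ball, so p_three_black (about 0.0733) equals the entry for 1 in dist_white, while the entry for 3 white balls is about 0.4396.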