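The Wilcoxon test is a non-parametric test that comes in two forms: the rank-sum test, which compares two independent samples, and the signed-rank test, which compares paired samples or a single sample with a known reference value.
The test is used when the data cannot be assumed to follow a normal distribution, or when the form of the underlying distribution is unknown.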
The rank-sum form of the test pools the data from both samples, ranks them together, and assigns each value a score equal to its position in the ranking. The scores are then summed for each sample, and the rank sum of one sample is compared with the distribution it would follow if the null hypothesis were true (the Wilcoxon distribution).
Based on the result of the comparison, one can decide whether to reject or fail to reject the null hypothesis.
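The ranking step can be sketched by hand in R. This is a minimal illustration with made-up values (the vectors x and y are hypothetical and contain no ties), compared against the statistic that wilcox.test() reports, which is the rank sum of the first sample minus its smallest possible value.

```r
# Small made-up samples, used only to illustrate the ranking step
x <- c(1.1, 2.3, 3.8, 5.0)
y <- c(2.9, 4.4, 6.1, 7.5, 8.2)

# Rank all observations together (these illustrative data have no ties)
pooled_ranks <- rank(c(x, y))

# Sum the ranks that belong to the first sample
rank_sum_x <- sum(pooled_ranks[seq_along(x)])

# R reports W = rank sum of x minus n1 * (n1 + 1) / 2
n1 <- length(x)
w_manual <- rank_sum_x - n1 * (n1 + 1) / 2

# Statistic computed by wilcox.test() for the same data
w_reported <- unname(wilcox.test(x, y)$statistic)

print(c(manual = w_manual, reported = w_reported))
```

Both values should agree, which shows that the reported statistic is just a shifted rank sum.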
The Wilcoxon test is often used to compare the values of a continuous variable between two groups. The two-sample rank-sum version is also known as the Wilcoxon-Mann-Whitney (or Mann-Whitney U) test. Because it relies only on ranks, it can also be applied to ordinal variables, but not to unordered categorical ones.
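For the case of a single sample compared with a reference value, a minimal sketch could look like the following. The sample scores and the reference value of 100 are assumptions made up for illustration; the mu argument of wilcox.test() sets the value tested under the null hypothesis.

```r
# Hypothetical single sample (values are simulated for illustration only)
set.seed(42)
scores <- rnorm(30, mean = 105, sd = 15)

# Hypothetical reference value to compare the sample against
reference <- 100

# One-sample Wilcoxon signed-rank test against the reference value
one_sample <- wilcox.test(scores, mu = reference, alternative = "two.sided")

print(one_sample)
```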
In this example, I will generate sample data for two groups, group1 and group2, using the rnorm() function to generate random numbers that follow a normal distribution with a mean of 100 and standard deviation of 15 for the first group, and a mean of 110 and standard deviation of 15 for the second group.
I use the wilcox.test() function to perform the Wilcoxon test, and specify the alternative hypothesis as "two.sided" to test whether the two groups differ in location, that is, whether one distribution is shifted relative to the other in either direction. Strictly speaking, the test does not compare means.
The test results are printed on the screen and include the test statistic W, the p-value, and the alternative hypothesis being tested. Based on the p-value, one can decide whether to reject the null hypothesis.
# Create sample data
set.seed(123)
group1 <- rnorm(100, mean = 100, sd = 15)
group2 <- rnorm(100, mean = 110, sd = 15)

# Perform the Wilcoxon test
wilcox_test <- wilcox.test(group1, group2, alternative = "two.sided")

# Display the test results
print(wilcox_test)
The most commonly used significance level is 5% (0.05). If the p-value obtained from the test is less than 0.05, the null hypothesis is rejected, and it is concluded that there is a statistically significant difference between the samples. If the p-value is 0.05 or greater, the data do not provide enough evidence to reject the null hypothesis, and the observed difference is considered compatible with chance.
It’s important to note that these threshold values are conventional and can be modified based on the specific needs of the study or the discipline in which one is working.
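The decision rule described above can be sketched in R as follows. The block regenerates the same simulated groups so that it runs on its own; the variable alpha is a name introduced here for illustration and holds the conventional 0.05 threshold, which can be changed to suit the study.

```r
# Recreate the simulated data so this block is self-contained
set.seed(123)
group1 <- rnorm(100, mean = 100, sd = 15)
group2 <- rnorm(100, mean = 110, sd = 15)

wilcox_test <- wilcox.test(group1, group2, alternative = "two.sided")

# Conventional significance level (an assumption; adjust as needed)
alpha <- 0.05

# The p-value is stored in the object returned by wilcox.test()
p_value <- wilcox_test$p.value

if (p_value < alpha) {
  message("p = ", signif(p_value, 3), ": reject the null hypothesis")
} else {
  message("p = ", signif(p_value, 3), ": fail to reject the null hypothesis")
}
```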