First Steps into the World of Probability: Sample Space, Events, Permutations, and Combinations

Probability and combinatorics are two fundamental concepts in mathematics and statistics that help us understand and interpret many everyday phenomena. In this introductory post, we'll touch on the main concepts together and see how they can be applied in various contexts.

Continue reading “First Steps into the World of Probability: Sample Space, Events, Permutations, and Combinations”

Logistic Regression: Predicting the Outcome of an Event

Logistic regression is a statistical model used to predict the probability of an event based on a set of independent variables. It's particularly useful when you want to classify an event as belonging to a specific category or not (for example, whether a customer will buy a product, or whether a patient will develop a disease).

It is a supervised machine learning algorithm that can be used to model the probability of a specific class or event. It is used when the data are linearly separable – that is, when there exists a line or plane that can uniquely separate the data into different classes – and the outcome is binary or dichotomous. This means that logistic regression is typically used for binary classification problems (Yes/No, Correct/Incorrect, True/False, etc.).

In this post, I will demonstrate how to perform binomial logistic regression to build a classification model that predicts binary responses from a given set of predictors.
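As a taste of what the full post covers, here is a minimal sketch of fitting a binomial logistic regression, assuming scikit-learn and a small synthetic dataset (the variable names and data are illustrative only, not those used in the post):

```python
# Minimal sketch: binomial logistic regression on synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Two numeric predictors and a binary (0/1) response.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# Probability of the positive class for each test observation, plus hard 0/1 predictions.
probabilities = model.predict_proba(X_test)[:, 1]
predictions = model.predict(X_test)
print("Accuracy on the test set:", accuracy_score(y_test, predictions))
```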

Continue reading “Logistic Regression: Predicting the Outcome of an Event”

Non-Parametric Tests: The Wilcoxon Test for Non-Normal Data

The Wilcoxon test is a non-parametric test used to compare two samples (independent samples with the rank-sum version, paired samples with the signed-rank version), or a single sample against a known reference value.
The test is used when the data do not follow a normal distribution, or when the distribution parameters are unknown.
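As a quick illustration, here is a minimal sketch of both versions of the test using SciPy; the data below are simulated purely for illustration and are not those discussed in the post:

```python
# Minimal sketch: Wilcoxon signed-rank and rank-sum tests on simulated, non-normal data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Skewed measurements taken before and after a treatment on the same subjects.
before = rng.exponential(scale=2.0, size=30)
after = before + rng.normal(loc=0.3, scale=0.5, size=30)

# Signed-rank test: paired samples (or a single sample against a reference value).
stat_paired, p_paired = stats.wilcoxon(before, after)
print("Signed-rank p-value:", p_paired)

# Rank-sum (Mann-Whitney U) test: two independent samples.
group_a = rng.exponential(scale=2.0, size=30)
group_b = rng.exponential(scale=2.5, size=30)
stat_indep, p_indep = stats.mannwhitneyu(group_a, group_b)
print("Rank-sum p-value:", p_indep)
```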

Continue reading “Non-Parametric Tests: The Wilcoxon Test for Non-Normal Data”

The Beta Distribution Explained Simply

The Beta distribution is a crucial probability distribution in Bayesian statistics.

In theoretical probability problems, we know the exact probability value of a single event, making it relatively straightforward to apply basic probability calculation rules to reach the desired result.

In real life, however, it’s much more common to deal with collections of observations, and it’s from this data that we must derive probability estimates.
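To make the idea concrete, here is a minimal sketch of this "learning from data" step, assuming SciPy and made-up counts of successes and failures (purely for illustration):

```python
# Minimal sketch: updating a Beta prior with observed successes and failures (made-up counts).
from scipy import stats

# A Beta(1, 1) prior is uniform: no initial preference for any probability value.
alpha_prior, beta_prior = 1, 1

# Suppose we observe 27 successes out of 40 trials.
successes, failures = 27, 13

# Thanks to conjugacy, the posterior is again a Beta distribution.
posterior = stats.beta(alpha_prior + successes, beta_prior + failures)

print("Posterior mean estimate of the probability:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```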

Continue reading “The Beta Distribution Explained Simply”

Multicollinearity, Heteroscedasticity, Autocorrelation: Three Difficult-Sounding Concepts (Explained Simply)

In various posts, particularly those on regression analysis, analysis of variance, and time series, we've come across terms that seem deliberately designed to scare the reader.
The aim of these articles is to explain these key concepts simply, beyond the apparent complexity (something I really wanted when I was a student, instead of facing texts written in a purposely convoluted and unnecessarily difficult way).
So, it’s time to spend a few words on three very important concepts that often recur in statistical analysis and need to be well understood. The reality is much, much clearer than it seems, so… don’t be afraid!

Continue reading “Multicollinearity, Heteroscedasticity, Autocorrelation: Three Difficult-Sounding Concepts (Explained Simply)”