Non-Parametric Hypothesis Tests
Wilcoxon Signed-Rank, Mann-Whitney U, Kruskal-Wallis H, Friedman Test
Hi! In the last article we saw how to perform a paired t-test to tackle the common question of “How do we compare two machine learning models?”. We also saw the limitations of the paired t-test, which come from the assumptions of parametric tests:
Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Today we will look into some non-parametric tests, which are more practical for real-life use cases of comparing two machine learning models (or any sample data): in real-life scenarios it is difficult to have all the assumptions of parametric tests hold, especially the assumption that observations in each sample are normally distributed.
Let’s have a look at the non-parametric tests.
1. Mann-Whitney U Test: tests whether the distributions of two independent samples are equal or not. It is the non-parametric equivalent of the unpaired (two-sample) t-test.
Assumptions:
Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.
from scipy.stats import mannwhitneyu

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]

stat, p = mannwhitneyu(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
2. Wilcoxon Signed-Rank Test: tests whether the distributions of two paired samples are equal or not. It is the non-parametric equivalent of the paired t-test.
What if your data samples don’t come from a normal distribution? In that case, use the Wilcoxon signed-rank test. This test makes no assumption of normality; all you need is that the differences between the paired samples come from a distribution that is symmetric around its center. The skew and the fatness of the distribution’s tails do not matter.
Assumptions:
Observations in each sample are independent & identically distributed (iid).
Observations in each sample can be ranked.
Observations across each sample are paired.
from scipy.stats import wilcoxon

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]

stat, p = wilcoxon(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
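To tie this back to the motivating question of comparing two machine learning models: a minimal sketch, assuming hypothetical 10-fold cross-validation accuracies for two models (all numbers below are made up for illustration). Because both models are scored on the same folds, the samples are paired, which is exactly the Wilcoxon setting.

```python
from scipy.stats import wilcoxon

# Hypothetical accuracy per CV fold for two models (same 10 folds, so paired)
model_a = [0.81, 0.79, 0.84, 0.80, 0.77, 0.83, 0.78, 0.82, 0.80, 0.79]
model_b = [0.78, 0.76, 0.80, 0.79, 0.75, 0.80, 0.77, 0.79, 0.78, 0.76]

# Wilcoxon tests whether the per-fold differences are centered at zero
stat, p = wilcoxon(model_a, model_b)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
```

Here model_a beats model_b on every fold, so the test flags a significant difference.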
3. Kruskal-Wallis H Test: tests whether the distributions of two or more independent samples are equal or not. It is the non-parametric equivalent of one-way ANOVA.
What is the difference between Mann-Whitney and Kruskal-Wallis?
The major difference between the Mann-Whitney U and the Kruskal-Wallis H is simply that the latter can accommodate more than two groups.
The ANOVA (and t-test) is explicitly a test of equality of means of values. The Kruskal-Wallis (and Mann-Whitney) can be seen technically as a comparison of the mean ranks.
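The “mean ranks” point can be made concrete: rank all observations from both samples jointly, then average the ranks within each sample. A small sketch with made-up numbers, using scipy’s rankdata:

```python
import numpy as np
from scipy.stats import rankdata

# Two illustrative samples (made-up numbers)
a = [1.1, 2.3, 1.9, 3.0]
b = [0.4, 0.8, 1.5, 2.0]

# Rank all eight values jointly, then average the ranks within each sample
ranks = rankdata(np.concatenate([a, b]))
mean_rank_a = ranks[:len(a)].mean()
mean_rank_b = ranks[len(a):].mean()
print(mean_rank_a, mean_rank_b)  # Mann-Whitney effectively compares these
```

A large gap between the two mean ranks is what drives a small Mann-Whitney p-value, regardless of the raw means.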
Assumptions:
Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.
from scipy.stats import kruskal

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]

stat, p = kruskal(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
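Since the whole point of Kruskal-Wallis is that it accommodates more than two groups, here is a sketch with three made-up independent samples:

```python
from scipy.stats import kruskal

# Three independent groups of made-up scores
group1 = [7, 8, 6, 9, 7]
group2 = [5, 6, 7, 5, 6]
group3 = [9, 10, 8, 9, 10]

# kruskal accepts any number of sample arguments
stat, p = kruskal(group1, group2, group3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
```

With group3 clearly ranked above the others, the test reports a significant difference among the three groups.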
4. Friedman Test: tests whether the distributions of two or more paired samples are equal or not. It is the non-parametric alternative to repeated-measures ANOVA.
It is used to determine whether or not there is a statistically significant difference between the means of three or more groups in which the same subjects show up in each group.
Assumptions:
Observations in each sample are independent and identically distributed (iid).
Observations in each sample can be ranked.
Observations across each sample are paired.
Use case Examples:
1. Measuring the mean scores of subjects during three or more time points.
For example, you might measure the resting heart rate of subjects one month before they start a training program, one month after starting the program, and two months after starting the program. You can perform the Friedman test to see if there is a significant difference in the mean resting heart rate of the subjects across these three time points.
2. Measuring the mean scores of subjects under three different conditions.
For example, you might have subjects watch three different movies and rate each one based on how much they enjoyed it. Since each subject shows up in each sample, you can perform a Friedman Test to see if there is a significant difference in the mean rating of the three movies.
More details @: https://www.statology.org/friedman-test
# Python version of the Friedman test
from scipy.stats import friedmanchisquare

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]

stat, p = friedmanchisquare(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
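For model comparison, the Friedman test extends the paired setup to three or more models evaluated on the same cross-validation folds. A hedged sketch with made-up per-fold accuracies for three hypothetical models:

```python
from scipy.stats import friedmanchisquare

# Hypothetical per-fold accuracies of three models on the SAME 10 CV folds
model_a = [0.82, 0.79, 0.85, 0.80, 0.78, 0.83, 0.79, 0.81, 0.80, 0.84]
model_b = [0.80, 0.77, 0.83, 0.79, 0.76, 0.81, 0.78, 0.80, 0.78, 0.82]
model_c = [0.75, 0.74, 0.78, 0.74, 0.73, 0.77, 0.74, 0.76, 0.75, 0.77]

# Friedman ranks the models within each fold and compares the rank sums
stat, p = friedmanchisquare(model_a, model_b, model_c)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('Probably the same distribution')
else:
    print('Probably different distributions')
```

Because the ranking is identical in every fold (a > b > c), the test reports a highly significant difference; a post-hoc pairwise test would then tell you which models differ.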
I hope you find this article useful for your machine learning and statistical use cases. Likewise, I will keep trying to bring new ideas across, with the motto “curiosity leads to innovation”. :)
Thanks again for your time. If you enjoyed this short article, there are tons of topics on advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
Some of my alternative internet presences are Facebook, Instagram, Udemy, Blogger, Issuu, Slideshare, Scribd and more.
Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy
Let me know if you need anything. Talk Soon.