Hypothesis testing is a vital part of statistics. It takes two competing statements and evaluates which one the sample data best supports. The result tells us whether there is significance: real, measurable, quantifiable significance.
The basis of hypothesis testing is comparing averages, or means. It allows us to quantify the probability that our sample mean is unusual compared to that of the population. How unusual it is determines whether we can reject the null hypothesis - that is, what is currently accepted as ‘normal’.
The objective is simple enough - comparing two means. However, in our most recent project, I found that this dataset was much like real life: not ‘normal’. I found that selecting the right testing methodology is crucial for an accurate result, because a null hypothesis can be falsely rejected (a Type I error), completely defeating the purpose of testing in the first place.
Parametric tests are what are typically used in hypothesis testing, and they work largely as described above: they compare the mean of a sample group with that of the population. However, they should only be used when the data meets certain assumptions. Because parametric tests rely on the central limit theorem, they require the following:
- Data must be numeric
- Data must be normally distributed
- No significant outliers
- When comparing two or more samples, the groups must have equal variance.
However, the world is an imperfect place. Datasets are small, skewed, categorical, or ordinal in nature. You can confirm as much with tests like the D’Agostino-Pearson normality test or the Shapiro-Wilk test for normality, and Levene’s test for equal variance (sketched below). Enter the non-parametric tests. These tests work around the ‘non-normal’ data and provide a more accurate result in determining whether the sample is unusual - significantly so.
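As a quick illustration, here is how those assumption checks might look with scipy. This is a minimal sketch: the sample arrays are made-up placeholders, not data from the project.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.exponential(scale=2.0, size=40)  # deliberately skewed, non-normal sample
group_b = rng.exponential(scale=3.5, size=40)

# D'Agostino-Pearson normality test
stat, p = stats.normaltest(group_a)
print(f"D'Agostino-Pearson: p = {p:.4f}")  # p < .05 suggests non-normal data

# Shapiro-Wilk test for normality
stat, p = stats.shapiro(group_a)
print(f"Shapiro-Wilk:       p = {p:.4f}")

# Levene's test for equal variance across groups
stat, p = stats.levene(group_a, group_b)
print(f"Levene:             p = {p:.4f}")  # p < .05 suggests unequal variance
```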
Reasons to use non-parametric tests:
- Your data is skewed or better represented by the median
- You have a small sample size
- You have outliers you cannot remove - for example, because your sample size is too small to discard data points.
- You have ordinal or categorical data.
Non-parametric tests use the median, rather than the mean, to determine whether the sample is unusual. Below is a summary table that outlines the test options at a high level, followed by a short usage sketch. The first row covers one-sample tests, the second row is for comparing two samples, and the third row is for comparing three or more samples.
Summary Table - Hypothesis Testing Functions:
| Parametric tests (means) | Function | Nonparametric tests (medians) | Function |
|---|---|---|---|
| 1-sample t test | scipy.stats.ttest_1samp() | 1-sample Wilcoxon | scipy.stats.wilcoxon() |
| 2-sample t test | scipy.stats.ttest_ind() | Mann-Whitney U test | scipy.stats.mannwhitneyu() |
| One-way ANOVA | scipy.stats.f_oneway() | Kruskal-Wallis | scipy.stats.kruskal() |
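To show how a parametric test and its non-parametric counterpart line up in practice, here is a hedged sketch of the two-sample pair. The lognormal samples are invented for illustration only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample_1 = rng.lognormal(mean=0.0, sigma=0.8, size=30)  # skewed placeholder data
sample_2 = rng.lognormal(mean=0.4, sigma=0.8, size=30)

# Parametric: 2-sample t test (assumes normality and equal variance)
t_stat, t_p = stats.ttest_ind(sample_1, sample_2)

# Non-parametric counterpart: Mann-Whitney U test (rank-based, no normality assumption)
u_stat, u_p = stats.mannwhitneyu(sample_1, sample_2, alternative="two-sided")

print(f"2-sample t test: p = {t_p:.4f}")
print(f"Mann-Whitney U:  p = {u_p:.4f}")
```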
Post-hoc analysis, such as computing Cohen’s d, can further assess the effect size and, I’ve found, lend additional support to the hypothesis-test results.
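As far as I know, scipy does not ship a one-line Cohen’s d helper, but it is straightforward to compute from the pooled standard deviation. A minimal sketch (the helper name cohens_d is my own):

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances (ddof=1)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Rule of thumb: ~0.2 is a small effect, ~0.5 medium, ~0.8 large
d = cohens_d([2.1, 2.5, 3.0, 2.8], [3.4, 3.9, 4.1, 3.7])
print(f"Cohen's d = {d:.2f}")
```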
In my series of tests, a preliminary visual examination using exploratory data analysis suggested that I might be able to reject the null hypothesis. Excited that I was on the right track, I continued evaluating the data, which turned out not to have equal variance. Since the data did not meet all the assumptions for a parametric test, I proceeded with a non-parametric test (Kruskal-Wallis). The result was a p-value, or probability value, higher than .05 - meaning that if the samples really did come from the same population, a difference at least this large would occur more than 5% of the time. I could NOT reject the null hypothesis. Shocked, and despite the unmet assumptions, I defiantly ran a parametric test anyway and - ha! - got the result I expected and wanted: a p-value of less than .05, which would have let me reject the null hypothesis. However, in post-hoc testing, effect sizes and adjusted p-values showed that the non-parametric test was the accurate assessment.
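To make that decision flow concrete, here is a hedged sketch with synthetic groups of unequal variance (placeholder numbers, not the project’s data): Levene’s test flags the variance problem, and the rank-based test is chosen accordingly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Three synthetic groups with clearly unequal variances
g1 = rng.normal(10.0, 1, 25)
g2 = rng.normal(10.5, 4, 25)
g3 = rng.normal(11.0, 8, 25)

# Levene's test checks the equal-variance assumption first...
_, p_levene = stats.levene(g1, g2, g3)
if p_levene < 0.05:
    # ...and if it fails, the rank-based Kruskal-Wallis test is the safer choice
    stat, p = stats.kruskal(g1, g2, g3)
    print(f"Kruskal-Wallis: p = {p:.4f}")
else:
    stat, p = stats.f_oneway(g1, g2, g3)  # parametric one-way ANOVA
    print(f"One-way ANOVA:  p = {p:.4f}")
```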
Once again, I learned the lesson to let the data tell the story. It doesn’t lie.