Why can't the Wilcoxon matched pairs test ever report a P value less than 0.05 (two tailed) with five or fewer pairs of data?
The Wilcoxon matched pairs test is a nonparametric test of paired data. The first thing it does is subtract the "before" value from the "after" value in each pair to compute the difference. Then it ranks the absolute values of those differences, and finally compares the ranks of the positive differences (after greater than before) to the ranks of the negative differences (after less than before).
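The three steps just described can be sketched in a few lines of Python. This is only an illustration, not the code of any particular program: the function name `signed_rank_sums` is made up here, and dropping pairs with a zero difference and giving tied differences their average rank are common conventions assumed for this sketch.

```python
def signed_rank_sums(before, after):
    """Return (sum of ranks of positive differences, sum of ranks of negative differences)."""
    # Step 1: difference = after - before. Zero differences are dropped (assumed convention).
    diffs = [a - b for b, a in zip(before, after) if a != b]

    # Step 2: rank the absolute differences, smallest = rank 1.
    # Tied absolute differences share the average of their ranks (assumed convention).
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Step 3: add up the ranks separately for positive and negative differences.
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Four pairs, every "after" value higher than its "before" value.
print(signed_rank_sums([10, 12, 9, 14], [13, 13, 14, 16]))  # (10.0, 0)
```

When every pair goes in the same direction, one of the two sums is zero, which is the most extreme result the test can see.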
If all the pairs go in the same direction, so all the differences are positive or all the differences are negative, it is easy to compute the P value by hand.
Let's say you have four pairs of data, and in each case the "after" response was higher. The P value from the Wilcoxon matched pairs test answers this question:
If the effect was really zero on average, so that the "after" value is higher than "before" half the time and lower half the time, what is the chance that all four of the pairs would go in the same direction?
The null hypothesis is that the treatment really causes no difference. In this case, the after result is as likely to be higher than before as it is to be lower. It is like flipping a coin. So the P value can also be computed by answering this question:
If I flip four fair coins, what is the chance that all will be heads or all will be tails?
That chance is 0.5*0.5*0.5*0.5*2, which is 0.125. The factor of 2 is there because all heads and all tails both count. With only four pairs of data, the Wilcoxon matched pairs test can never give a lower two-tail P value than that.
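The coin-flip arithmetic can be checked by listing every possible outcome. The short Python sketch below (variable names are chosen only for this illustration) enumerates all 16 ways four fair coins can land and counts the ones where all four agree.

```python
from itertools import product

n_pairs = 4

# Every possible sequence of heads (H) and tails (T) for four fair coins.
outcomes = list(product("HT", repeat=n_pairs))

# Keep only the sequences where all coins show the same face.
all_same = [o for o in outcomes if len(set(o)) == 1]

p_value = len(all_same) / len(outcomes)
print(len(outcomes), len(all_same), p_value)  # 16 2 0.125
```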
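It is easy to generalize the results. With n pairs, the chance that all n go in the same direction is 0.5 multiplied by itself n times, then doubled. The table below gives the lowest possible two-tail P value for various numbers of pairs.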
Number of pairs | Smallest possible two-tail P value |
2 | 0.500 |
3 | 0.250 |
4 | 0.125 |
5 | 0.062 |
6 | 0.031 |
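The table can be reproduced with a one-line formula. The sketch below assumes, as in the coin-flip argument, that the most extreme result is every pair going in the same direction; the function name `smallest_two_tail_p` is invented for this example.

```python
def smallest_two_tail_p(n_pairs):
    """Chance that n fair coins all land heads or all land tails."""
    return 2 * 0.5 ** n_pairs


for n in range(2, 7):
    print(n, smallest_two_tail_p(n))

# Smallest number of pairs for which a two-tail P value below 0.05 is possible.
first_n = next(n for n in range(2, 100) if smallest_two_tail_p(n) < 0.05)
print(first_n)  # 6
```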
It is often said that nonparametric tests have nearly the same power as t tests when the data are truly Gaussian. The table shows that this is not true with small samples. With five or fewer data pairs, the Wilcoxon matched pairs test has zero power at the usual 0.05 significance level: no matter what data it is given, the test reports a two-tail P value greater than 0.05.
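The "no matter what data" claim can also be checked by brute force. The Python sketch below makes several assumptions that are not spelled out above: the P value is exact (not a large-sample approximation), there are no tied or zero differences so the ranks are simply 1 through n, and the two-tail P value is the fraction of equally likely sign patterns whose smaller rank sum is no larger than the observed one. The function names are hypothetical.

```python
from itertools import product


def exact_two_tail_p(signs):
    """Exact two-tail P for one sign pattern; signs[i] is +1 or -1 for rank i + 1."""
    n = len(signs)
    total = n * (n + 1) // 2  # sum of the ranks 1..n

    def smaller_sum(pattern):
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), pattern) if s > 0)
        return min(w_plus, total - w_plus)

    observed = smaller_sum(signs)
    # Under the null hypothesis every sign pattern is equally likely.
    patterns = list(product((1, -1), repeat=n))
    as_extreme = sum(1 for p in patterns if smaller_sum(p) <= observed)
    return as_extreme / len(patterns)


def min_possible_p(n):
    """Smallest exact two-tail P over every possible sign pattern for n pairs."""
    return min(exact_two_tail_p(p) for p in product((1, -1), repeat=n))


for n in range(2, 7):
    print(n, min_possible_p(n))
```

Under these assumptions the smallest value for each n matches the table, and it first drops below 0.05 at six pairs.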