Kline (1) lists commonly believed fallacies about P values, which I summarize here:
Fallacy: P value is the probability that the result was due to sampling error
The P value is computed assuming the null hypothesis is true. In other words, the P value is computed based on the assumption that the difference was due to sampling error. Therefore the P value cannot tell you the probability that the result is due to sampling error.
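To make this concrete, here is a minimal simulation sketch (hypothetical numbers; assumes NumPy and SciPy are available). A permutation test computes the P value directly from its definition: assume the null hypothesis is true, so the group labels are interchangeable, and ask how often sampling error alone produces a difference as large as the one observed. The result should closely track the t test's P value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two hypothetical samples of 20 values each (numbers for illustration only).
a = rng.normal(loc=10.0, scale=2.0, size=20)
b = rng.normal(loc=11.5, scale=2.0, size=20)
observed_diff = abs(a.mean() - b.mean())

# P value from a standard two-sample t test.
t, p = stats.ttest_ind(a, b)
print(f"t test P value: {p:.4f}")

# The same quantity from its definition, via a permutation test: if the
# null hypothesis is true, the group labels are arbitrary, so shuffling
# them shows how often sampling error ALONE yields a difference at least
# as large as the one observed.
pooled = np.concatenate([a, b])
n_sim = 100_000
count = 0
for _ in range(n_sim):
    perm = rng.permutation(pooled)
    if abs(perm[:20].mean() - perm[20:].mean()) >= observed_diff:
        count += 1
print(f"permutation P value: {count / n_sim:.4f}")
```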
Fallacy: The P value is the probability that the null hypothesis is true
Nope. The P value is computed assuming that the null hypothesis is true, so it cannot be the probability that it is true.
Fallacy: 1-P is the probability that the alternative hypothesis is true
If the P value is 0.03, it is very tempting to think: if there is only a 3% probability that my difference would have been caused by random chance, then there must be a 97% probability that it was caused by a real difference. But this is wrong!
What you can say is that if the null hypothesis were true, then 97% of experiments would lead to a difference smaller than the one you observed, and 3% of experiments would lead to a difference as large or larger than the one you observed.
Calculation of a P value is predicated on the assumption that the null hypothesis is correct. P values cannot tell you whether this assumption is correct. The P value tells you how rarely you would observe a difference as large or larger than the one you observed if the null hypothesis were true.
The question that the scientist must answer is whether the result is so unlikely that the null hypothesis should be discarded.
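A small simulation makes the distinction vivid. The numbers below are illustrative assumptions, not from Kline: suppose only 10% of tested hypotheses describe a real effect of a given size. Among the experiments that reach P < 0.05, the fraction with a real effect depends on that prevalence and on the power of the design, not on 1-P; it comes out well below 95%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical world: only 10% of tested hypotheses describe a real effect.
n_experiments = 20_000
has_effect = rng.random(n_experiments) < 0.10

real_hits = 0      # significant results where a real effect exists
false_alarms = 0   # significant results where the null is actually true
for effect_present in has_effect:
    shift = 1.0 if effect_present else 0.0   # assumed effect size
    a = rng.normal(0.0, 1.0, size=15)
    b = rng.normal(shift, 1.0, size=15)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        if effect_present:
            real_hits += 1
        else:
            false_alarms += 1

total_significant = real_hits + false_alarms
print(f"fraction of significant results that reflect a real effect: "
      f"{real_hits / total_significant:.2f}")
```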
Fallacy: 1-P is the probability that the results will hold up when the experiment is repeated
If the P value is 0.03, it is tempting to think that this means there is a 97% chance of getting 'similar' results on a repeated experiment. Not so.
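The chance that a repeat experiment will again reach significance depends on the true effect size and the power of the design, not on 1-P. A sketch, with an assumed effect sized so that a typical single experiment lands near P = 0.03; the replication rate comes out nowhere near 97%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Assumed design: a real effect of 0.8 SD with 15 subjects per group,
# chosen so that a typical single experiment lands near P = 0.03.
effect, n = 0.8, 15

repeats = 10_000
significant = 0
for _ in range(repeats):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(effect, 1.0, size=n)
    _, p = stats.ttest_ind(a, b)
    significant += p < 0.05
print(f"fraction of repeat experiments reaching P < 0.05: "
      f"{significant / repeats:.2f}")
```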
Fallacy: A high P value proves that the null hypothesis is true
No. A high P value means that if the null hypothesis were true, it would not be surprising to observe the treatment effect seen in this experiment. But that does not prove the null hypothesis is true.
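A sketch of why a high P value is weak evidence for the null: give the simulated experiment a genuine one-standard-deviation effect but only five subjects per group (all numbers assumed for illustration), and most runs still return P > 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# A genuine one-SD effect, but a tiny experiment (assumed numbers).
repeats, n = 10_000, 5
high_p = 0
for _ in range(repeats):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(1.0, 1.0, size=n)   # the null hypothesis is FALSE here
    _, p = stats.ttest_ind(a, b)
    high_p += p > 0.05
print(f"fraction of runs with P > 0.05 despite a real effect: "
      f"{high_p / repeats:.2f}")
```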
Fallacy: The P value is the probability of rejecting the null hypothesis
You reject the null hypothesis (and deem the results statistically significant) when the P value from a particular experiment is less than the significance level α, which you (should have) set as part of the experimental design. So if the null hypothesis is true, α is the probability of rejecting it.
The P value and α are not the same. A P value is computed for each comparison, and is a measure of the strength of evidence. The significance level α is set once, as part of the experimental design.
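The distinction shows up clearly in simulation: when the null hypothesis is true, each experiment yields a different P value (uniformly distributed between 0 and 1), while the long-run rejection rate matches whatever α was fixed in advance. A sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Many experiments in which the null hypothesis is TRUE: each one yields
# its own P value, but the long-run rejection rate equals alpha.
repeats, n = 10_000, 20
pvals = np.empty(repeats)
for i in range(repeats):
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(0.0, 1.0, size=n)   # same population, so H0 holds
    pvals[i] = stats.ttest_ind(a, b).pvalue

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: rejection rate = {np.mean(pvals < alpha):.3f}")
```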
1. R.B. Kline, Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research, 2004, ISBN: 1591471184