When you view data in a publication or presentation, you may be tempted to draw conclusions about the statistical significance of differences between group means by looking at whether the error bars overlap. It turns out that examining whether or not error bars overlap tells you less than you might guess. However, there is one rule worth remembering:
When SEM error bars for the two groups overlap, and the sample sizes are equal, you can be sure the difference between the two means is not statistically significant (P > 0.05).
The opposite is not true. Observing that the top of one standard error (SE) bar is below the bottom of the other does not let you conclude that the difference is statistically significant. Non-overlapping SE error bars permit no conclusion at all: the difference between the two means might or might not be statistically significant, and the lack of overlap doesn't help you distinguish the two possibilities.
If the error bars represent standard deviation (SD) rather than standard error, then no conclusion is possible either way. Whether or not the SD error bars overlap tells you nothing about whether the difference between the two means is statistically significant.
Error bars that show the 95% confidence interval (CI) are wider than SE error bars. Observing that two 95% CI error bars overlap doesn't help either: the difference between the two means may or may not be statistically significant.
Useful rule of thumb: If two 95% CI error bars do not overlap, and the sample sizes are nearly equal, the difference is statistically significant with a P value much less than 0.05 (Payton 2003).
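To make the relationship between the three kinds of error bars concrete, here is a minimal Python sketch. The 1.96 multiplier is the large-sample (z) approximation to the 95% CI; for small samples the exact t critical value is larger, so real 95% CI bars are somewhat wider than this sketch shows.

```python
import math
import statistics

def error_bars(values):
    """Return (SD, SEM, 95% CI half-width) for one sample."""
    n = len(values)
    sd = statistics.stdev(values)    # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(n)          # standard error of the mean
    ci_half = 1.96 * sem             # large-sample approximation to the 95% CI
    return sd, sem, ci_half
```

Under this approximation the CI half-width is always 1.96 times the SEM, which is why 95% CI bars are wider than SEM bars; and once n exceeds about 4, sqrt(n) > 1.96, so CI bars are narrower than SD bars.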
With multiple comparisons following ANOVA, the significance level usually applies to the entire family of comparisons. With many comparisons, it takes a much larger difference to be declared "statistically significant". But the error bars are usually graphed (and calculated) individually for each treatment group, without regard to multiple comparisons. So the rule above regarding overlapping CI error bars does not apply in the context of multiple comparisons.
| Type of error bar | Conclusion if they overlap | Conclusion if they don't overlap |
| --- | --- | --- |
| SD | No conclusion | No conclusion |
| SEM | P > 0.05 | No conclusion |
| 95% CI | No conclusion | P < 0.05 |
There are two ways to think about this. If what you really care about is statistical significance, pay no attention to whether the error bars overlap. If what you really care about is the degree to which the two distributions overlap, pay little attention to P values and conclusions about statistical significance.
The rules of thumb listed above are true only when the sample sizes are equal, or nearly equal.
Here is an example where the rule of thumb about confidence intervals is not true (and sample sizes are very different).
Sample 1: Mean=0, SD=1, n=10
Sample 2: Mean=3, SD=10, n=100
The confidence intervals do not overlap, but the P value is high (0.35).
And here is an example where the rule of thumb about SEM error bars is not true (and sample sizes are very different).
Sample 1: Mean=0, SD=1, n=100, SEM=0.1
Sample 2: Mean=3, SD=10, n=10, SEM=3.16
The SEM error bars overlap, but the P value is tiny (0.005).
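Both counterexamples can be checked directly from the summary statistics. The sketch below assumes an ordinary pooled-variance (equal-variance) two-sample t test, which appears to be what the quoted P values reflect, and it approximates the t distribution with the normal distribution, which is adequate at roughly 100 degrees of freedom.

```python
import math

def pooled_t_test(m1, sd1, n1, m2, sd2, n2):
    """Two-sample t test (pooled variance) from summary statistics.

    Returns (t, p). The two-sided P value uses the normal approximation
    to the t distribution, accurate for large df (~100 in these examples).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))              # SE of the difference
    t = (m1 - m2) / se
    p = math.erfc(abs(t) / math.sqrt(2))                 # two-sided, normal approx
    return t, p

# Example 1: the 95% CIs do not overlap, yet P is high.
t1, p1 = pooled_t_test(0, 1, 10, 3, 10, 100)

# Example 2: the SEM error bars overlap, yet P is tiny.
t2, p2 = pooled_t_test(0, 1, 100, 3, 10, 10)
```

For the first example this gives |t| ≈ 0.94 and P ≈ 0.35; for the second, |t| ≈ 2.97 and P well below 0.01, matching the conclusions above.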