Other multiple comparison tests

Tests that Prism offers, but we don't recommend

Bonferroni test to compare every pair of means

Prism offers the Bonferroni test for comparing every pair of means, but its only advantage over Tukey's test is that it is much easier to understand how it works. Its disadvantage is that it is too conservative, so you are more apt to miss real differences (and its confidence intervals are too wide). This is a minor concern when you compare only a few columns, but a major problem when you have many columns. Don't use the Bonferroni test with more than five groups.
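To see why the problem grows with the number of columns, here is a minimal Python sketch of the Bonferroni arithmetic, assuming the usual form of the correction (divide alpha by the number of comparisons). The function name bonferroni_threshold is made up for this illustration and is not part of Prism.

```python
def bonferroni_threshold(n_groups, alpha=0.05):
    """Return (number of pairwise comparisons, per-comparison threshold).

    Assumes every pair of groups is compared and that the family-wise
    alpha is split equally across the comparisons.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    n_pairs = n_groups * (n_groups - 1) // 2
    return n_pairs, alpha / n_pairs


for k in (3, 5, 10):
    n_pairs, threshold = bonferroni_threshold(k)
    print(f"{k} groups: {n_pairs} comparisons, "
          f"per-comparison threshold {threshold:.4f}")
```

With 5 groups there are already 10 comparisons, and with 10 groups there are 45, so each individual comparison must clear a far stricter threshold.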

Newman-Keuls

The Newman-Keuls test (also called the Student-Newman-Keuls test) compares all pairs of means following one-way ANOVA. The Newman-Keuls test is popular, but is no longer recommended for three reasons:

The Newman-Keuls test does not maintain the family-wise error rate at the specified level. Most often, alpha is set to 5%. This is supposed to mean that the chance of making one or more Type I errors is 5%. In fact, the Newman-Keuls test doesn't do this (1). In some cases, the chance of a Type I error can be greater than 5%.
You can't interpret a P value or significance level without stating a null hypothesis, but it is difficult to articulate exactly what null hypotheses the Newman-Keuls test actually tests.
Confidence intervals are more informative than significance levels, but the Newman-Keuls test cannot generate confidence intervals.

Although Prism still offers the Newman-Keuls test (for compatibility with prior versions), we recommend that you use the Tukey test instead. Unfortunately, the Tukey test has less power. This means that the Tukey test concludes that the difference between two groups is 'not statistically significant' in some cases where the Newman-Keuls test concludes that the difference is 'statistically significant'.

Tests Prism does not offer because many consider them obsolete

Fisher's LSD

While Fisher's Least Significant Difference (LSD) test is of historical interest as the first post test ever developed, it is no longer recommended. The other tests are better. Prism does not offer the Fisher LSD test.

Fisher's LSD test does not correct for multiple comparisons as the other post tests do.

The other tests can be used even if the overall ANOVA yields a "not significant" conclusion. They set the 5% significance level for the entire family of comparisons, so there is only a 5% chance that any one or more comparisons will be declared "significant" if the null hypothesis is true.

Fisher's LSD post test can only be used if the overall ANOVA has a P value less than 0.05. This first step only partly controls the false positive rate for the entire family of comparisons. When doing each individual comparison, the test applies the 5% significance level to that comparison alone, rather than to the family of comparisons. This means it is easier to find statistical significance with the Fisher LSD test than with other post tests (it has more power), but it also means it is too easy to be misled by false positives (you'll get bogus 'significant' results in more than 5% of experiments).
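As a rough illustration of how quickly uncorrected comparisons inflate the false positive rate, the Python sketch below computes the chance of at least one false positive among several comparisons each tested at alpha. It rests on two simplifying assumptions that do not hold exactly for Fisher's LSD: it ignores the protective overall ANOVA step, and it treats the comparisons as independent (pairwise comparisons that share a group are not). The function name is made up for this example.

```python
def uncorrected_familywise_rate(n_comparisons, alpha=0.05):
    """Chance of one or more false positives among n independent tests,
    each run at level alpha with all null hypotheses true.

    Simplification: independence is assumed, and no preliminary
    ANOVA gate is modeled.
    """
    return 1.0 - (1.0 - alpha) ** n_comparisons


for m in (1, 3, 10):
    rate = uncorrected_familywise_rate(m)
    print(f"{m} comparisons: family-wise rate about {rate:.3f}")
```

Under these assumptions the rate is 5% for a single comparison but well above 5% as soon as several comparisons are made, which is the behavior the other post tests are designed to prevent.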

Duncan's test

This test is adapted from the Newman-Keuls method. Like the Newman-Keuls method, Duncan's test does not control the family-wise error rate at the specified alpha level. It has more power than the other post tests, but only because it doesn't control the error rate properly. Few statisticians, if any, recommend this test.

Multiple comparisons tests that Prism does not offer

Scheffe's test

Scheffe's test (not calculated by Prism) is used to do more than pairwise comparisons. It covers all possible comparisons, including averages of groups. So you might compare the average of groups A and B with the average of groups C, D and E, or compare group A with the average of groups B through F. Because it is so versatile, Scheffe's test has less power to detect differences between pairs of groups, so it should not be used when your goal is to compare one group mean with another.
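The kind of comparison described above can be written as a contrast: a set of weights on the group means that sum to zero. The Python sketch below only builds the weights and the estimated difference for "average of A and B versus average of C, D and E"; it does not perform Scheffe's test itself. The group means are made-up numbers and the function name is hypothetical.

```python
def average_contrast(means, first, second):
    """Weights and estimate for (average of `first`) - (average of `second`).

    `means` maps group name to group mean. Groups in neither list
    get a weight of zero.
    """
    weights = {name: 0.0 for name in means}
    for name in first:
        weights[name] = 1.0 / len(first)
    for name in second:
        weights[name] = -1.0 / len(second)
    estimate = sum(weights[name] * means[name] for name in means)
    return weights, estimate


# Made-up group means, for illustration only.
group_means = {"A": 10.0, "B": 12.0, "C": 7.0, "D": 8.0, "E": 9.0}
weights, estimate = average_contrast(group_means, ["A", "B"], ["C", "D", "E"])
print(weights)
print(estimate)
```

A simple pairwise comparison is the special case where one group has weight 1, another has weight -1, and the rest have weight 0.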

Holm's test

Some statisticians highly recommend Holm's test. We don't offer it in Prism, because while it does a great job of deciding which group differences are statistically significant, it cannot compute confidence intervals for the differences between group means. (Let us know if you would like to see this in a future version of Prism.)
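For readers who want to see how Holm's method decides significance, here is a minimal Python sketch of the standard step-down procedure, assuming you already have one P value per comparison. The function name holm_reject and the example P values are made up for this illustration. Note that the output is only a set of yes/no decisions, with no confidence intervals, which matches the limitation described above.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, in the same order as p_values,
    marking which comparisons are declared significant.
    """
    m = len(p_values)
    # Indices of the P values, smallest P value first.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P value is compared with alpha/m, the next
        # with alpha/(m-1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first failure; the rest are not significant
    return reject


# Made-up P values, for illustration only.
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

The first comparison faces the same threshold as the Bonferroni test, but each later one faces a less strict threshold, which is where the extra power comes from.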

False Discovery Rate

The concept of the False Discovery Rate is a major advance in statistics. But it is really only useful when you have calculated a large number of P values from independent comparisons, and now have to decide which P values are small enough to follow up further. It is not used as a post test following one-way ANOVA.
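One widely used way to apply the False Discovery Rate idea is the Benjamini-Hochberg procedure. This page does not name a specific method, so treat the Python sketch below as one possible illustration, assuming independent P values. The function name, the parameter q (the desired false discovery rate) and the example P values are all made up for this example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the same order as p_values,
    marking which P values are flagged as discoveries.
    """
    m = len(p_values)
    # Indices of the P values, smallest P value first.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Find the largest rank whose P value is at or below rank * q / m.
        if p_values[idx] <= rank * q / m:
            largest_rank = rank
    discoveries = [False] * m
    # Everything up to that rank is flagged, even if an earlier
    # P value missed its own threshold.
    for idx in order[:largest_rank]:
        discoveries[idx] = True
    return discoveries


# Made-up P values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p))
```

The flagged P values are candidates for follow-up, with q as the expected fraction of them that are false discoveries. This is a different goal from controlling the chance of even one false positive.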

References

1. MA Seaman, JR Levin and RC Serlin, Psychological Bulletin 110:577-586, 1991.



Copyright (c) 2007 GraphPad Software Inc. All rights reserved.
URL: http://www.graphpad.com/help/Prism5/Prism5Help.html?whats_wrong_with_the_newman_ke.htm