Two options for unpaired tests

When you choose to compare the means (or geometric means) of two unpaired groups with a t test, you have two choices:

Use the standard unpaired test. It assumes that both groups of data are sampled from population distributions with the same variance (standard deviation or geometric standard deviation).

Use the unequal variance test, also called the Welch test. It does not assume that data were sampled from populations with the same variance (standard deviation or geometric standard deviation).

A potentially strange null hypothesis, but a useful test

To interpret any P value, it is essential that the null hypothesis be carefully defined. For the Welch (unequal variance) t test, the null hypothesis is that the two populations have the same mean (for data sampled from normal distributions) or geometric means (for data sampled from lognormal distributions). However, the variances of these two populations may differ.

If the P value is large, you don't reject the null hypothesis. In other words, the evidence does not persuade you that the two population means (or geometric means) are different, even though you assume the two populations may have different standard deviations. What a strange set of assumptions. What would it mean for two populations to have the same mean but different standard deviations? Why would you want to test for that? While this scenario may not be overly common in science (1), there is still good reason to consider using the Welch test.

The Welch test is the recommended default unless there is a compelling reason to use an equal variance test. Why? When the populations being sampled truly have equal variances, the Welch test performs nearly as well as the equal variance test, with only a minimal loss of power. When the variances of the sampled populations truly differ, the Welch test performs much better than the equal variance test: it has higher power and maintains an appropriate type I error rate (alpha).

How the unequal variance t test is computed

The Welch and equal variance tests both report a P value and a confidence interval. The calculations differ in two ways:

Calculation of the standard error of the difference between means

The t ratio is computed by first determining the difference between the two sample means (for data sampled from normal distributions) or the difference between the logarithms of the two sample geometric means (for data sampled from lognormal distributions). This value is then divided by the standard error of the difference. This standard error is computed from the variances and sample sizes of the two groups. When the two groups have the same sample size, the standard error is identical for the two t tests. But when the two groups have different sample sizes, the standard error, and therefore the t ratio, for the Welch t test generally differs from that for the ordinary t test. This standard error of the difference is also used to compute the confidence interval for the difference between the two means.
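As a minimal sketch of the two standard error calculations described above, the Python below uses the textbook formulas: a pooled variance for the equal variance test, and a sum of per-group variance-over-n terms for the Welch test. This page does not spell the formulas out, so treat them as an assumption about what is meant. The function names and the sample values are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, variance

def se_pooled(a, b):
    """SE of the difference, assuming equal population variances (textbook form)."""
    n1, n2 = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def se_welch(a, b):
    """SE of the difference without assuming equal variances (textbook form)."""
    return math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def t_ratio(a, b, se):
    """Difference between sample means divided by its standard error."""
    return (mean(a) - mean(b)) / se

# Hypothetical data, for illustration only.
control = [4.1, 5.0, 6.2, 5.5, 4.7]
treated_same_n = [7.9, 3.2, 9.8, 5.1, 6.6]
treated_more = [7.9, 3.2, 9.8, 5.1, 6.6, 2.4, 8.8, 4.3]

# Equal sample sizes: the two standard errors agree.
print(se_pooled(control, treated_same_n), se_welch(control, treated_same_n))

# Unequal sample sizes and unequal variances: they differ.
print(se_pooled(control, treated_more), se_welch(control, treated_more))

# Lognormal data: run the same calculation on the logarithms of the values.
log_control = [math.log(x) for x in control]
log_treated = [math.log(x) for x in treated_more]
print(t_ratio(log_control, log_treated, se_welch(log_control, log_treated)))
```

The mean of the logarithms is the logarithm of the geometric mean, which is why the last call compares geometric means.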

Calculation of the df

For the equal variance unpaired t test, df is computed as the total sample size (both groups) minus two. The df for the Welch test is computed by a complicated formula that takes into account the discrepancy between the two variances. If the two samples have identical variances and identical sample sizes, the df for the Welch t test will be identical to the df for the standard t test. In most cases, however, the df for the Welch t test will be smaller than it would be for the standard unpaired t test. The calculation usually leads to a df value that is not an integer. Prism reports and uses this fractional value for df. Many programs, including Prism 5, InStat, and our QuickCalc, round the df down to the next lower integer. For this reason, the P value reported by Prism can be a bit smaller than the P values reported by other programs.
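This page calls the Welch df formula complicated without stating it. A commonly used form is the Welch-Satterthwaite approximation, sketched below in Python on the assumption that this is the formula meant. The function names and data are hypothetical.

```python
import math
from statistics import variance

def df_student(a, b):
    """df for the equal variance unpaired t test: total sample size minus two."""
    return len(a) + len(b) - 2

def df_welch(a, b):
    """Welch-Satterthwaite approximation to the df (assumed formula)."""
    v1 = variance(a) / len(a)  # squared standard error of the first mean
    v2 = variance(b) / len(b)  # squared standard error of the second mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))

# Hypothetical data, for illustration only.
same_a = [1, 2, 3, 4, 5]
same_b = [11, 12, 13, 14, 15]  # same variance and same n as same_a
narrow = [4.1, 5.0, 6.2, 5.5, 4.7]
wide = [7.9, 3.2, 9.8, 5.1, 6.6, 2.4, 8.8, 4.3]

# Identical variances and sample sizes: Welch df matches n1 + n2 - 2.
print(df_welch(same_a, same_b), df_student(same_a, same_b))

# Different variances and sample sizes: Welch df is smaller and fractional.
fractional_df = df_welch(narrow, wide)
print(fractional_df, df_student(narrow, wide))

# Programs that round down would use this integer df instead.
print(math.floor(fractional_df))
```

Rounding the df down gives a slightly larger P value for the same t ratio, which is why a program that keeps the fractional df can report a somewhat smaller P value.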

When to choose the unequal variance (Welch) t test

Deciding when to use the unequal variance t test is not straightforward.

It seems sensible to first test whether the variances are different, and then choose the ordinary or Welch t test accordingly. In fact, this is not a good plan. You should decide which test to use as part of the experimental planning.

What about always choosing the Welch test? Ruxton (2) and Delacre et al. (3) make a strong case that this is a good idea. You lose some power when the standard deviations are, in fact, equal, but gain power in the cases where they are not.

References

1. Sawilowsky, S.S. (2002). Fermat, Schubert, Einstein, and Behrens-Fisher: The Probable Difference Between Two Means With Different Variances. J. Modern Applied Statistical Methods 1: 461–472.

2. Ruxton, G.D. (2006). The unequal variance t-test is an underused alternative to Student's t-test and the Mann-Whitney U test. Behavioral Ecology 17 (4): 688–690.

3. Delacre, M., Lakens, D.L., and Leys, C. (2017). Why Psychologists Should by Default Use Welch's t-test Instead of Student's t-test. Rips 30: 92–101.

© 1995-2019 GraphPad Software, LLC. All rights reserved.