Interpreting the unpaired t test
How the unpaired t test works
To calculate a P value for an unpaired t test, Prism first computes a t ratio. The t ratio is the difference between sample means divided by the standard error of the difference, calculated by pooling the SEMs of the two groups. If the difference is large compared to the SE of the difference, then the t ratio will be large (or a large negative number) and the P value will be small. The sign of the t ratio tells you only which group had the larger mean. The P value is derived from the absolute value of t.
For the standard t test, the number of degrees of freedom (df) equals the total sample size minus 2. Welch's t test (a rarely used test which doesn't assume equal variances) calculates df from a complicated equation. Prism calculates the P value from t and df.
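The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Prism's implementation. The function names (`t_pdf`, `two_tailed_p`, `pooled_t_test`) are made up for this example, the standard error of the difference is computed from the pooled variance (the usual textbook form, assumed here), and the P value is obtained by numerically integrating the t distribution with Simpson's rule, which is simply a convenient way to avoid external libraries.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value: area outside [-|t|, |t|], via Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and |t|
    return max(0.0, 1.0 - central_area)

def pooled_t_test(a, b):
    """Standard (equal-variance) unpaired t test; returns (t, df, P)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)  # sum of squares, group 1
    ss2 = sum((x - m2) ** 2 for x in b)  # sum of squares, group 2
    df = n1 + n2 - 2                     # total sample size minus 2
    pooled_var = (ss1 + ss2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se_diff              # sign shows which mean is larger
    return t, df, two_tailed_p(t, df)
```

Calling `pooled_t_test(group_a, group_b)` on two lists of numbers returns the t ratio, the degrees of freedom, and a two-tailed P value. Swapping the two arguments flips the sign of t but leaves the P value unchanged, because the P value depends only on the absolute value of t.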
A standard t test assumes the two groups have equal variances. To test this assumption, Prism calculates the variance of each group (the variance equals the standard deviation squared) and then calculates F, which equals the larger variance divided by the smaller variance. The degrees of freedom for the numerator and denominator equal the sample sizes minus 1. From F and the two df values, Prism computes a P value that answers this question: If the two populations really have the same variance, what is the chance that you'd randomly select samples and end up with F as large as (or larger than) the value observed in your experiment? If the P value is small, conclude that the variances (and thus the standard deviations) are significantly different.
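As a rough sketch of the arithmetic behind the F test, the snippet below computes the F ratio and its two df values from raw data. The function name `variance_ratio` is hypothetical, and the sketch stops short of the P value, since the Python standard library has no F distribution.

```python
from statistics import variance

def variance_ratio(a, b):
    """Return (F, df_numerator, df_denominator), larger variance on top."""
    var_a, var_b = variance(a), variance(b)  # sample variances (n - 1 denominator)
    if min(var_a, var_b) == 0:
        raise ValueError("F is undefined when a group has zero variance")
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1
```

Because the larger variance always goes in the numerator, F is never less than 1. To convert F and the two df values into a P value, you would need the F distribution from a statistics package, for example `scipy.stats.f.sf` if SciPy is available.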
Don't base your conclusion just on this one F test. Also consider data from other experiments in the series. If you conclude that the two populations really do have different variances, you have three choices:
- Conclude that the two populations are different - that the treatment had an effect. In many experimental contexts, the finding of different variances is as important as the finding of different means. If the variances are truly different, then the populations are different regardless of what the t test concludes about differences between the means. This may be the most important conclusion from the experiment.
- Transform the data to equalize the variances, then rerun the t test. You may find that converting values to their reciprocals or logarithms will equalize the variances and also make the distributions more Gaussian.
- Rerun the t test without assuming equal variances using Welch's modified t test.
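For reference, the "complicated equation" behind Welch's test is usually written in the Welch-Satterthwaite form shown below. This is the common textbook version, given here as an assumption. The source does not state exactly which formula Prism uses. The symbols are: sample means \bar{x}_1 and \bar{x}_2, sample variances s_1^2 and s_2^2, and sample sizes n_1 and n_2.

```latex
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
df \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
     {\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
\]
```

Unlike the standard test, the resulting df is generally not an integer, and it is never larger than the total sample size minus 2.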
How to think about results from an unpaired t test
The unpaired t test compares the means of two groups, assuming that data are sampled from Gaussian populations. The most important results are the P value and the confidence interval.
The P value answers this question: If the populations really have the same mean, what is the chance that random sampling would result in means as far apart as (or further apart than) those observed in this experiment?
"Statistically significant" is not the same as "scientifically important". Before interpreting the P value or confidence interval, you should think about the size of the difference you are looking for. How large a difference would you consider to be scientifically important? How small a difference would you consider to be scientifically trivial? Use scientific judgment and common sense to answer these questions. Statistical calculations cannot help, as the answers depend on the context of the experiment.
You will interpret the results differently depending on whether the P value is small or large.
If the P value is small
If the P value is small, then it is unlikely that the difference you observed is due to a coincidence of random sampling. You can reject the idea that the difference is a coincidence, and conclude instead that the populations have different means. The difference is statistically significant, but is it scientifically important? The confidence interval helps you decide.
Because of random variation, the difference between the group means in this experiment is unlikely to equal the true difference between population means. There is no way to know what that true difference is. Prism presents the uncertainty as a 95% confidence interval. You can be 95% sure that this interval contains the true difference between the two means.
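In the usual textbook form, assumed here rather than taken from Prism's documentation, the 95% confidence interval for the difference between means is built from the same quantities as the t ratio:

```latex
\[
(\bar{x}_1 - \bar{x}_2) \;\pm\; t^{*} \times SE_{\text{difference}},
\qquad df = n_1 + n_2 - 2
\]
```

Here t^{*} is the critical value of the t distribution for the given df that leaves 2.5% in each tail, and SE_{\text{difference}} is the standard error of the difference between the means. With a two-tailed P value, the 95% interval excludes zero exactly when P is less than 0.05.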
To interpret the results in a scientific context, look at both ends of the confidence interval and ask whether they represent a difference between means that would be scientifically important or scientifically trivial.
| Lower confidence limit | Upper confidence limit | Conclusion |
| --- | --- | --- |
| Trivial difference | Trivial difference | Although the true difference is not zero (since the P value is low), the true difference between means is tiny and uninteresting. The treatment had an effect, but a small one. |
| Trivial difference | Important difference | Since the confidence interval ranges from a difference that you think would be biologically trivial to one you think would be important, you can't reach a strong conclusion. You can conclude that the means are different, but you don't know whether the size of that difference is scientifically trivial or important. You'll need more data to obtain a clear conclusion. |
| Important difference | Important difference | Since even the low end of the confidence interval represents a difference large enough to be considered biologically important, you can conclude that there is a difference between treatment means and that the difference is large enough to be scientifically relevant. |
If the P value is large
If the P value is large, the data do not give you any reason to conclude that the overall means differ. Even if the true means were equal, you would not be surprised to find means this far apart just by coincidence. This is not the same as saying that the true means are the same. You just don't have convincing evidence that they differ.
How large could the true difference really be? Because of random variation, the difference between the group means in this experiment is unlikely to be equal to the true difference between population means. There is no way to know what that true difference is. Prism presents the uncertainty as a 95% confidence interval. You can be 95% sure that this interval contains the true difference between the two means. When the P value is larger than 0.05, the 95% confidence interval will start with a negative number (representing a decrease) and go up to a positive number (representing an increase).
To interpret the results in a scientific context, look at both ends of the confidence interval and ask whether they represent a difference between means that would be scientifically important or scientifically trivial.
| Lower confidence limit | Upper confidence limit | Conclusion |
| --- | --- | --- |
| Trivial decrease | Trivial increase | You can reach a crisp conclusion. Either the means really are the same or they differ by a trivial amount. At most, the true difference between means is tiny and uninteresting. |
| Trivial decrease | Large increase | You can't reach a strong conclusion. The data are consistent with the treatment causing a trivial decrease, no change, or an increase that might be large enough to be important. To reach a clear conclusion, you need to repeat the experiment with more subjects. |
| Large decrease | Trivial increase | You can't reach a strong conclusion. The data are consistent with a trivial increase, no change, or a decrease that may be large enough to be important. You can't make a clear conclusion without repeating the experiment with more subjects. |
| Large decrease | Large increase | You can't conclude anything until you repeat the experiment with more subjects. |
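The reasoning about confidence limits, for small and large P values alike, can be sketched as a small decision function. This is an illustration only. The names `classify_limit` and `interpret_ci` are hypothetical, and the single `trivial_threshold` is a simplifying assumption: in practice you decide from scientific context what counts as trivial or important, and those judgments need not reduce to one cutoff.

```python
def classify_limit(value, trivial_threshold):
    """Label one confidence limit by the size of the difference it represents."""
    return "trivial" if abs(value) < trivial_threshold else "important"

def interpret_ci(lower, upper, trivial_threshold):
    """Summarize a 95% CI for the difference between two means."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    labels = {classify_limit(lower, trivial_threshold),
              classify_limit(upper, trivial_threshold)}
    excludes_zero = lower > 0 or upper < 0  # corresponds to a small P value
    if excludes_zero:
        if labels == {"trivial"}:
            return "real but trivial difference"
        if labels == {"important"}:
            return "real and important difference"
        return "real difference of uncertain size; more data needed"
    # The interval spans zero: corresponds to a large P value.
    if labels == {"trivial"}:
        return "no important difference"
    return "inconclusive; more data needed"
```

For example, with a threshold of 1.0 unit, an interval from -0.3 to 0.2 is reported as showing no important difference, while an interval from -0.3 to 4.0 is reported as inconclusive.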
Checklist. Is an unpaired t test the right test for these data?
Before accepting the results of any statistical test, first think carefully about whether you chose an appropriate test. Before accepting results from an unpaired t test, ask yourself the questions below. Prism can help you answer the first two questions. You'll have to answer the others based on experimental design.
| Question | Discussion |
| --- | --- |
| Are the populations distributed according to a Gaussian distribution? | The unpaired t test assumes that you have sampled your data from populations that follow a Gaussian distribution. While this assumption is not too important with large samples, it is important with small sample sizes (especially with unequal sample sizes). Prism tests for violations of this assumption, but normality tests have limited utility. See The results of normality tests. If your data do not come from Gaussian distributions, you have three options. Your best option is to transform the values to make the distributions more Gaussian. Another choice is to use the Mann-Whitney nonparametric test instead of the t test. A final option is to use the t test anyway, knowing that the t test is fairly robust to violations of a Gaussian distribution with large samples. |
| Do the two populations have the same variances? | The unpaired t test assumes that the two populations have the same variances (and thus the same standard deviation). Prism tests for equality of variance with an F test. The P value from this test answers this question: If the two populations really have the same variance, what is the chance that you'd randomly select samples whose ratio of variances is as far from 1.0 (or further) as observed in your experiment? A small P value suggests that the variances are different. Don't base your conclusion solely on the F test. Also think about data from other similar experiments. If you have plenty of previous data that convinces you that the variances are really equal, ignore the F test (unless the P value is really tiny) and interpret the t test results as usual. In some contexts, finding that populations have different variances may be as important as finding different means. |
| Are the data unpaired? | The unpaired t test works by comparing the difference between means with the pooled standard deviations of the two groups. If the data are paired or matched, then you should choose a paired t test instead. If the pairing is effective in controlling for experimental variability, the paired t test will be more powerful than the unpaired test. |
| Are the "errors" independent? | The term "error" refers to the difference between each value and the group mean. The results of a t test only make sense when the scatter is random - that is, when whatever factor caused a value to be too high or too low affects only that one value. Prism cannot test this assumption. You must think about the experimental design. For example, the errors are not independent if you have six values in each group, but these were obtained from two animals in each group (in triplicate). In this case, some factor may cause all triplicates from one animal to be high or low. See The need for independent samples. |
| Are you comparing exactly two groups? | Use the t test only to compare two groups. To compare three or more groups, use one-way ANOVA followed by post tests. It is not appropriate to perform several t tests, comparing two groups at a time. Making multiple comparisons increases the probability of finding a statistically significant difference purely by chance and makes it difficult to interpret P values and statements of statistical significance. |
| Do both columns contain data? | If you want to compare a single set of experimental data with a theoretical value (perhaps 100%), don't fill a column with that theoretical value and perform an unpaired t test. Instead, use a one-sample t test. See One-sample t test. |
| Do you really want to compare means? | The unpaired t test compares the means of two groups. It is possible to have a tiny P value - clear evidence that the population means are different - even if the two distributions overlap considerably. In some situations - for example, assessing the usefulness of a diagnostic test - you may be more interested in the overlap of the distributions than in differences between means. |
| If you chose a one-tail P value, did you predict correctly? | If you chose a one-tail P value, you should have predicted which group would have the larger mean before collecting any data. Prism does not ask you to record this prediction, but assumes that it is correct. If your prediction was wrong, then ignore the P value reported by Prism and state that P>0.50. See One- vs. two-tail P values. |
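The checklist's warning about running several t tests can be made concrete with a short calculation. The sketch below assumes every comparison is independent and every null hypothesis is true. Pairwise comparisons among the same groups are not truly independent, so treat the result as an approximation. The function name `familywise_error_rate` is made up for this example.

```python
from math import comb

def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pairwise t tests, chance of at least one false positive)."""
    n_comparisons = comb(n_groups, 2)  # every pair of groups compared once
    # Assumes independent comparisons, each with false-positive rate alpha.
    return n_comparisons, 1 - (1 - alpha) ** n_comparisons
```

With three groups there are three pairwise comparisons, and under these assumptions the chance of at least one "significant" result arising by chance is about 14% rather than 5%. With five groups (ten comparisons) it is about 40%.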