The Residuals tab allows you to check the assumptions of ANOVA by examining residuals and creating diagnostic plots. Residuals are the differences between observed values and the values predicted by your ANOVA model.
Checking residuals is important because ANOVA makes specific assumptions about your data. It assumes residuals are normally distributed and that variance is equal across groups (homoscedasticity). When these assumptions are violated, P values may not be accurate and your conclusions could be affected. Residual plots can also reveal outliers, data entry errors, or problems with how you've set up your model.
It's generally a good idea to check residuals before interpreting your results. This is especially true prior to publication, when results seem unexpected, or when sample sizes are small (since ANOVA is less robust with limited data). With large, balanced samples, ANOVA is fairly robust to minor violations, but it's still good practice to verify that assumptions are reasonably met.
You can select which diagnostic plots Prism should create. Each plot helps you assess different aspects of your model assumptions:
This plot shows residuals (Y-axis) against predicted values (X-axis). In a good residual plot, you would expect to see points scattered randomly around zero with no obvious patterns or trends. The spread should be roughly equal across the entire range of predicted values, creating a cloud of points with no systematic structure. This random scatter suggests your model captures the relationships in the data well and that the assumptions are met.
Several problematic patterns can appear in residual plots. A fan shape where the spread increases or decreases with predicted values indicates heteroscedasticity: the variance isn't equal across groups. This often happens with biological data and may require a transformation (such as log or square root) to fix. A curved pattern where points form a U-shape or arc suggests your model is missing something important, such as an interaction term. Finally, outliers appear as one or a few points far from the main cluster. These could indicate data entry errors or legitimate but extreme values that deserve investigation.
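If you want to reproduce this diagnostic outside of Prism, the calculation is straightforward. For one-way ANOVA, the predicted value for each observation is simply its group mean, and the residual is the observation minus that mean. The following is a minimal Python sketch using made-up measurements for three hypothetical groups (the group names and values are illustrative only):

```python
import numpy as np

# Hypothetical measurements for three treatment groups
groups = {
    "control": np.array([4.1, 3.8, 4.5, 4.0, 4.2]),
    "low":     np.array([5.2, 5.9, 5.5, 5.1, 5.7]),
    "high":    np.array([7.8, 8.4, 8.1, 7.5, 8.9]),
}

predicted, residuals = [], []
for values in groups.values():
    mean = values.mean()             # ANOVA's prediction for this group
    predicted.extend([mean] * len(values))
    residuals.extend(values - mean)  # observed minus predicted

predicted = np.array(predicted)
residuals = np.array(residuals)

# Plotting residuals (Y) against predicted (X) gives the diagnostic plot;
# within each group the residuals sum to (essentially) zero by construction.
print("overall mean of residuals:", float(residuals.mean()))
```

Scatter-plotting `residuals` against `predicted` then reproduces the plot described above; any fan shape, curve, or isolated point is visible in exactly the same way.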
This plot shows the absolute value of residuals against predicted values. It's particularly useful for detecting unequal variances that might be subtle in the regular residual plot.
What you want to see is points scattered randomly with no trend, with similar spread across all predicted values. An upward trend, where |residuals| increase with predicted values, means groups with larger means also have larger variance. This is common with count data and percentages, and is often fixed with a log or square root transformation. A downward trend is less common but equally problematic. Distinct horizontal bands of points may indicate that different groups have very different variances, which could suggest data quality issues or that your groups are fundamentally different in their variability.
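A quick numeric companion to this plot is to compare the spread of residuals group by group: a large ratio between the biggest and smallest group standard deviation points to the same problem the plot shows visually. A minimal sketch with made-up values chosen so that spread grows with the mean (there is no universal cutoff for the ratio; this just quantifies what the plot displays):

```python
import numpy as np

# Hypothetical data where spread grows with the group mean (heteroscedasticity)
groups = {
    "A": np.array([10.1, 9.8, 10.3, 10.0]),
    "B": np.array([20.5, 18.9, 21.7, 19.2]),
    "C": np.array([41.0, 35.2, 46.8, 38.5]),
}

# Sample standard deviation per group (ddof=1)
sds = {name: values.std(ddof=1) for name, values in groups.items()}
ratio = max(sds.values()) / min(sds.values())

print({k: round(v, 2) for k, v in sds.items()})
print("max/min SD ratio:", round(ratio, 1))
```

Here the largest group SD is many times the smallest, which is exactly the upward trend the homoscedasticity plot would display.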
The QQ plot compares the distribution of your residuals to what you'd expect from a normal distribution. It shows predicted residuals (what the residuals would be if they were perfectly normal) on the Y-axis against your actual residuals on the X-axis, along with a diagonal reference line.
If your residuals are approximately normal, points should fall close to the diagonal line. Minor deviations at the ends are acceptable and don't usually indicate a problem. With real data, perfect normality is rare, so some departure is expected. What matters is whether the overall pattern follows the line reasonably well.
An S-shaped curve where points swing away from the line at both ends indicates your data are skewed. Right-skewed data (common in biology) often benefit from a log transformation. Heavy tails where points curve away from the line at the extremes suggest you have more extreme values than a normal distribution would predict. This could indicate outliers or a distribution with fatter tails. Light tails show the opposite pattern and are usually less concerning. Finally, systematic deviation where points consistently sit above or below the line indicates your distribution isn't normal and may need transformation or an alternative analysis approach.
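The same comparison can be built outside of Prism by pairing sorted residuals with the quantiles a normal distribution would predict. SciPy's `probplot` does this pairing and also reports how tightly the points hug the diagonal. A sketch using simulated right-skewed "residuals" (the data are generated for illustration; this mimics, but is not, Prism's internal calculation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated right-skewed residuals: lognormal values shifted to mean zero
resid = rng.lognormal(mean=0.0, sigma=0.8, size=200)
resid -= resid.mean()

# probplot pairs theoretical normal quantiles with the ordered sample,
# and fits a least-squares line; r near 1 means points follow the diagonal
(theor, ordered), (slope, intercept, r) = stats.probplot(resid, dist="norm")
print("correlation with normal quantiles:", round(r, 3))
```

Plotting `ordered` against `theor` reproduces the QQ plot; with skewed input like this, the points bend away from the fitted line at the upper end, the S-shaped signature described above.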
In addition to visual inspection of plots, Prism offers statistical tests for checking assumptions. These tests provide objective measures but should be interpreted alongside the plots, not in isolation.
This test calculates the correlation between predicted values and absolute residuals, testing whether variance increases (or decreases) systematically with the magnitude of the values. It provides an objective alternative to visual inspection of the homoscedasticity plot. Under ideal conditions, there is no correlation between the residuals and the predicted values (the correlation coefficient would be zero or nearly zero). Thus, a large P value indicates that the data don't provide enough justification to reject the null hypothesis of no correlation. Put simply, you're looking for a large P value here.
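A version of this test can be sketched with SciPy by correlating predicted values with absolute residuals; a small P value flags spread that changes with the predicted value. The sketch below uses a Spearman rank correlation on simulated data where the SD is proportional to the group mean (both the choice of correlation and the data are assumptions for illustration; Prism's exact procedure may differ):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated one-way design: three groups of 30, SD proportional to the mean
means = np.repeat([10.0, 20.0, 40.0], 30)       # predicted value = group mean
values = means + rng.normal(0.0, means * 0.1)   # noise scales with the mean

residuals = values - means
rho, p = stats.spearmanr(means, np.abs(residuals))
print(f"rho = {rho:.2f}, P = {p:.4g}")  # small P flags heteroscedasticity
```

With homoscedastic data the same calculation would give a correlation near zero and a large P value, which is the outcome you are hoping for.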
This option runs four different normality tests on your residuals: the D'Agostino-Pearson test (which examines skewness and kurtosis), the Anderson-Darling test (which emphasizes the tails of the distribution), the Shapiro-Wilk test (generally the most powerful test for normality), and the Kolmogorov-Smirnov test (which tests overall distribution shape). Together, these tests evaluate whether your residuals are sampled from a normal distribution.
If all four tests give P values greater than 0.05, the data do not provide enough justification to reject the null hypothesis of any of these tests. Since the null hypothesis for each test is that the data are sampled from a normal distribution, large P values mean that no substantial departures from normality were detected. In other words, the normality assumption appears reasonable and you can proceed with interpreting your ANOVA results.
If some or all tests give P values less than 0.05, a significant departure from normality was detected. Before you panic, consider several factors. First, how severe is the departure? Check your QQ plot: if it looks reasonably close to the line, the deviation may not be practically important even if it's statistically significant. Second, how large is your sample size? With large samples (especially over 50-100 observations per group), ANOVA is quite robust to moderate departures from normality. Third, are outliers causing the problem? A few extreme values might make the tests significant even though the bulk of your data are fine. Check for data entry errors or measurement problems. Finally, would transformation help? Log transformation often improves normality for right-skewed biological data.
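If you want to run comparable tests outside of Prism, the four tests have close analogues in SciPy: `normaltest` (D'Agostino-Pearson), `shapiro` (Shapiro-Wilk), `kstest` (Kolmogorov-Smirnov), and `anderson` (Anderson-Darling). This mapping and the simulated data are my assumptions for illustration, not Prism's internals; note also that `kstest` with parameters estimated from the sample is only approximate, and `anderson` reports critical values rather than a P value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
resid = rng.normal(0.0, 1.0, size=80)   # simulated well-behaved residuals

_, p_dagostino = stats.normaltest(resid)   # combines skewness and kurtosis
_, p_shapiro = stats.shapiro(resid)        # Shapiro-Wilk
# KS against a normal with the sample's own mean/SD (approximate)
_, p_ks = stats.kstest(resid, "norm", args=(resid.mean(), resid.std(ddof=1)))
ad = stats.anderson(resid, dist="norm")    # statistic + critical values

print(f"D'Agostino P={p_dagostino:.3f}  Shapiro P={p_shapiro:.3f}  KS P={p_ks:.3f}")
print("Anderson-Darling statistic:", round(ad.statistic, 3))
```

The interpretation is the same as in Prism: large P values (and an Anderson-Darling statistic below its critical values) mean no departure from normality was detected.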
If your residual diagnostics reveal violations of assumptions, transforming your response variable may help. Transformation changes the scale of your measurements to make the distribution more normal and stabilize variance. Here are the most common transformations and when to use them:
Use log(Y) or log(Y+1) when you have right-skewed data with a long tail toward high values, when variance increases with the mean, or when your data span several orders of magnitude. This transformation is very common in biology for concentrations, gene expression levels, and cell counts. The log transformation compresses large values more than small ones, which often both normalizes the distribution and equalizes variance across groups.
One limitation: you can't take the log of zero or negative values. If your data include zeros, use log(Y+1) instead, which shifts all values up by one before taking the log. After transformation, re-run your analysis and check the residuals again to see if the transformation helped.
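As a small sketch of this workflow in Python with made-up counts: NumPy's `log1p` computes log(Y+1) and therefore handles zeros, and `expm1` is its exact inverse for back-transformation.

```python
import numpy as np

# Hypothetical right-skewed counts, including a zero
y = np.array([0, 3, 5, 12, 40, 150, 600], dtype=float)

y_log = np.log1p(y)       # log(Y + 1); safe when Y contains zeros
y_back = np.expm1(y_log)  # back-transform recovers the original values

print("back-transform matches original:", bool(np.allclose(y_back, y)))
```

After transforming, you would run the ANOVA on `y_log` and inspect the residual diagnostics again, exactly as described above.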
Use the square root transformation (sqrt(Y)) when you have moderate right skew, count data (like number of colonies, events, or cells), or when variance is proportional to the mean. This transformation is less drastic than log transformation but often works well for count data. It's particularly appropriate when your response variable follows a Poisson distribution, where the variance naturally increases with the mean.
The reciprocal transformation (1/Y) is used for severe right skew or for time and rate data. It's less common in biological research and can be difficult to interpret because it reverses the scale: large values become small and vice versa. Use this transformation cautiously and make sure the reversed scale still makes sense for your scientific question.
An important caveat about transformation: it changes both the scale and the interpretation of your results. With log transformation, you're now comparing geometric means instead of arithmetic means, and when you back-transform to the original scale, differences become ratios rather than absolute differences. Always report clearly that you used transformed data and explain how to interpret the results. For example, "Data were log-transformed before analysis. Results are presented as geometric means, and differences represent fold-changes."
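This shift in interpretation can be seen numerically: the back-transformed mean of log values is the geometric mean, and a difference of log means back-transforms to a ratio (fold-change). A small illustration with made-up concentrations, where every treated value is exactly four times its control counterpart:

```python
import numpy as np

control = np.array([2.0, 4.0, 8.0])    # hypothetical concentrations
treated = np.array([8.0, 16.0, 32.0])  # each value is 4x the control

# Mean on the log scale, then back-transform = geometric mean
geo_control = np.exp(np.log(control).mean())   # geometric mean, approx. 4
geo_treated = np.exp(np.log(treated).mean())   # geometric mean, approx. 16

# A difference of log means back-transforms to a ratio, not a difference
diff_logs = np.log(treated).mean() - np.log(control).mean()
fold_change = np.exp(diff_logs)                # approx. 4: a fold-change

print(geo_control, geo_treated, fold_change)
```

Note that the arithmetic means (4.67 and 18.67) differ from the geometric means: after log transformation, it is the geometric means and their ratio that your analysis estimates.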