
What is ANOVA?

ANalysis Of VAriance (ANOVA) is a statistical technique that is used to compare the means of three or more groups.

The ordinary one-way ANOVA (sometimes called a one-factor ANOVA) is used when the groups being compared can be defined by a single grouping factor, and the values in each group aren't repeated or matched in other groups. For example, you may want to compare a control group with a drug-treated group and a group treated with both drug and antagonist. Alternatively, you may want to compare five groups, each given a different drug. For this test, you would take measurements from participants who are assigned to only one group (and not matched in any way to participants in other groups). For each of these cases, there are assumptions about the distributions from which the data were sampled, and Prism is able to analyze data sampled either from a normal (Gaussian) distribution or a lognormal distribution. Specifying which distribution the analysis should assume changes the way some values are calculated, but generally the interpretation of the results is similar.

Why “ordinary”? That is a statistical term meaning that the data are not paired or matched. Analysis of paired or matched data uses “repeated measures” or “mixed model” ANOVA.

Why “one-way”? Because the values are categorized in one way, or by one factor. In this example, the factor is drug treatment. A two-way design would group the values by two grouping factors. For example, each of the three treatments (drug treatment) could be tested in both men and women (sex).

Why “variance”? Variance is a way to quantify variation. ANOVA works by comparing (“analysis of”) the variation within the groups with the variation among group means. For a single set of values sampled from a normal distribution, the variance is the square of the standard deviation. Later, we’ll see that this is equal to the sum of squared deviations from the mean divided by the degrees of freedom.
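
For a single sample, this relationship can be checked directly. A minimal sketch in Python (using NumPy; the sample values below are hypothetical):

```python
import numpy as np

values = np.array([4.2, 5.1, 3.8, 6.0, 4.9])  # hypothetical sample

# Sum of squared deviations from the mean, divided by degrees of freedom (n - 1)
ss = np.sum((values - values.mean()) ** 2)
df = len(values) - 1
variance = ss / df

# Matches the sample variance and the square of the sample standard deviation
print(variance, np.var(values, ddof=1), np.std(values, ddof=1) ** 2)
```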

How ANOVA works

ANOVA works by comparing variation within groups to variation among group means. These descriptions apply equally to data sampled from a normal distribution and data sampled from a lognormal distribution with one caveat: to apply these steps and descriptions to data sampled from a lognormal distribution, the data should be log-transformed first. There's no need to perform this transformation manually (in Prism or beforehand). When you indicate in the experimental design of the analysis that a lognormal distribution is assumed, Prism handles the necessary transformations behind the scenes.

How lognormal ANOVA differs

Interpretation of one-way ANOVA requires the assumption that all values were sampled from a normal (Gaussian) distribution. Starting in Prism 10.5, Prism provides the option to perform a lognormal one-way ANOVA. This analysis is related to the common one-way ANOVA, but assumes that the data being analyzed were sampled from a lognormal distribution rather than a normal distribution. Since zero and negative values are impossible in a lognormal distribution, the presence of such values in a dataset precludes performing a lognormal ANOVA.

The lognormal ANOVA works by first transforming all values to their logarithms. The ANOVA is then performed on this transformed data, exactly as described on this page. Prism performs this transformation automatically (there's no need for you to transform your data in Prism or in another application). The results explained on the remainder of this page can be interpreted for a lognormal ANOVA in the same way that they're interpreted for a common ANOVA. But you should be very aware of the fact that the analysis was performed on the logarithms of the data!

When performing multiple comparisons tests as part of a lognormal ANOVA, the results of these multiple comparisons are transformed back to the original scale of the data (again, this is done automatically). Instead of showing the difference between the means of the logarithms as part of the multiple comparisons results, Prism provides the geometric means, the ratio of these geometric means, and the 95% confidence interval of the ratio.
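
As a rough illustration of this back-transformation (this is not Prism's code; the group names and values are hypothetical, and a simple unadjusted two-group comparison stands in for a real multiple comparisons procedure, which would use the within-group mean square from the full ANOVA and adjusted critical values), the difference between the means of the logarithms and its confidence interval can be exponentiated to give a ratio of geometric means and a confidence interval for that ratio:

```python
import numpy as np
from scipy import stats

# Hypothetical data for two of the groups being compared
control = np.array([1.2, 0.8, 1.5, 1.1, 0.9])
treated = np.array([2.4, 3.1, 1.9, 2.8, 2.2])

log_c, log_t = np.log(control), np.log(treated)

# Difference between the means of the logarithms, with a pooled-variance 95% CI
diff = log_t.mean() - log_c.mean()
df = len(log_c) + len(log_t) - 2
pooled_var = ((len(log_c) - 1) * log_c.var(ddof=1) +
              (len(log_t) - 1) * log_t.var(ddof=1)) / df
se = np.sqrt(pooled_var * (1 / len(log_c) + 1 / len(log_t)))
t_crit = stats.t.ppf(0.975, df)

# Back-transform: geometric means, their ratio, and the CI of the ratio
gm_c, gm_t = np.exp(log_c.mean()), np.exp(log_t.mean())
ratio = gm_t / gm_c
ci = np.exp([diff - t_crit * se, diff + t_crit * se])
print(gm_c, gm_t, ratio, ci)
```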

You can learn more about lognormal distributions and lognormal ANOVA from an in-depth review on this topic: HJ Motulsky, T Head, PBS Clarke (2025). Analyzing Lognormal Data: A Nonmathematical Practical Guide. Pharmacological Reviews, in press.

 

Key points for performing and interpreting lognormal ANOVA:

The geometric mean of a set of values is the antilog of the mean of the log-transformed values (equivalently, the logarithm of the geometric mean equals the mean of the logarithms)

Calculating the ratio of two geometric means is mathematically equivalent to calculating the difference between the means of the log-transformed values and then taking the antilog of that difference

Sums of squares for lognormal ANOVA are calculated using the log-transformed data (and thus the means of the log-transformed data)

Prism converts the results of the analysis back to the original scale of the data automatically, so there's no need for you to perform any further transformations on the results
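
The first two points above can be checked numerically. A minimal sketch in Python (the values are made up for illustration):

```python
import numpy as np
from scipy import stats

a = np.array([2.0, 4.0, 8.0])   # hypothetical group A
b = np.array([1.0, 3.0, 9.0])   # hypothetical group B

# Geometric mean = antilog of the mean of the logarithms
gm_a = np.exp(np.log(a).mean())
gm_b = np.exp(np.log(b).mean())
print(gm_a, stats.gmean(a))      # both give 4.0

# Log of the ratio of geometric means = difference of the means of the logarithms
print(np.log(gm_a / gm_b), np.log(a).mean() - np.log(b).mean())
```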

Sum-of-squares

The first step in ANOVA is to calculate three sum-of-squares values, which partition the variation in the data:

1. Total sum of squares. This is the sum of squared differences between each value and the grand mean of all of the data. Sometimes called SST.

2. Within group sum of squares. First calculate the sum of the squared differences between each value and the mean of its group. Then sum those values (for all groups). This is referred to as the "within columns" sum of squares, and is sometimes called the sum of squared errors (SSE) or sum of squares within (SSW).

3. Between group sum of squares. For each group, calculate the square of the difference between the group mean and the grand mean of the data. Then multiply that value by the sample size of the corresponding group. Then add these values together. This is referred to as the "between columns" sum of squares, and is sometimes called the sum of squares of the regression (SSR) or sum of squares between (SSB).

Not surprisingly, the sum-of-squares within the groups and the sum-of-squares between the groups add up to equal the total sum-of-squares.

Another way to think about this is that the between group sum of squares represents the variability caused by the treatment, while the within group sum of squares is the general variability you would expect to see within a sample of different individuals.
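
The three sums of squares described above can be computed directly, and the partition verified. A minimal sketch in Python (the group values are hypothetical):

```python
import numpy as np

groups = [
    np.array([3.1, 2.8, 3.6, 3.0]),   # e.g. control
    np.array([4.2, 4.8, 4.5, 5.0]),   # e.g. drug
    np.array([3.9, 3.5, 4.1, 3.7]),   # e.g. drug + antagonist
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Total sum of squares (SST): squared deviations from the grand mean
sst = np.sum((all_values - grand_mean) ** 2)

# Within-group sum of squares (SSW): squared deviations from each group's own mean
ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# Between-group sum of squares (SSB): group size times the squared deviation of
# each group mean from the grand mean
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(sst, ssw + ssb)  # SSW + SSB equals SST
```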

Mean squares

Each of these sum of squares values is associated with a certain number of degrees of freedom (df, computed from the number of subjects and number of groups). The mean square (MS) is calculated by dividing each sum of squares by the associated degrees of freedom. These values can be thought of as variances (similar to the definition above where variance is the square of the standard deviation). Unlike the sum-of-squares values, the mean-square within the groups and the mean-square between the groups do not add up to equal the total mean-square (which is rarely calculated).
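
A brief sketch of how the mean squares follow from the sums of squares (the numbers are hypothetical; with k groups and N values in total, the between-groups df is k − 1 and the within-groups df is N − k):

```python
# Hypothetical layout: 3 groups with 4 values each (12 values in total),
# and sums of squares taken from a calculation like the one above
ssb, ssw = 5.70, 1.52
n_groups, n_total = 3, 12

df_between = n_groups - 1       # k - 1 = 2
df_within = n_total - n_groups  # N - k = 9

ms_between = ssb / df_between   # variance estimate based on group means
ms_within = ssw / df_within     # variance estimate within groups
print(ms_between, ms_within)
```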

The null hypothesis

To understand the P value (see below), you first need to articulate the null hypothesis.

For one-way ANOVA performed on data sampled from a normal distribution, the null hypothesis is that the populations from which the values were sampled all have the same mean. Furthermore, the analysis assumes that the variances (standard deviations) of those populations are equal.

For one-way ANOVA performed on data sampled from a lognormal distribution, the null hypothesis is that the populations from which the values were sampled all have the same geometric mean. Furthermore, the analysis assumes that the geometric standard deviations of those populations are equal.

The F statistic (F ratio)

The F statistic for ANOVA is the mean square between groups divided by the mean square within groups.

If the null hypothesis is true, you would expect that the variance between groups would be roughly the same as the variance within groups. Another way of saying this is that if the null hypothesis is true, you would expect the F statistic to be close to 1.0 (the between group variance would be roughly the same as the within group variance). On the other hand, if the group assignment (in this example, the drug treatment) truly had an effect on the measurements, then you would expect to have a greater between group variance than within group variance. Consequently, you would expect to have an F statistic that is greater than 1.0.

P value

The P value is determined from the F ratio, taking into account the number of values and the number of groups.
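
Putting these pieces together, here is a hedged sketch of how the F ratio and its P value are related (the group values are hypothetical; scipy.stats.f_oneway gives the same F and P in a single call):

```python
import numpy as np
from scipy import stats

groups = [
    np.array([3.1, 2.8, 3.6, 3.0]),
    np.array([4.2, 4.8, 4.5, 5.0]),
    np.array([3.9, 3.5, 4.1, 3.7]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n = len(groups), len(all_values)

ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_between = ssb / (k - 1)
ms_within = ssw / (n - k)
f_ratio = ms_between / ms_within

# P value: chance of an F ratio this large or larger if the null hypothesis is true
p_value = stats.f.sf(f_ratio, k - 1, n - k)

print(f_ratio, p_value)
print(stats.f_oneway(*groups))  # same F and P in one call
```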

One-way ANOVA

Recall that, for data sampled from a normal distribution, the null hypothesis for a one-way ANOVA is that all population means are the same. The P value answers the following question:

If the null hypothesis is true (all groups are sampled from distributions or populations that have the same mean), what is the probability of observing an F ratio as large or larger than what you calculated due to random sampling variability alone?

If the overall P value is large, the data do not give you any reason to conclude that the means of those populations differ. Even if they were equal, you would not be surprised to find sample means this far apart just by chance. This is not the same as saying that the true means are the same. You just don't have compelling evidence that they differ.


If the overall P value is tiny: You conclude that the population means from which the data were sampled are unlikely to be equal. This does not imply that every mean is different from every other mean. It only indicates that at least one probably differs from the rest. Look at the results of multiple comparison follow up tests to identify where the discrepancies may be.

Of course, these conclusions are tentative and random sampling can lead to errors in both directions.

Lognormal one-way ANOVA

Recall that the null hypothesis for lognormal one-way ANOVA is that the population geometric means are all the same.

The P value answers the following question:

If the null hypothesis is true, what is the probability of observing an F ratio as large or larger than what you calculated due to random sampling variability alone?

For data sampled from lognormally distributed populations: if the overall P value is large, the data do not give you any reason to conclude that the geometric means of the populations differ. Even if they were equal, you would not be surprised to find sample geometric means with a ratio as extreme as this just by chance. This is not the same as saying that the true geometric means are the same. You just don't have compelling evidence that they differ.

If the overall P value is tiny: You conclude that the geometric means of the populations from which the data were sampled are unlikely to be equal. This doesn't imply that every geometric mean is different from every other geometric mean. It only indicates that at least one probably differs from the rest. Look at the results of multiple comparison follow up tests to identify where the discrepancies may be.

Of course, these conclusions are tentative and random sampling can lead to errors in both directions.

Tests for equal variances

When analyzing data sampled from a normally distributed population, ANOVA is based on the assumption that the populations from which the data are sampled all have the same variance. This is equivalent to saying that they have the same standard deviation since variance is the square of standard deviation.

For data sampled from a lognormally distributed population, it's a bit more complex. In this case, the assumption is that the variances of the log-transformed populations are equal. This is equivalent to saying that the geometric standard deviations of the populations are the same, but NOT the same as saying that the variances of the populations are the same (two lognormally distributed populations can have the same geometric standard deviation and different variances).
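
This distinction can be illustrated with the standard formula for the variance of a lognormal distribution, Var = (exp(σ²) − 1)·exp(2μ + σ²), where μ and σ are the mean and standard deviation of the logarithms. A short sketch (the two populations below are hypothetical; they share a geometric standard deviation but not a variance):

```python
import numpy as np

def lognormal_summary(mu, sigma):
    """Geometric mean, geometric SD, and variance of a lognormal population
    whose (natural) logarithms have mean mu and standard deviation sigma."""
    geometric_mean = np.exp(mu)
    geometric_sd = np.exp(sigma)
    variance = (np.exp(sigma ** 2) - 1) * np.exp(2 * mu + sigma ** 2)
    return geometric_mean, geometric_sd, variance

# Same sigma (so the same geometric SD), different mu -> different variances
print(lognormal_summary(mu=0.0, sigma=0.5))
print(lognormal_summary(mu=1.0, sigma=0.5))
```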

Prism tests this assumption with two tests. It computes the Brown-Forsythe test and also (if every group has at least five values) computes Bartlett's test. There are no options for whether to run these tests. Prism automatically does so and always reports the results.

Both these tests compute a P value designed to answer the following question:

For data sampled from normal populations: If the populations really have the same variances (or standard deviations), what is the probability that the samples would have variances as dissimilar (or more dissimilar) as what you observed in your samples due to random sampling variability alone?

For data sampled from lognormal populations: If the log-transformed populations really have the same variances (if the populations have the same geometric standard deviations), what is the probability that the log-transformed samples would have variances as dissimilar (or more dissimilar) as what you observed in your samples due to random sampling variability alone?

Don’t mix up these P values testing for equal variances with the P value testing equality of the means.

Bartlett's test

Prism reports the results of the "corrected" Bartlett's test as explained in section 10.6 of Zar (1). Bartlett's test works great if the data really are sampled from Gaussian distributions (or, equivalently, if the data were sampled from a lognormal distribution and log-transformed prior to running this test, which Prism does). But if the distributions deviate even slightly from this ideal, Bartlett's test may report a small P value even when the differences among variances are trivial. For this reason, many do not recommend that test. That's why we added the test of Brown and Forsythe. It has the same goal as Bartlett's test, but is less sensitive to minor deviations from normality. We suggest that you pay attention to the Brown-Forsythe result, and ignore Bartlett's test (which we left in to be consistent with prior versions of Prism).
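
For orientation, here is a minimal sketch using SciPy's standard Bartlett test (the data are hypothetical; note that this is the ordinary Bartlett test, which may not match Prism's "corrected" version from Zar exactly):

```python
import numpy as np
from scipy import stats

groups = [
    np.array([3.1, 2.8, 3.6, 3.0, 3.3]),
    np.array([4.2, 4.8, 4.5, 5.0, 4.1]),
    np.array([3.9, 3.5, 4.1, 3.7, 4.4]),
]

# Standard Bartlett test of equal variances: statistic and P value
stat, p = stats.bartlett(*groups)
print(stat, p)
```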

Brown-Forsythe test

The Brown-Forsythe test is conceptually simple. Each value* in the data table is transformed by subtracting from it the median of that column, and then taking the absolute value of that difference. One-way ANOVA is run on these values, and the P value from that ANOVA is reported as the result of the Brown-Forsythe test.

How does it work? By subtracting the medians, any differences between medians have been subtracted away, so the only distinction between groups is their variability.

Why subtract the median and not the mean of each group? If you subtract the column mean instead of the column median, the test is called the Levene test for equal variances. Which is better? If the distributions are not quite Gaussian, it depends on what the distributions are. Simulations from several groups of statisticians show that using the median works well with many types of non-Gaussian data. Prism only uses the median (Brown-Forsythe) and not the mean (Levene).

*For lognormal one-way ANOVA, the values are first log-transformed, and the Brown-Forsythe test and Bartlett's test are then run on those logarithms.
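
A minimal sketch of the Brown-Forsythe procedure as described above (the data are hypothetical; SciPy's levene function with center='median' performs the same test directly):

```python
import numpy as np
from scipy import stats

groups = [
    np.array([3.1, 2.8, 3.6, 3.0, 3.3]),
    np.array([4.2, 4.8, 4.5, 5.0, 4.1]),
    np.array([3.9, 3.5, 4.1, 3.7, 4.4]),
]

# Transform: absolute deviation of each value from its own group's median
deviations = [np.abs(g - np.median(g)) for g in groups]

# Run ordinary one-way ANOVA on the transformed values
f_stat, p = stats.f_oneway(*deviations)
print(f_stat, p)

# Equivalent built-in: the Brown-Forsythe (median-centered) variant of Levene's test
print(stats.levene(*groups, center="median"))
```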

Interpreting the results

If the P value from the test for equal variances is small, you must choose whether you will conclude that the standard deviations (for normal distributions) or geometric standard deviations (for lognormal distributions) of the populations are different. Obviously these tests are based only on the values in this one experiment. Think about data from other similar experiments before making a conclusion.

If you conclude that the variances are truly different, you have four choices:

Conclude that the populations are different. In many experimental contexts, determining that the distributions have different shapes (determined by the standard deviation for normal distributions and the geometric standard deviation for lognormal distributions) is as important as the finding that they have different locations (means for normal distributions or geometric means for lognormal distributions). If the shapes of the distributions are truly different, then the populations are different regardless of what ANOVA concludes about differences among the locations. This may be the most important conclusion from the experiment.

Make sure you're using the appropriate distribution assumption. If your data were sampled from a lognormal distribution, performing an ANOVA assuming a normal distribution will provide misleading results. Otherwise, you may be able to transform the data in some other way (for example, by taking reciprocals) to equalize the standard deviations, and then rerun the ANOVA.

Use the Welch or Brown-Forsythe versions of one-way ANOVA that do not assume equal variances.

Switch to the nonparametric Kruskal-Wallis test. The problem with this is that if your groups have very different standard deviations, it is difficult to interpret the results of the Kruskal-Wallis test. If the standard deviations are very different, then the shapes of the distributions are very different, and the Kruskal-Wallis results cannot be interpreted as comparing medians.

R squared

R² is the fraction of the overall variance (of all the data, pooling all the groups) attributable to differences among the group means. It compares the variability among group means with the variability within the groups. A large value means that a large fraction of the variation is due to the treatment that defines the groups. The R² value is calculated from the ANOVA table and equals the between group sum-of-squares divided by the total sum-of-squares. Some programs (and books) don't bother reporting this value. Others refer to it as η² (eta squared) rather than R². It is a descriptive statistic that quantifies the strength of the relationship between group membership and the variable you measured.
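
R² can be computed directly from the sums of squares described earlier. A brief sketch (the group values are hypothetical):

```python
import numpy as np

groups = [
    np.array([3.1, 2.8, 3.6, 3.0]),
    np.array([4.2, 4.8, 4.5, 5.0]),
    np.array([3.9, 3.5, 4.1, 3.7]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group and total sums of squares
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
sst = np.sum((all_values - grand_mean) ** 2)

r_squared = ssb / sst  # fraction of total variation explained by group membership
print(r_squared)
```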

Multiple comparisons testing

For many scientists, the results of multiple comparisons testing are of far more importance than the overall ANOVA results. Read about multiple comparisons testing following one-way ANOVA in Prism.

Reference

1. J.H. Zar, Biostatistical Analysis, Fifth Edition, 2010. ISBN 0131008463.
