ANalysis Of VAriance (ANOVA) is a statistical technique that is used to compare the means of three or more groups.

The ordinary one-way ANOVA (sometimes called a one-factor ANOVA) is used when the groups being compared can be defined by a single grouping factor, and the subjects in each group aren't repeated or matched in other groups. For example, you may want to compare a control group with a drug-treated group and a group treated with both drug and antagonist. Alternatively, you may want to compare five groups, each given a different drug. For this test, you would take measurements from subjects that are assigned to only one group (and not matched in any way to subjects in other groups). In all cases, it is assumed that the populations from which the subjects or samples are collected follow a normal (Gaussian) distribution.

Why “ordinary”? That is a statistical term meaning that the data are not paired or matched. Analysis of paired or matched data uses “repeated measures” or “mixed model” ANOVA.

Why “one-way”? Because the values are categorized in one way, or by one factor. In this example, the factor is drug treatment. A two-way design would group the values by two grouping factors. For example, each of the three treatments (drug treatment) could be tested in both men and women (sex).

Why “variance”? Variance is a way to quantify variation. ANOVA works by comparing (“analysis of”) the variation within the groups with the variation among group means. For a single set of values, the variance is the square of the standard deviation. Later, we’ll see that this is equal to the sum of the squared deviations from the mean divided by the degrees of freedom.
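To make that equivalence concrete, here is a minimal sketch in Python (NumPy assumed; the numbers are invented purely for illustration, and this is not how Prism itself computes anything):

    import numpy as np

    values = np.array([4.0, 6.0, 7.0, 9.0])   # hypothetical measurements
    deviations = values - values.mean()       # deviations from the mean
    ss = np.sum(deviations ** 2)              # sum of squared deviations
    df = len(values) - 1                      # degrees of freedom for one sample
    print(ss / df)                            # variance = sum of squares / df
    print(values.std(ddof=1) ** 2)            # same number: square of the sample SD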

How ANOVA works

ANOVA works by comparing variation within groups to variation among group means.

Sum-of-squares

The first step in ANOVA is to calculate three sum-of-squares values, which partition the total variation in the data:

1. Total sum of squares. This is the sum of squared differences between each value and the grand mean of all of the data. Sometimes called SST.

2. Within group sum of squares. For each group, calculate the sum of the squared differences between each value and the mean of its group. Then add those sums together (over all groups). This is referred to as the "within columns" sum of squares, and is sometimes called the sum of squared errors (SSE) or sum of squares within (SSW).

3. Between group sum of squares. For each group, calculate the square of the difference between the group mean and the grand mean of the data, and multiply it by the sample size of that group. Then add these values together. This is referred to as the "between columns" sum of squares, and is sometimes called the sum of squares of the regression (SSR) or sum of squares between (SSB).

Not surprisingly, the sum-of-squares within the groups and the sum-of-squares between the groups add up to equal the total sum-of-squares.

Another way to think about this is that the between group sum of squares represents the variability caused by the treatment, while the within group sum of squares is the general variability you would expect to see within a sample of different individuals.
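Here is a rough sketch of those calculations in Python (NumPy assumed; the three groups and their values are hypothetical, chosen only to show that the within and between sums of squares add up to the total; this is an illustration, not Prism's code):

    import numpy as np

    # Hypothetical measurements from three treatment groups
    groups = [np.array([3.1, 4.0, 3.6, 4.4]),
              np.array([5.2, 5.9, 6.1, 5.5]),
              np.array([4.1, 4.8, 5.0, 4.5])]

    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()

    # Total sum of squares (SST): squared differences from the grand mean
    sst = np.sum((all_values - grand_mean) ** 2)

    # Within group sum of squares (SSW): squared differences from each group's own mean
    ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)

    # Between group sum of squares (SSB): squared differences of group means from the
    # grand mean, each multiplied by that group's sample size
    ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

    print(sst, ssw + ssb)   # the two values match (up to rounding)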

Mean squares

Each of these sum of squares values is associated with a certain number of degrees of freedom (df, computed from the number of subjects and number of groups). The mean square (MS) is calculated by dividing each sum of squares by the associated degrees of freedom. These values can be thought of as variances (similar to the definition above where variance is the square of the standard deviation). Unlike the sum-of-squares values, the mean-square within the groups and the mean-square between the groups do not add up to equal the total mean-square (which is rarely calculated).
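Continuing the hypothetical example above, the mean squares might be computed like this (a sketch with invented data, not Prism's code; the degrees of freedom shown are the standard ones for ordinary one-way ANOVA):

    import numpy as np

    groups = [np.array([3.1, 4.0, 3.6, 4.4]),
              np.array([5.2, 5.9, 6.1, 5.5]),
              np.array([4.1, 4.8, 5.0, 4.5])]
    all_values = np.concatenate(groups)
    k, n = len(groups), len(all_values)

    ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    ssb = sum(len(g) * (g.mean() - all_values.mean()) ** 2 for g in groups)

    ms_between = ssb / (k - 1)   # df between = number of groups - 1
    ms_within = ssw / (n - k)    # df within = total number of values - number of groups
    print(ms_between, ms_within)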

The null hypothesis

To understand the P value (see below), you first need to articulate the null hypothesis. For one-way ANOVA, the null hypothesis is that the populations or distributions the values are sampled from all have the same mean. Furthermore, ANOVA assumes those populations or distributions are Gaussian (normal) with equal standard deviations.

The F statistic (F ratio)

The F statistic for ANOVA is the ratio of the mean square between groups to the mean square within groups.

If the null hypothesis is true, you would expect the variance between groups to be roughly the same as the variance within groups. Another way of saying this is that if the null hypothesis is true, you would expect the F statistic to be close to 1.0 (the between group variance would be roughly the same as the within group variance). On the other hand, if the group assignment (in this example, the drug treatment) truly had an effect on the measurements, then you would expect the between group variance to be greater than the within group variance. Consequently, you would expect the F statistic to be greater than 1.0.
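In the hypothetical sketch used above, the F ratio works out like this (Python with NumPy assumed; the data are invented, and the treatment effect built into them is deliberately large, so F comes out well above 1.0):

    import numpy as np

    groups = [np.array([3.1, 4.0, 3.6, 4.4]),
              np.array([5.2, 5.9, 6.1, 5.5]),
              np.array([4.1, 4.8, 5.0, 4.5])]
    all_values = np.concatenate(groups)
    k, n = len(groups), len(all_values)

    ssb = sum(len(g) * (g.mean() - all_values.mean()) ** 2 for g in groups)
    ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)

    f_ratio = (ssb / (k - 1)) / (ssw / (n - k))   # MS between / MS within
    print(f_ratio)                                # well above 1.0 for these made-up data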

P value

The P value is determined from the F ratio, taking into account the number of values and the number of groups. Recall that the null hypothesis for a one-way ANOVA is that all population means are the same. The P value answers the following question:

If the null hypothesis is true (all groups are sampled from distributions or populations that have the same mean), what is the probability of observing an F ratio as large as, or larger than, the one you calculated, due to random sampling variability alone?
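One way to see where that probability comes from is to compare a hand calculation with a library routine. The sketch below (Python with NumPy and SciPy assumed, same invented data as above) takes the P value from the upper tail of the F distribution and checks it against scipy.stats.f_oneway; it is an illustration, not the algorithm Prism uses:

    import numpy as np
    from scipy import stats

    groups = [np.array([3.1, 4.0, 3.6, 4.4]),
              np.array([5.2, 5.9, 6.1, 5.5]),
              np.array([4.1, 4.8, 5.0, 4.5])]
    all_values = np.concatenate(groups)
    k, n = len(groups), len(all_values)

    ssb = sum(len(g) * (g.mean() - all_values.mean()) ** 2 for g in groups)
    ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    f_ratio = (ssb / (k - 1)) / (ssw / (n - k))

    # Probability of an F ratio this large or larger if the null hypothesis is true
    p_value = stats.f.sf(f_ratio, k - 1, n - k)

    print(f_ratio, p_value)
    print(stats.f_oneway(*groups))   # SciPy's one-way ANOVA gives the same F and P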

If the overall P value is large: The data do not give you any reason to conclude that the means of the populations that the samples were drawn from differ. Even if the population means were equal, you would not be surprised to find sample means this far apart just by chance. This is not the same as saying that the true means are the same. You just don't have compelling evidence that they differ.

If the overall P value is tiny: You conclude that the population means from which the data were sampled are unlikely to be equal. This doesn't mean that every mean differs from every other mean, only that at least one probably differs from the rest. Look at the results of multiple comparison follow-up tests to identify where the differences are.

Of course, these conclusions are tentative and random sampling can lead to errors in both directions.

Tests for equal variances

ANOVA is based on the assumption that the data are sampled from populations that all have the same variance (this is equivalent to saying that they have the same standard deviation since variance is the square of standard deviation). Prism tests this assumption with two tests. It computes the Brown-Forsythe test and also (if every group has at least five values) computes Bartlett's test. There are no options for whether to run these tests. Prism automatically does so and always reports the results.

Both these tests compute a P value designed to answer this question:

If the populations really have the same variances (or standard deviations), what is the probability that the samples would have variances as dissimilar as (or more dissimilar than) what you observed in your samples, due to random sampling variability alone?

Don’t mix up these P values testing for equal variances with the P value testing equality of the means.
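For a rough sense of what these tests report, here is a sketch using SciPy (Python; the data are hypothetical, with one group made deliberately more variable). SciPy's median-centered Levene test is the usual way the Brown-Forsythe test is computed; Prism's own calculations, in particular the "corrected" Bartlett's test, may differ in detail:

    from scipy import stats

    a = [3.1, 4.0, 3.6, 4.4, 3.8]
    b = [5.2, 5.9, 6.1, 5.5, 5.7]
    c = [4.1, 4.8, 5.0, 4.5, 9.9]   # one group deliberately more variable

    print(stats.bartlett(a, b, c))                  # Bartlett's test for equal variances
    print(stats.levene(a, b, c, center='median'))   # Brown-Forsythe (median-centered Levene)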

Bartlett's test

Prism reports the results of the "corrected" Bartlett's test as explained in section 10.6 of Zar (1). Bartlett's test works great if the data really are sampled from Gaussian distributions. But if the distributions deviate even slightly from the Gaussian ideal, Bartlett's test may report a small P value even when the differences among standard deviations are trivial. For this reason, many do not recommend that test. That's why we added the test of Brown and Forsythe. It has the same goal as Bartlett's test, but is less sensitive to minor deviations from normality. We suggest that you pay attention to the Brown-Forsythe result, and ignore Bartlett's test (which we left in to be consistent with prior versions of Prism).

Brown-Forsythe test

The Brown-Forsythe test is conceptually simple. Each value in the data table is transformed by subtracting from it the median of that column, and then taking the absolute value of that difference. One-way ANOVA is run on these values, and the P value from that ANOVA is reported as the result of the Brown-Forsythe test.

How does it work? By subtracting the medians, any differences between medians have been subtracted away, so the only distinction between groups is their variability.

Why subtract the median and not the mean of each group?  If you subtract the column mean instead of the column median, the test is called the Levene test for equal variances. Which is better? If the distributions are not quite Gaussian, it depends on what the distributions are. Simulations from several groups of statisticians show that using the median works well with many types of non-Gaussian data. Prism only uses the median (Brown-Forsythe) and not the mean (Levene).
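The transformation described above is simple enough to sketch directly. In the example below (Python with NumPy and SciPy assumed, hypothetical data), each value is replaced by the absolute deviation from its group's median, ordinary one-way ANOVA is run on the transformed values, and the result is compared with SciPy's median-centered Levene test, which implements the same idea:

    import numpy as np
    from scipy import stats

    groups = [np.array([3.1, 4.0, 3.6, 4.4, 3.8]),
              np.array([5.2, 5.9, 6.1, 5.5, 5.7]),
              np.array([4.1, 4.8, 5.0, 4.5, 9.9])]

    # Transform each value: absolute deviation from its own group's median
    abs_dev = [np.abs(g - np.median(g)) for g in groups]

    # One-way ANOVA on the transformed values gives the Brown-Forsythe P value
    print(stats.f_oneway(*abs_dev))

    # SciPy's median-centered Levene test should give essentially the same result
    print(stats.levene(*groups, center='median'))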

Interpreting the results

If the P value from the test for equal variances is small, you must decide whether you will conclude that the standard deviations of the populations are different. Obviously the tests of equal variances are based only on the values in this one experiment. Think about data from other similar experiments before making a conclusion.

If you conclude that the populations have different variances, you have four choices:

1. Conclude that the populations are different. In many experimental contexts, the finding of different standard deviations is as important as the finding of different means. If the standard deviations are truly different, then the populations are different regardless of what ANOVA concludes about differences among the means. This may be the most important conclusion from the experiment.

2. Transform the data to equalize the standard deviations, and then rerun the ANOVA. Often you'll find that converting values to their reciprocals or logarithms will equalize the standard deviations and also make the distributions more Gaussian (see the sketch after this list).

3. Use the Welch or Brown-Forsythe versions of one-way ANOVA that do not assume that all standard deviations are equal.

4. Switch to the nonparametric Kruskal-Wallis test. The problem with this is that if your groups have very different standard deviations, it is difficult to interpret the results of the Kruskal-Wallis test. If the standard deviations are very different, then the shapes of the distributions are very different, and the Kruskal-Wallis results cannot be interpreted as comparing medians.
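Two of those choices are easy to sketch with SciPy (Python; the data are hypothetical, with very unequal scatter between groups; note that plain SciPy does not provide Welch's ANOVA, so only the transformation and Kruskal-Wallis options are shown here):

    import numpy as np
    from scipy import stats

    groups = [np.array([3.1, 4.0, 3.6, 4.4, 3.8]),
              np.array([15.2, 25.9, 36.1, 18.5, 29.7]),
              np.array([8.1, 14.8, 10.0, 9.5, 19.9])]

    # Option 2: log-transform, which often equalizes standard deviations, then rerun ANOVA
    log_groups = [np.log(g) for g in groups]
    print(stats.f_oneway(*log_groups))

    # Option 4: switch to the nonparametric Kruskal-Wallis test
    print(stats.kruskal(*groups))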

R squared

R² is the fraction of the overall variance (of all the data, pooling all the groups) attributable to differences among the group means. It compares the variability among group means with the variability within the groups. A large value means that a large fraction of the variation is due to the treatment that defines the groups. The R² value is calculated from the ANOVA table and equals the between group sum-of-squares divided by the total sum-of-squares. Some programs (and books) don't bother reporting this value. Others refer to it as η² (eta squared) rather than R². It is a descriptive statistic that quantifies the strength of the relationship between group membership and the variable you measured.
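Using the same hypothetical numbers as the earlier sketches, R² would be computed like this (Python with NumPy assumed; purely illustrative):

    import numpy as np

    groups = [np.array([3.1, 4.0, 3.6, 4.4]),
              np.array([5.2, 5.9, 6.1, 5.5]),
              np.array([4.1, 4.8, 5.0, 4.5])]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()

    sst = np.sum((all_values - grand_mean) ** 2)
    ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

    r_squared = ssb / sst   # fraction of total variation explained by group differences
    print(r_squared)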

Reference

1. J.H. Zar, Biostatistical Analysis, Fifth Edition, 2010. ISBN: 0131008463.
