October 20, 2008

Is it better to plot graphs with SD or SEM error bars?

Neither!

There are better alternatives, depending on your goal. 

If you want to show the variation in your data:

If each value represents a different individual, you probably want to show the variation among values. Even if each value represents a different lab experiment, it often makes sense to show the variation. 

With fewer than 100 or so values, create a scatter plot that shows every value. What better way to show the variation among values than to show every value? If your data set has more than 100 or so values, a scatter plot becomes messy. Alternatives are to show a box-and-whiskers plot, a frequency distribution (histogram), or a cumulative frequency distribution.

What about plotting mean and SD? The SD does quantify variability, so this is indeed one way to graph variability. But a SD is only one value, so it is a pretty limited way to show variation. A graph showing a mean with SD error bars is less informative than any of the other alternatives, but takes no less space and is no easier to interpret. I see no advantage to plotting a mean and SD rather than a column scatter graph, box-and-whiskers plot, or a frequency distribution.

Of course, if you do decide to show SD error bars, be sure to say so in the figure legend so no one will think it is a SEM.

If you want to show how precisely you have determined the mean:

If your goal is to compare means with a t test or ANOVA, or to show how closely your data come to the predictions of a model, you may be more interested in showing how precisely the data define the mean than in showing the variability. In this case, the best approach is to plot the 95% confidence interval of the mean (or perhaps a 90% or 99% confidence interval).

What about the standard error of the mean (SEM)? Graphing the mean with SEM error bars is a commonly used method to show how well you know the mean. The only advantage of SEM error bars is that they are shorter, but SEM error bars are harder to interpret than a confidence interval.

Whatever error bars you choose to show, be sure to state your choice.
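To make the difference between the three kinds of error bar concrete, here is a minimal Python sketch. The five values are made up for illustration, and the critical t value of 2.776 (two-tailed, 95%, 4 degrees of freedom) is copied from a standard t table rather than computed, so it applies only to n=5.

```python
import math
import statistics

# Hypothetical data: five made-up measurements, for illustration only.
values = [23.0, 31.0, 25.0, 30.0, 27.0]

n = len(values)
mean = statistics.mean(values)
sd = statistics.stdev(values)   # sample SD (n-1 in the denominator)
sem = sd / math.sqrt(n)         # standard error of the mean

# Critical t for 95% confidence with n-1 = 4 degrees of freedom,
# taken from a t table (an assumption of this sketch; it changes with n).
T_CRIT_DF4 = 2.776
ci_low = mean - T_CRIT_DF4 * sem
ci_high = mean + T_CRIT_DF4 * sem

print(f"mean = {mean:.2f}")
print(f"SD   = {sd:.2f}")
print(f"SEM  = {sem:.2f}")
print(f"95% CI of the mean: {ci_low:.2f} to {ci_high:.2f}")
```

For any n greater than 1 the SEM is smaller than the SD, and the 95% CI extends at least about two SEMs on each side of the mean, so a reader who is not told which one is plotted can easily draw the wrong conclusion.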

If you want to create persuasive propaganda:

If your goal is to emphasize small and unimportant differences in your data, show your error bars as SEM, and hope that your readers think they are SD.

If your goal is to cover up large differences, show the error bars as the standard deviations for the groups, and hope that your readers think they are standard errors.

This approach was advocated by Steve Simon in his excellent weblog. Of course he meant it as a joke. If you don't understand the joke, review  the differences between SD and SEM.

 

The confidence interval of a standard deviation.

The SD of a sample is not the same as the SD of the population

It is straightforward to calculate the standard deviation from a sample of values. But how accurate is that standard deviation? Just by chance you may have happened to obtain data that are closely bunched together, making the SD low. Or you may have randomly obtained values that are far more scattered than the overall population, making the SD high. The SD of your sample does not equal, and may be quite far from, the SD of the population.

Confidence intervals are not just for means

You are probably already familiar with a confidence interval of a mean. The idea of a confidence interval is very general, and you can express the precision of any computed value as a 95% confidence interval (CI). Another example is a confidence interval of a best-fit value from regression, for example a confidence interval of a slope.

The 95% CI of the SD

The SD is just a value you compute from data. It's not done often, but it is certainly possible to compute a CI for a SD. A free GraphPad QuickCalc does the work for you. 

Interpreting the CI of the SD is straightforward. If you assume that your data were randomly and independently sampled from a Gaussian distribution, you can be 95% sure that the CI computed from the sample SD contains the true population SD.

How wide is the CI of the SD? Of course the answer depends on sample size (N). With small samples, the interval is quite wide as shown in the table below.

N      95% CI of SD
2      0.45*SD to 31.9*SD
3      0.52*SD to 6.29*SD
5      0.60*SD to 2.87*SD
10     0.69*SD to 1.83*SD
25     0.78*SD to 1.39*SD
50     0.84*SD to 1.25*SD
100    0.88*SD to 1.16*SD
500    0.94*SD to 1.07*SD
1000   0.96*SD to 1.05*SD

Example

 

The sample standard deviation computed from the five values shown in the graph above is 18.0. But the true standard deviation of the population from which the values were sampled might be quite different. From the n=5 row of the table, the 95% confidence interval extends from 0.60 times the SD to 2.87 times the SD. Thus the 95% confidence interval ranges from  0.60*18.0 to 2.87*18.0,  from 10.8 to 51.7. When you compute a SD from only five values, the upper 95% confidence limit for the SD is almost five times the lower limit.

Most people are surprised that small samples define the SD so poorly. Random sampling can have a huge impact with small data sets, resulting in a calculated standard deviation quite far from the true population standard deviation.

Note that the confidence intervals are not symmetrical. Why? Since the SD is always a positive number, the lower confidence limit can't be less than zero. This means that the upper confidence limit usually extends further above the sample SD than the lower limit extends below the sample SD. With small samples, this asymmetry is quite noticeable.

Computing the CI of a SD with Excel

These Excel equations compute the confidence interval of a SD. N is sample size; alpha is 0.05 for 95% confidence, 0.01 for 99% confidence, etc.:

Lower limit: =SD*SQRT((N-1)/CHIINV((alpha/2), N-1))

Upper limit: =SD*SQRT((N-1)/CHIINV(1-(alpha/2), N-1))
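The same calculation can be sketched outside Excel. The Python below is only an illustration and uses nothing beyond the standard library, so the chi-square quantile comes from a small hand-written helper (a series for the chi-square cumulative distribution, inverted by bisection) rather than from a statistics package. All function names are made up for this sketch, and the helper is meant for the modest sample sizes in the table, not for very large N. Note that Excel's CHIINV takes a right-tail probability, so CHIINV(alpha/2, N-1) corresponds to the 1-alpha/2 quantile here.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF, via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_quantile(p, df):
    """Value x with chi2_cdf(x, df) == p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:   # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sd_confidence_interval(sd, n, alpha=0.05):
    """CI of a SD, assuming random independent sampling from a Gaussian population."""
    df = n - 1
    lower = sd * math.sqrt(df / chi2_quantile(1 - alpha / 2, df))
    upper = sd * math.sqrt(df / chi2_quantile(alpha / 2, df))
    return lower, upper

# The example from the text: SD = 18.0 computed from n = 5 values.
low, high = sd_confidence_interval(18.0, 5)
print(f"95% CI of the SD: {low:.1f} to {high:.1f}")
```

Calling sd_confidence_interval(1.0, n) for other values of n gives the multipliers listed in the table.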

 

Statistics with n=2

A first step towards analyzing data is often to compute the SD, SEM and confidence interval of the mean. It seems to be common lab folklore that these calculations are not valid for n=2. This page explains that this folklore is wrong.

With only two values, there really is not much point in displaying a mean with SD or SEM, as you can display the actual data in the same amount of space. In fact, there are better alternatives to plotting either the SD or the SEM. But if you do want to show a SD or SEM, the equations that calculate the SD, SEM and CI all work just fine when you have only duplicate (N=2) data.

Are the results valid? It is known that the sample SD computed from small samples underestimates, on average, the true population SD. But the discrepancy is small compared to the random variability inherent in collecting tiny data sets.

The discrepancy only applies to the SD. The variance, which is the SD squared, is unbiased even for n=2. 

To prove the validity of n=2 calculations, I simulated five thousand data sets with n=2, with each value randomly chosen from a Gaussian distribution (GraphPad QuickCalcs can do this, as can Excel). First I computed the 95% confidence interval of the mean for each data set and asked whether the interval included the true value. When analyzing real data, you can't answer this question. But here the data are simulated from a known population, so we know what the true population mean is. In 95.02% of these simulations, the confidence interval of the mean included the true population mean. So a confidence interval of a mean computed from an n=2 sample can be interpreted as it usually is. The only problem with having only duplicate data is that the confidence interval is so very wide.
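A simulation along these lines can be sketched in a few lines of Python. This is not the original simulation: the population mean and SD (100 and 15) and the random seed are arbitrary choices, and the critical t value of 12.706 (two-tailed, 95%, 1 degree of freedom) is copied from a t table.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 100.0, 15.0  # arbitrary population parameters (assumed)
T_CRIT_DF1 = 12.706               # 95% two-tailed critical t for 1 df, from a t table
N_SIMS = 5000

covered = 0
for _ in range(N_SIMS):
    pair = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(2)]
    mean = statistics.mean(pair)
    sem = statistics.stdev(pair) / math.sqrt(2)
    half_width = T_CRIT_DF1 * sem
    # Does this data set's 95% CI contain the true population mean?
    if mean - half_width <= TRUE_MEAN <= mean + half_width:
        covered += 1

coverage = covered / N_SIMS
print(f"{coverage:.2%} of the intervals contain the true mean")
```

The exact percentage varies from run to run with the seed, but it should land close to 95%.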

Using the simulated data, I started to ask whether the calculated sample SD was a good estimate of the true SD. But it is known that the sample SD, on average, is too small when n is small. That doesn't really matter, since all statistical tests (t test, ANOVA) are actually based on the variance (the square of the SD). For these reasons, I used simulations to ask whether the sample variance from an n=2 sample is unbiased. For each of the 10,000 simulated data sets I computed the variance from the two values. The average of these 10,000 variances was within 1% of the true variance from which the data were simulated. This shows that the variance computed from n=2 data is a valid assessment of the scatter in your data, no less valid than a SD computed from data with larger n.
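The variance check can be sketched the same way. Again the population parameters and the seed are arbitrary assumptions, not the ones used for the results quoted above.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 100.0, 15.0  # arbitrary population parameters (assumed)
TRUE_VARIANCE = TRUE_SD ** 2
N_SIMS = 10_000

variances = []
for _ in range(N_SIMS):
    pair = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(2)]
    variances.append(statistics.variance(pair))  # n-1 in the denominator

mean_variance = statistics.mean(variances)
ratio = mean_variance / TRUE_VARIANCE
print(f"mean of sample variances = {mean_variance:.1f}")
print(f"true variance            = {TRUE_VARIANCE:.1f}")
print(f"ratio                    = {ratio:.3f}")
```

With 10,000 data sets the ratio typically comes out within a few percent of 1.0; how close it gets in any one run depends on the seed.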

The SD computed from tiny samples underestimates the population SD (but not by much)

The standard deviation (SD) quantifies scatter. The equation used to compute the sample SD (which uses n-1 in the denominator) underestimates the true population SD by a small amount.

The following simulation demonstrates this. The graph shows the results of 400 simulations. Each simulation randomly sampled from a Gaussian population with mean=100 and SD=15. One hundred samples had only duplicate values (n=2, left panel of graph). Another 100 each had n=3, n=10 and n=50. Each dot on the graph shows the SD of one randomly generated sample. The long horizontal line shows the true population SD, which is 15.0. The shorter horizontal lines show the mean of the SDs from the 100 simulated samples for each sample size.

You can see that the mean SD is a bit too low for the n=2 and n=3 samples. This is not just a glitch due to random sampling, but rather is a consistent finding. The SD is the square root of variance. The equation that computes variance (with N-1 in the denominator) is correct. The average of the variances of these simulated samples is indeed very close to the true population variance. Taking the square root of the variances to compute the SD reduces the large variances more than the small, and so the mean of the SDs underestimates the true population SD.
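Here is a rough Python sketch of that kind of simulation. It samples from the same population (mean=100, SD=15) but, as an assumption of this sketch, uses 2,000 samples per sample size instead of 100 so the averages are less noisy. For each sample size it prints the mean of the SDs and the square root of the mean variance.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 100.0, 15.0
N_SAMPLES = 2000  # per sample size; more than the 100 in the graph, to reduce noise

results = {}
for n in (2, 3, 10, 50):
    sds = []
    for _ in range(N_SAMPLES):
        sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
        sds.append(statistics.stdev(sample))
    mean_sd = statistics.mean(sds)
    # Average the variances first, then take the square root.
    root_mean_variance = math.sqrt(statistics.mean([s * s for s in sds]))
    results[n] = (mean_sd, root_mean_variance)
    print(f"n={n:2d}  mean SD = {mean_sd:5.2f}  "
          f"sqrt(mean variance) = {root_mean_variance:5.2f}")
```

The mean SD for n=2 falls well below 15, while the square root of the mean variance stays close to 15 at every sample size.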

An unbiased estimate of the population SD equals the computed sample SD divided by a quantity known as c4 (the c, I think, is for control chart; I don't know why it is called c4). The value of c4, of course, depends on sample size. It is computed with this Excel formula:

=EXP(GAMMALN(N/2)+LN(SQRT(2/(N-1)))-GAMMALN((N-1)/2))

With n=2, the computed SD is too low by about 20%. With n=10, the discrepancy is only about 3%. Other values are tabulated below:

 

n C4
2 0.79788
3 0.88623
4 0.92132
5 0.93999
6 0.95153
7 0.95937
8 0.96503
9 0.96931
10 0.97266
15 0.98232
20 0.98693
30 0.99142
50 0.99491
100 0.99748
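The Excel formula for c4 translates directly into Python using math.lgamma (the log of the gamma function). This is a minimal sketch; the function names are made up for illustration, and the correction assumes Gaussian data.

```python
import math

def c4(n):
    """Expected ratio of sample SD to population SD for Gaussian data (n >= 2)."""
    return math.exp(
        math.lgamma(n / 2)
        + math.log(math.sqrt(2 / (n - 1)))
        - math.lgamma((n - 1) / 2)
    )

def unbiased_sd(sample_sd, n):
    """Sample SD (n-1 denominator) divided by c4."""
    return sample_sd / c4(n)

for n in (2, 3, 5, 10, 100):
    print(n, round(c4(n), 5))
```

For instance, unbiased_sd(12.0, 2) returns about 15.0, since a sample SD from duplicates is on average about 20% too low.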

 

Prism and InStat compute the sample standard deviation without the correction detailed above.  They don't even offer the option of including the c4 correction. Few programs do. Why is this correction commonly ignored?

  • If you really care about differences between means, then what matters is the variance. The t test and one-way ANOVA use variances in their internal calculation. The square of the sample SD (without the c4 correction) is the best estimate of the population variance.
  • Inferences based on the confidence interval of the mean are also correct when the sample SD is used (the theory is based on the variance, not the standard deviation).
  • The correction is tiny unless the samples are really small. Even then, as the graph above shows, the systematic deviation of the sample SD from the population SD is tiny compared to the random variation. 
  • Tradition. A new definition of SD would be confusing. It is confusing enough to have two definitions (n vs. n-1).

 

October 16, 2008

 How does Prism compute the % of total variation in two-way ANOVA?

 As part of two-way ANOVA, Prism reports the % of total variation accounted for by the interaction, the column factor and the row factor. These values are computed by dividing the sum-of-squares from the ANOVA table by the total sum-of-squares. The three values do not total 100% because Prism does not report the % of total variation accounted for by the residual (or error) part of the ANOVA table. If that were included too, the percentages would add to 100. 
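The arithmetic is simple enough to sketch in Python. The function name and the sum-of-squares values below are hypothetical, and the sketch assumes the total sum-of-squares is the sum of the four components in the ANOVA table.

```python
def percent_of_total_variation(ss_interaction, ss_row, ss_column, ss_residual):
    """Percent of the total sum-of-squares attributed to each source of variation."""
    components = {
        "interaction": ss_interaction,
        "row factor": ss_row,
        "column factor": ss_column,
        "residual": ss_residual,
    }
    # Assumption of this sketch: the total is the sum of the four components.
    ss_total = sum(components.values())
    return {name: 100.0 * ss / ss_total for name, ss in components.items()}

# Hypothetical sums-of-squares from a two-way ANOVA table.
percents = percent_of_total_variation(
    ss_interaction=12.0, ss_row=48.0, ss_column=30.0, ss_residual=30.0
)
for name, pct in percents.items():
    print(f"{name:14s} {pct:5.1f}%")
```

In this made-up example the three reported values are 10%, 40% and 25%, which total 75%; the residual accounts for the remaining 25%.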

These values (% of total variation) are called standard omega squared by Sheskin (equations 27.51 - 27.53), and R2 by Maxwell and Delaney (page 295). Others call them eta squared or the correlation ratio.

Prism simply reports how the total sum of squares is partitioned into the various components in your particular sample of data. Like R2 in linear regression, this is simply a description of your data and not a best guess of a parameter in the population. It is possible to compute the best guess for the population value. This is called omega squared (as distinct from the standard omega squared), but Prism does not compute it.

 

Handbook of Parametric and Nonparametric Statistical Procedures, Third Edition
by David J. Sheskin
ISBN: 1584884401. List price: $139.95

Designing Experiments and Analyzing Data: A Model Comparison Perspective, Second Edition
by Scott E. Maxwell
ISBN: 0805837183. List price: $95.00

October 6, 2008

 Can Prism run on the tiny 'netbook' computers? 

Until the release of Prism 5.02, Prism required a video resolution of at least 768 vertical pixels (XGA). With Prism 5.02, we have tweaked Prism so it requires only 600 vertical pixels. This means that Prism will run on many of the new tiny netbook or subnotebook computers.

Write to support at graphpad dot com if you want to beta test Prism 5.02. Include your Prism 5 Windows serial number, as only current Prism 5 owners may beta test. 

Notes:

  • Some tiny computers have only 480 pixels of vertical resolution. Prism won't run on these. 
  • Some netbook computers run Linux, rather than Windows. Prism only runs under Windows.
  • At this time, Apple has not created a tiny computer. When they do, we'll do our best to make Prism run on it. 

 Are variables such as pH or log(EC50) classified as interval or ratio?

 Many statistics books emphasize the difference between interval and ratio variables:

  • An interval variable is a measurement where the difference between two values is meaningful. The difference between a temperature of 100 degrees and 90 degrees is the same difference as between 90 degrees and 80 degrees.
  • A ratio variable has all the properties of an interval variable, and also has a clear definition of 0.0. When the variable equals 0.0, there is none of that variable. Variables like height, weight, and enzyme activity are ratio variables. Temperature, expressed in F or C, is not a ratio variable. A temperature of 0.0 on either of those scales does not mean 'no temperature'.

What about pH and EC50 (or logEC50) values? Acidity is measured by pH, which is the negative logarithm of the concentration of H+ ions. The mid-point of dose-response curves is quantified as the EC50, often expressed as its logarithm. (This midpoint is sometimes called the IC50 or ED50).

First let's look at the variables without a log transform, H+ and EC50. Both come pretty close to the definition of a ratio variable:

  • Does zero mean none? Depends on how you look at it. Zero is zero in an abstract way. But a value of zero can never be achieved with those variables. No aqueous solution can have zero hydrogen ions, and a drug cannot have an EC50 of zero. 
  • Does it make sense to compute differences between two values, or the mean and SD of a set of values? The calculations wouldn't be meaningless. But they aren't helpful either. 
  • Does it make sense to compute ratios of two values? Sure! 

What about the variables after a log transform to pH and logEC50?

  • Does zero mean none? Not at all, as you'd get a different zero if you changed the units used to measure concentration. A pH of zero does not mean no acidity! Or maximum acidity. It means that the concentration of H+ ions is 1 molar, an arbitrary value.
  • Does it make sense to compute differences between two values, or the mean and SD of a set of values? Yes. 
  • Does it make sense to compute ratios of two values? Not at all. Because zero is defined arbitrarily, ratios are entirely meaningless. A pH of 4 is double a pH of 2. A pH of 14 is double a pH of 7. The two doublings have nothing in common - computing that ratio is meaningless. 
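A quick calculation shows why the two "doublings" have nothing in common. This short Python sketch (the function name is made up for illustration) converts each pair of pH values back to the fold difference in H+ concentration, using the definition of pH as the negative base-10 logarithm of the H+ concentration.

```python
def fold_difference_in_h_ions(ph_low, ph_high):
    """How many times more concentrated H+ is at ph_low than at ph_high."""
    # [H+] = 10 ** (-pH), so the ratio of concentrations is 10 ** (ph_high - ph_low).
    return 10.0 ** (ph_high - ph_low)

# Both pairs differ by a factor of two on the pH scale...
first = fold_difference_in_h_ions(2, 4)
second = fold_difference_in_h_ions(7, 14)

# ...but correspond to very different ratios of H+ concentration.
print(f"pH 2 vs pH 4:  {first:,.0f}-fold difference in H+")
print(f"pH 7 vs pH 14: {second:,.0f}-fold difference in H+")
```

The first pair corresponds to a 100-fold difference in H+ concentration and the second to a 10,000,000-fold difference. It is the difference between two pH values that maps onto something meaningful, not their ratio.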

Bottom line:

  • Expressed on a concentration scale as  H+ and EC50, these variables come close to meeting the criteria of a ratio variable. But it is a bit of a stretch.
  • Expressed on a log scale as pH and logEC50, these variables clearly are neither interval variables nor ratio variables. They really are in a category of their own.

This table summarizes how pH and EC50 (logged or not) compare to interval and ratio variables. 

OK to compute...                                        H+, EC50     pH, logEC50   Interval   Ratio
frequency distribution                                  Yes          Yes           Yes        Yes
median and percentiles                                  Yes          Yes           Yes        Yes
add or subtract                                         Borderline   Borderline    Yes        Yes
mean, standard deviation, standard error of the mean    Borderline   Yes           Yes        Yes
ratio, or coefficient of variation                      Yes          No            No         Yes