Viewing By Month : September 2008 / Main
September 24, 2008
Why doesn't Prism compute R² as part of Deming regression?
Prism offers Deming linear regression, which fits a straight line when X, as well as Y, includes experimental error. In contrast, standard linear and nonlinear regression assume that X is known precisely and that all uncertainty (scatter, or variability) is in the Y variable.
In Prism's Deming dialog, you specify whether X and Y are in the same units with equal uncertainties (variation). If you choose this option, Deming regression minimizes the sum of the squares of the perpendicular distances of the points from the line. This is also called orthogonal linear regression.
When Prism performs Deming regression, it reports the slope and intercept with confidence intervals, and reports a P value testing the null hypothesis that the slope is really zero.
Prism does not report any measure of goodness-of-fit with Deming regression, and so does not report an R² value. The reason is that we have been unable to find any paper or text that explains how to compute or interpret such a value. In ordinary linear or nonlinear regression, R² is the fraction of the variation in Y that is accounted for by the model. With Deming regression, where both X and Y contain error, this definition doesn't really make sense, and it isn't obvious to us how to extend it. Please write to us if you know how to do this.
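To make the "perpendicular distances" idea concrete, here is a minimal Python sketch of orthogonal regression for the equal-uncertainty case. It uses the standard closed-form slope for that case; the function name and the sample data are made up for illustration, and this is not Prism's code (it also omits the confidence intervals and P value that Prism reports).

```python
import math

def orthogonal_regression(x, y):
    """Fit y = intercept + slope * x by minimizing the sum of squared
    perpendicular distances (Deming regression with equal X and Y error)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("X and Y are uncorrelated; the slope is undefined")
    # Closed-form slope for the equal-variance (orthogonal) case.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Made-up data with scatter in both X and Y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

slope, intercept = orthogonal_regression(x, y)
slope_swapped, _ = orthogonal_regression(y, x)

print("slope:", slope, "intercept:", intercept)
# Unlike ordinary regression, swapping X and Y gives the reciprocal slope.
print("slope * swapped slope:", slope * slope_swapped)
```

Because the fit treats X and Y symmetrically, swapping the two variables describes the same line. That symmetry is one way to see why "fraction of the variation in Y explained by the model" does not carry over directly.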
September 18, 2008
Fitting bacterial growth data to determine the MIC and NIC.
The MIC is the Minimum Inhibitory Concentration: the smallest concentration of an antibiotic that 'completely' retards bacterial growth. The NIC is the Non-Inhibitory Concentration: the smallest concentration of an antibiotic that slows bacterial growth. At concentrations below the NIC, growth occurs at a pace equal to the control. These definitions seem somewhat ad hoc, because the values depend on how carefully you measure bacterial growth.
Lambert and Pearson (1) published one method for determining the NIC and MIC with nonlinear regression. They fit the data to a Gompertz model with four parameters: the bottom plateau (A), the span of the curve (C), the log of the inflection point (M) and a slope factor (B). They then define the MIC and NIC from the slope and inflection point in their equations 2 and 3.
I adapted their equations to fit the MIC and NIC directly (and not to fit the inflection point). I also changed the names of the parameters to be more descriptive.
This is one method of fitting the MIC. Frankly, I don't know if there are other methods. Please let me know if there are.
You can download this Prism file, which fits sample data. It also draws grid lines at the MIC and NIC (hooking the grid line position to an analysis constant fit by nonlinear regression). The graph also shows how the Lambert and Pearson method defines the NIC and MIC: each is the intersection of the horizontal line at the top (or bottom) plateau of the curve (horizontal dotted) with the projection of the curve at the inflection point (red angled).
Simply replace the data with your own, and the fit will be automatic. You'll probably need to fuss with axis limits and graph appearance.
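The geometric definition described above (tangent at the inflection point, intersected with each plateau) can be sketched in a few lines of Python. This sketch assumes the Gompertz curve has the form Y = A + C*exp(-exp(B*(X - M))), with X the log of concentration; that form, the parameter values, and the function names are assumptions for illustration, not the equation stored in the Prism file.

```python
import math

def gompertz(log_conc, bottom, span, slope, log_inflection):
    """Assumed Gompertz form: Y = A + C * exp(-exp(B * (X - M)))."""
    return bottom + span * math.exp(-math.exp(slope * (log_conc - log_inflection)))

def mic_nic(slope, log_inflection):
    """MIC and NIC where the tangent at the inflection point crosses
    the bottom and top plateaus, respectively (for the form above)."""
    log_mic = log_inflection + 1.0 / slope
    log_nic = log_inflection - (math.e - 1.0) / slope
    return 10 ** log_mic, 10 ** log_nic

# Hypothetical best-fit values, for illustration only.
bottom, span, slope, log_inflection = 0.05, 0.95, 4.0, 1.0
mic, nic = mic_nic(slope, log_inflection)

# Verify the geometry: build the tangent line at the inflection point.
y_inflection = gompertz(log_inflection, bottom, span, slope, log_inflection)
tangent_slope = -span * slope / math.e  # derivative of the curve at X = M

def tangent(log_conc):
    return y_inflection + tangent_slope * (log_conc - log_inflection)

print("NIC:", nic, "MIC:", mic)
print("tangent at log(MIC):", tangent(math.log10(mic)), "vs bottom:", bottom)
print("tangent at log(NIC):", tangent(math.log10(nic)), "vs top:", bottom + span)
```

Under this assumed form, fitting the MIC and NIC directly amounts to substituting M and B in terms of log(MIC) and log(NIC) before running the regression, so the fit reports confidence intervals for the quantities of interest.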
Reference
1. Lambert et al. Susceptibility testing: accurate and reproducible minimum inhibitory concentration (MIC) and non-inhibitory concentration (NIC) values. J Appl Microbiol (2000) vol. 88 (5) pp. 784-90.
September 14, 2008
Beta test releases of Prism 5.0b (Mac) and 5.02 (Windows).
Our programmers are working hard to complete updates of Prism 5 Mac (5.0b) and Prism 5 Windows (5.02). Both are fairly polished, but not quite ready for release.
Please write to support at graphpad dot com if you'd like to beta test. Please include your Prism 5 serial number.

Why can't GraphPad implement OLE on the Mac?
OLE is a Windows feature. It simply is not present on the Mac. Apple has not created anything like OLE for the Mac. We wish very much that they would.
Microsoft has created a version of OLE for use on the Mac with their own programs, but they have withheld all the information software developers would need to hook into it. On Windows, Microsoft does a great job of providing all the information programmers need to use OLE (and everything else). On the Mac, they have decided to withhold all that. That makes it impossible for us to implement OLE on the Mac.
Do note one bug in PowerPoint 2008 for Mac that you can easily work around. Prism copies a graph to the clipboard as a PDF, which is the new Mac standard. OS X converts that to a bitmap picture for programs that can't paste PDF. If you just use Paste in PowerPoint 2008, it pastes the bitmap picture, which takes up more memory and has less resolution. You need to use Paste Special to paste the crisper and smaller PDF rendition. We've reported this problem to Microsoft several times, but it has persisted through two updates to PowerPoint 2008.
September 8, 2008
Smoothed data can be misleading.
Smoothing data is often used when plotting data to make it easier to see trends. But there is a problem. It is too easy to see trends that don't really exist.
William M. Briggs has simulated data to make this point. He simulated two time series with 64 'years' of data. The simulations created entirely random data sets, with no correlation between the data sets except that created by chance. He then smoothed both data sets by a variety of methods, ran a correlation analysis between the two smoothed data sets, and noted the P value. In half of 500 such simulations, using a 10-year running mean as the smoothing method, the P value was less than 0.05 and thus would lead to a conclusion that the correlation was 'statistically significant'.
Why does smoothing create such bogus and misleading results? Say one value, by chance, happens to be very high. The smoothing process will bring down that high value a bit, but bring up neighboring values. So now, instead of one aberrant high value, there is a series of several moderately high values. Correlation, like essentially all statistical analyses, is based on the assumption that each value contributes independent information. With smoothed data, that isn't true. When random chance actually affected only one point, it ends up altering a bunch of neighboring points in the smoothed data set. Using that as an input to another analysis is misleading.
In the other half of his simulations, the P value was greater than 0.05. But he didn't try very hard. Let's also compute the correlation of values in one data set with the value in the other data set one year later. And two years later. And three. That gives many more chances to get a small P value. This is another example of multiple comparisons. If you slice and dice data enough ways, you'll find something that has a P value less than 0.05, and this means nothing.
His example shows why it is important that smoothed data never be used as the input to another analysis.
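A simulation along these lines is easy to sketch in Python. The version below is not Briggs's code: it assumes Gaussian random data, uses only a simple running mean, and computes an approximate P value from Fisher's z transformation (a normal approximation) rather than the exact t-based test. The function names and the seed are arbitrary.

```python
import math
import random

def running_mean(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Approximate two-sided P value via Fisher's z transformation.
    Like the usual test, it wrongly treats the n points as independent
    when applied to smoothed data, which is exactly the point."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def fraction_significant(n_sims, n_years, window, rng):
    raw_hits = smoothed_hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_years)]
        b = [rng.gauss(0, 1) for _ in range(n_years)]
        if approx_p_value(pearson_r(a, b), n_years) < 0.05:
            raw_hits += 1
        sa, sb = running_mean(a, window), running_mean(b, window)
        if approx_p_value(pearson_r(sa, sb), len(sa)) < 0.05:
            smoothed_hits += 1
    return raw_hits / n_sims, smoothed_hits / n_sims

raw_rate, smoothed_rate = fraction_significant(500, 64, 10, random.Random(2008))
print("fraction with P < 0.05, raw data:     ", raw_rate)
print("fraction with P < 0.05, smoothed data:", smoothed_rate)
```

With unsmoothed data the fraction of 'significant' results should sit near the nominal 5%. After smoothing it rises several-fold, even though the two series are unrelated by construction.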
September 5, 2008
What can you conclude when two error bars overlap (or not)?
It is tempting to look at whether two error bars overlap or not, and try to reach a conclusion about whether the difference between means is statistically significant. Resist that temptation (Lanzante, 2005)!

SD error bars
SD error bars quantify the scatter among the values. Looking at whether the error bars overlap lets you compare the difference between the means with the amount of scatter within the groups. But the t test also takes into account sample size. If the samples were larger with the same means and same standard deviations, the P value would be much smaller. If the samples were smaller with the same means and same standard deviations, the P value would be larger.
When the difference between two means is statistically significant (P < 0.05), the two SD error bars may or may not overlap. Likewise, when the difference between two means is not statistically significant (P > 0.05), the two SD error bars may or may not overlap.
Bottom line: Knowing whether SD error bars overlap or not does not let you conclude whether the difference between the means is statistically significant or not.

SEM error bars
SEM error bars quantify how precisely you know the mean, taking into account both the SD and sample size. Looking at whether the error bars overlap, therefore, lets you compare the difference between the means with the precision of those means. This sounds promising. But in fact, you don't learn much by looking at whether SEM error bars overlap. By taking into account sample size and considering how far apart two error bars are, Cumming (2007) came up with some rules for deciding when a difference is significant or not. But these rules are hard to remember and apply.
Bottom line: If two SEM error bars do overlap, then you know that the P value is (much) greater than 0.05, so the difference is not statistically significant. The opposite rule does not apply. If two SEM error bars do not overlap, the P value could be less than 0.05, or it could be greater than 0.05.

CI error bars
Error bars that show the 95% confidence interval (CI) are wider than SEM error bars. It doesn't help to observe that two 95% CI error bars overlap, as the difference between the two means may or may not be statistically significant. If two 95% CI error bars do not overlap, the difference is statistically significant with P < 0.05 (Payton 2003).

Summary
SD error bars: no conclusion, whether or not they overlap.
SEM error bars: if they overlap, P > 0.05; if they do not overlap, no conclusion.
95% CI error bars: if they overlap, no conclusion; if they do not overlap, P < 0.05.
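The role of sample size is easy to see with a little arithmetic. The Python sketch below uses made-up numbers (means of 10 and 12, SD of 3 in both groups, equal group sizes) and compares the unpaired t statistic with the two-tailed critical value for P = 0.05 taken from a standard t table. The helper name is hypothetical.

```python
import math

def t_statistic(mean1, mean2, sd, n):
    """Unpaired t statistic for two groups with the same SD and the same n."""
    se_difference = sd * math.sqrt(2.0 / n)
    return abs(mean1 - mean2) / se_difference

# Made-up summary data: same means and SDs, only the sample size changes.
mean1, mean2, sd = 10.0, 12.0, 3.0

# SD bars do not depend on n: 10 + 3 reaches past 12 - 3, so they overlap.
sd_bars_overlap = (mean1 + sd) >= (mean2 - sd)

# Two-tailed critical t for P = 0.05 from a standard table,
# keyed by n per group (df = 2n - 2 = 8 and 98).
critical_t = {5: 2.306, 50: 1.984}

results = {}
for n in (5, 50):
    t = t_statistic(mean1, mean2, sd, n)
    sem = sd / math.sqrt(n)
    sem_bars_overlap = (mean1 + sem) >= (mean2 - sem)
    results[n] = {
        "t": t,
        "sem_bars_overlap": sem_bars_overlap,
        "significant": t > critical_t[n],
    }
    print(f"n={n}: t={t:.3f}, SD bars overlap={sd_bars_overlap}, "
          f"SEM bars overlap={sem_bars_overlap}, P<0.05={t > critical_t[n]}")
```

The SD bars overlap in both cases, yet the difference is significant only with the larger samples, so SD overlap alone tells you nothing about the P value. In this example the SEM bars overlap only in the case that is not significant, consistent with the SEM rule above.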
References
Cumming et al. Error bars in experimental biology. J Cell Biol (2007) vol. 177 (1) pp. 7-11.
Lanzante. A Cautionary Note on the Use of Error Bars. Journal of Climate (2005) vol. 18 pp. 3699-3703.
Payton et al. Overlapping confidence intervals or standard error intervals: what do they mean in terms of statistical significance? J Insect Sci (2003) vol. 3 pp. 34.
Why do many data points lie outside the regression confidence bands?
When you fit linear or nonlinear regression with Prism, you can choose to also plot confidence or prediction bands. This choice is on the Linear regression dialog, and on the Diagnostics tab of the nonlinear regression dialog.
Confidence bands show you how well you know the location of the best-fit line or curve. Given all the assumptions of the analysis, you can be 95% sure that the true curve (nonlinear regression) or line (linear regression) lies within the bands.
Prediction bands show you where you can expect the data to lie. You expect 95% of all data points to lie within the prediction bands.
With many data points, you expect a large fraction of the data points to lie outside the confidence bands, but 95% to lie within the prediction bands. The confidence bands aren't supposed to show you the scatter of the data, but rather the uncertainty in the position of the line or curve. With lots of data, the line or curve is known fairly precisely, so the confidence bands are narrow and only a small fraction of the data lie within them.
The figure below demonstrates this with linear regression.
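The same point can be checked numerically. The Python sketch below simulates 500 points around a straight line (the true line, noise level, and seed are arbitrary), fits ordinary linear regression, and counts how many points fall inside the 95% confidence band and the 95% prediction band. It uses the standard textbook band formulas and, since the sample is large, approximates the critical t value with the normal quantile. It is an illustration, not Prism's implementation.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(5)
n = 500
x = [rng.uniform(0, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]  # arbitrary true line

# Ordinary least-squares fit.
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_mean - slope * x_mean

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual SD

# Assumption: for large n the critical t value is close to the normal quantile.
crit = NormalDist().inv_cdf(0.975)

in_confidence = in_prediction = 0
for xi, res in zip(x, residuals):
    h = 1.0 / n + (xi - x_mean) ** 2 / sxx
    confidence_half_width = crit * s * math.sqrt(h)        # uncertainty of the line
    prediction_half_width = crit * s * math.sqrt(1.0 + h)  # line plus data scatter
    in_confidence += abs(res) <= confidence_half_width
    in_prediction += abs(res) <= prediction_half_width

fraction_in_confidence = in_confidence / n
fraction_in_prediction = in_prediction / n
print("fraction inside confidence band:", fraction_in_confidence)
print("fraction inside prediction band:", fraction_in_prediction)
```

The prediction band includes the scatter of the data (the "1 +" term), so it captures roughly 95% of the points. The confidence band shrinks as n grows, so most points fall outside it.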
September 4, 2008
When entering survival data, what should I do with animals or patients that die the first day?
How should survival analysis deal with subjects who die immediately after entering the study? Machin (page 8; reference below) states: "A logically valid survival time must be larger than zero." If the survival time is measured in days, and the initial and outcome events happen on the same day, he recommends entering X=0.5 or some other small value (depending on the situation).
If you enter X=0, Prism 4 and 5 (like most programs) ignore the data for those subjects. The survival analysis and survival curve will be exactly the same as they would have been if you had simply not entered the row(s) with X=0. The only advantage of entering the X=0 row(s) is to better document the data.
Note that Prism 3 got confused by X=0 in survival data, and the resulting survival curves and comparisons were not correct.
Reference: Survival Analysis: A Practical Approach by David Machin. ISBN: 0470870400.
September 2, 2008
How to view the XY coordinates of confidence or prediction bands.
Prism can plot a best-fit nonlinear regression curve along with confidence or prediction bands (or both). Choose this on the Diagnostics tab of the nonlinear regression dialog in Prism 5. Prism can also show you the XY coordinates that define the curve and confidence bands (or prediction bands). But it only shows this table if you ask for it: