Viewing By Month : July 2009 / Main
July 14, 2009: GraphPad InStat 3.1 is available
If you own a license for InStat 3, you can update to version 3.1 for free. Updates are available for both the Mac and Windows versions.
The biggest change is that we've increased the size of the data table, which can now have 10,000 rows and 52 columns. Other changes are listed here. If you are not familiar with InStat, it is a very simple statistics program -- so simple, anyone can master it in just a few minutes. Learn more.
July 9, 2009: The dangers of smoothing
Matt Briggs has previously written about the dangers of smoothing here and here. The problem is simple: smoothing induces spurious correlations. His latest post points out that smoothing can make it appear that a prediction or forecast is far more accurate than it really is.
I had a hard time following his argument, so I wrote a page that includes my own simulations and my own explanation of why smoothing leads to problems.
Here is another way to see the dangers of smoothing. Steve McIntyre at ClimateAudit.org showed how the S&P500 looks with and without smoothing. If you only saw the smoothed data, you'd get a very wrong impression about the state of the stock market.
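The claim that smoothing induces spurious correlations can be checked with a small simulation. The sketch below is not the simulation from the linked page; it is a minimal illustration under assumed settings (100 points per series, a 10-point moving average, 500 trials, all hypothetical choices). It generates pairs of independent Gaussian noise series, so any correlation between them is pure chance, and compares the average size of that chance correlation before and after smoothing.

```python
import random
import statistics


def moving_average(xs, window):
    """Simple moving average; the result has len(xs) - window + 1 points."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_abs_correlation(n_points=100, window=10, n_trials=500, seed=1):
    """Average |r| between two independent noise series, raw and smoothed."""
    rng = random.Random(seed)
    raw, smoothed = [], []
    for _ in range(n_trials):
        # The two series are generated independently, so the true correlation is zero.
        a = [rng.gauss(0, 1) for _ in range(n_points)]
        b = [rng.gauss(0, 1) for _ in range(n_points)]
        raw.append(abs(pearson(a, b)))
        smoothed.append(abs(pearson(moving_average(a, window),
                                    moving_average(b, window))))
    return statistics.fmean(raw), statistics.fmean(smoothed)


raw_r, smooth_r = mean_abs_correlation()
print(f"mean |r| between raw series:      {raw_r:.3f}")
print(f"mean |r| between smoothed series: {smooth_r:.3f}")
```

With these settings the smoothed series should show a noticeably larger typical correlation than the raw series, even though nothing links them. Smoothing makes neighboring points depend on each other, which leaves fewer effectively independent points and so makes large chance correlations more likely.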
July 7, 2009: The sum of two Gaussian distributions is not always bimodal
Is the distribution of height bimodal? It depends on who you include. If you include both men and women, most people expect to see a bimodal distribution. In fact, it is not. Read more...
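To see why combining two Gaussian populations need not give two peaks, here is a minimal sketch that counts the peaks of an equal-weight mixture of two Gaussian densities on a grid. The means and standard deviations are hypothetical values in arbitrary units, not height data, and the function names are illustrative. For two equally weighted Gaussians with the same standard deviation, the mixture has two peaks only when the means differ by more than two standard deviations.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mixture_pdf(x, mean1, mean2, sd, weight1=0.5):
    """Density of a two-component Gaussian mixture with a shared sd."""
    return weight1 * normal_pdf(x, mean1, sd) + (1 - weight1) * normal_pdf(x, mean2, sd)


def count_modes(mean1, mean2, sd, weight1=0.5, steps=2000):
    """Count local maxima of the mixture density on an evenly spaced grid."""
    lo = min(mean1, mean2) - 4 * sd
    hi = max(mean1, mean2) + 4 * sd
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_pdf(x, mean1, mean2, sd, weight1) for x in xs]
    # A grid point is a peak if it is strictly higher than both neighbors.
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


print(count_modes(0.0, 1.5, 1.0))  # means 1.5 SD apart -> 1 peak
print(count_modes(0.0, 3.0, 1.0))  # means 3 SD apart   -> 2 peaks
```

When the two means are close relative to the scatter within each group, the two humps merge into a single broad peak.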
July 1, 2009: The distinction between confidence, prediction and tolerance intervals
When you fit a model to data, the accuracy or precision of a best-fit parameter can be expressed as a confidence interval, a prediction interval or a tolerance interval. The three are quite distinct. The discussion below explains the three different intervals for the simple case of fitting a mean to a sample of data (assuming sampling from a Gaussian distribution). The same ideas can be applied to intervals for any best-fit parameter determined by regression.
Confidence intervals tell you how well you have determined the mean. Assume that the data really are randomly sampled from a Gaussian distribution. If you collect many such samples and calculate a 95% confidence interval of the mean from each one, you'd expect about 95% of those intervals to include the true value of the population mean. The key point is that the confidence interval tells you about the likely location of the true population parameter.
Prediction intervals tell you where you can expect to see the next data point sampled. Assume that the data really are randomly sampled from a Gaussian distribution. Collect a sample of data and calculate a prediction interval. Then sample one more value from the population. If you do this many times, you'd expect that next value to lie within that prediction interval in 95% of the samples. The key point is that the prediction interval tells you about the distribution of values, not the uncertainty in determining the population mean. Prediction intervals must account for both the uncertainty in knowing the value of the population mean and the scatter of the data. So a prediction interval is always wider than a confidence interval.
Before moving on to tolerance intervals, let's define the word 'expect' as used in defining a prediction interval. It means there is about a 50% chance that you'd see the value within the interval in more than 95% of the samples, and about a 50% chance that you'd see the value within the interval in less than 95% of the samples.
What if you want to be 95% sure that the interval contains 95% of the values? Or 90% sure that the interval contains 99% of the values? Those latter questions are answered by a tolerance interval. To compute, or understand, a tolerance interval you have to specify two different percentages. One expresses how sure you want to be, and the other expresses what fraction of the values the interval will contain. If you set the first value (how sure) to 50%, then a tolerance interval is the same as a prediction interval. If you set it to a higher value (say 90% or 99%) then the tolerance interval is wider.
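The difference between the intervals can be checked by simulation. The sketch below assumes samples of n = 10 from a standard Gaussian population and uses the usual textbook formulas for a mean: confidence interval mean ± t·SD/√n and prediction interval mean ± t·SD·√(1 + 1/n). The critical value 2.262 (two-sided 95%, 9 degrees of freedom) is hard-coded to keep the example within the standard library; the sample size, trial count and function names are illustrative assumptions. It also measures what fraction of the population each prediction interval actually covers, which is the quantity a tolerance interval is designed to control.

```python
import random
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (n = 10).
T_CRIT_DF9 = 2.262


def intervals(sample, t_crit=T_CRIT_DF9):
    """Return (confidence interval, prediction interval) for the mean of a sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    ci_half = t_crit * sd / n ** 0.5            # uncertainty in the mean only
    pi_half = t_crit * sd * (1 + 1 / n) ** 0.5  # uncertainty in the mean plus data scatter
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)


def simulate(n=10, trials=5000, seed=7):
    """Estimate coverage of both intervals by repeated sampling."""
    rng = random.Random(seed)
    population = statistics.NormalDist(mu=0.0, sigma=1.0)
    ci_hits = pi_hits = below_95 = 0
    content_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(population.mean, population.stdev) for _ in range(n)]
        (ci_lo, ci_hi), (pi_lo, pi_hi) = intervals(sample)
        # Does the confidence interval contain the true population mean?
        ci_hits += ci_lo <= population.mean <= ci_hi
        # Does the prediction interval contain one more value from the population?
        next_value = rng.gauss(population.mean, population.stdev)
        pi_hits += pi_lo <= next_value <= pi_hi
        # Fraction of the whole population that this prediction interval covers.
        content = population.cdf(pi_hi) - population.cdf(pi_lo)
        content_total += content
        below_95 += content < 0.95
    return {
        "ci_coverage": ci_hits / trials,
        "pi_coverage": pi_hits / trials,
        "mean_pi_content": content_total / trials,
        "share_pi_below_95": below_95 / trials,
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Both coverage figures should come out close to 0.95, and on average the prediction intervals cover about 95% of the population. A sizable share of individual prediction intervals nevertheless cover less than 95% of it, and for small samples that share need not be exactly one half. A tolerance interval widens the interval so that this share shrinks to whatever level you specify.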