June 8, 2008

Centered polynomial regression

The standard polynomial models look like this:

Y = B0 + B1*X + B2*X^2


More terms are included with the higher order equations. There are two problems with polynomial fits:

  • When the X values are large and start well above zero (for example, when X is calendar year from 1980 to 2005), raising the very large X values to large powers can lead to math overflow. Even if the program doesn't report any math error, the results can be inaccurate. Some coefficients will be positive and some negative, so the value of Y depends on subtracting huge numbers from other huge numbers, leading to imprecise results.
  • Even when the X values are not large, the parameters of the model are intertwined, so they have high covariance and dependency. This results in large standard errors, wide confidence intervals, and wide confidence or prediction bands. In many cases, this problem is severe enough that Prism reports that the results are 'ambiguous', so it doesn't report confidence intervals for all the parameters and can't graph confidence bands.
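The second problem can be made concrete outside of Prism. The following is a minimal sketch in Python with NumPy (not Prism code; the helper name is made up for illustration). It builds the columns 1, X and X^2 for the calendar years 1980 to 2005 and compares the condition number of that matrix before and after subtracting the mean X. A large condition number means the columns are nearly collinear, which is what inflates the standard errors.

```python
import numpy as np

def quadratic_design_matrix(x):
    """Columns 1, X, X^2 of a second-order polynomial model (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])

years = np.arange(1980, 2006)        # calendar years 1980..2005
centered = years - years.mean()      # subtract the mean X from every X

cond_raw = np.linalg.cond(quadratic_design_matrix(years))
cond_centered = np.linalg.cond(quadratic_design_matrix(centered))

print(f"condition number, raw X:      {cond_raw:.3g}")
print(f"condition number, centered X: {cond_centered:.3g}")
```

The raw-year matrix should come out many orders of magnitude worse conditioned than the centered one.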


Both problems go away when the X values are centered. The idea of centering is to subtract the mean X from all X values before fitting the model. This can be done as part of nonlinear regression, using this model:

XC = X - Xmean

Y = B0 + B1*XC + B2*XC^2


Here XC is the centered X value, equal to the X value minus Xmean, which  is the mean of all X values. Xmean needs to be a constant, and not a parameter that Prism tries to fit.  Of course, you can include more terms in the definition of Y. 

Fitting the centered model leads to exactly the same curve (unless the regular  approach led to math errors). Accordingly, the sum-of-squares is the same, which means that the results of model comparisons are identical.
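This claim can be checked numerically. The sketch below uses Python with NumPy rather than Prism, and the data values are invented purely for illustration. It fits a second-order polynomial to the raw and to the centered X values, then compares the predicted curves and the sums-of-squares.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x = np.array([10.0, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
y = np.array([3.1, 3.9, 5.2, 6.8, 9.1, 11.0, 14.2, 17.1, 20.9, 24.8, 29.5])

xc = x - x.mean()                    # centered X

# np.polyfit returns coefficients highest power first: [B2, B1, B0]
raw_coef = np.polyfit(x, y, 2)
cen_coef = np.polyfit(xc, y, 2)

pred_raw = np.polyval(raw_coef, x)
pred_cen = np.polyval(cen_coef, xc)

ss_raw = float(np.sum((y - pred_raw) ** 2))
ss_cen = float(np.sum((y - pred_cen) ** 2))

print("same curve:        ", np.allclose(pred_raw, pred_cen))
print("sum-of-squares:    ", ss_raw, ss_cen)
print("B2 raw vs centered:", raw_coef[0], cen_coef[0])
print("B0 raw vs centered:", raw_coef[2], cen_coef[2])
```

The predictions and sums-of-squares agree to rounding error, and so does B2, while B0 (and B1) differ between the two fits.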

However, the centered model has reparameterized the equation. The parameters have different meanings, so they have different best-fit values (except the last parameter, which is the same), different standard errors and confidence intervals, smaller covariances and dependencies, and narrower confidence/prediction bands.
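The relationship between the two sets of parameters follows from expanding the centered model. In the derivation below, primes mark the parameters of the centered fit and \bar{X} stands for Xmean; this notation is introduced here for illustration and is not Prism syntax.

```latex
\begin{aligned}
Y &= B_0' + B_1'\,(X-\bar{X}) + B_2'\,(X-\bar{X})^2 \\
  &= \underbrace{\left(B_0' - B_1'\bar{X} + B_2'\bar{X}^2\right)}_{B_0}
   + \underbrace{\left(B_1' - 2B_2'\bar{X}\right)}_{B_1}\,X
   + \underbrace{B_2'}_{B_2}\,X^2
\end{aligned}
```

The highest-order coefficient is unchanged, while B0 and B1 of the uncentered model absorb terms that depend on Xmean.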

With Prism 5.02 and 5.0b, you will be able to constrain Xmean to be a column constant equal to the mean of the X values. Until these versions are released, you'll need to compute the mean X value manually, and then constrain Xmean to a constant value (entering the mean X you already computed).

Here is a Prism file that demonstrates centered polynomial fitting. Open it, go to one of the fits, click to change the parameters, and then "edit" the equation without changing anything. This will place the equation into your user-defined equation list.

June 6, 2008

 Are outlier tests useful when data come from a distribution that is not Gaussian?

No.

Most outlier tests are based on the assumption that the data, except the potential outlier(s), come from a Gaussian distribution. If the distribution is not Gaussian, outlier tests are misleading. Here is an example. Grubbs' outlier test found an outlier in three of these four data sets.

But these data are not sampled from a Gaussian distribution with an outlier. Rather they are sampled from a lognormal distribution. Transform all the values to their logarithms, and the distribution becomes Gaussian:

The apparent outliers are gone. Grubbs' test finds no outliers. The extreme points only appeared to be outliers because extreme values are common in a lognormal distribution but rare in a Gaussian distribution. If you don’t realize the distribution is lognormal, an outlier test will be very misleading.
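The same effect can be reproduced with a small hand-made sample. The sketch below is plain Python, not Prism's implementation. The ten values are invented to be roughly symmetric on a log scale, and the critical value is the tabulated two-sided Grubbs' value for alpha = 0.05 and n = 10, so the comparison holds only for samples of exactly ten values.

```python
import math
import statistics

# Tabulated two-sided critical value for Grubbs' test, alpha = 0.05, n = 10.
# Valid only for samples of exactly 10 values.
G_CRIT_N10 = 2.290

def grubbs_statistic(values):
    """Largest absolute deviation from the mean, in units of the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return max(abs(v - mean) for v in values) / sd

# Hypothetical sample, made up to look lognormal (symmetric on a log scale).
raw = [0.10, 0.22, 0.37, 0.57, 0.84, 1.20, 1.77, 2.69, 4.48, 10.07]
logs = [math.log(v) for v in raw]

g_raw = grubbs_statistic(raw)
g_log = grubbs_statistic(logs)

print(f"raw values: G = {g_raw:.2f}, outlier flagged: {g_raw > G_CRIT_N10}")
print(f"log values: G = {g_log:.2f}, outlier flagged: {g_log > G_CRIT_N10}")
```

On the raw scale the largest value exceeds the critical value and is flagged; after taking logarithms the same point is well inside the expected range.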

 

Removing all values that are too big or too small.

 When analyzing data, sometimes you want to graph or analyze only a portion of the values, and remove any values that are higher (or lower) than some threshold. You can do this with a user-defined Prism transform. Here is a transform that removes any data with Y greater than 100:

   Y=IF(Y>100, 0/0, Y)

That transforms any values greater than 100 to 0/0 which is undefined, so becomes blank in the results table. The other values get transformed to equal Y (no change). 

Here is a transform that removes any data with Y greater than 100 or less than 10. 

  Y=IF(Y>100, 0/0, IF(Y<10, 0/0,Y))

This simply nests two IF functions in the transform. 
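For readers who want to apply the same filter outside Prism, here is a rough Python equivalent of the nested IF transform. The function name and sample values are made up for illustration. Python raises an error on 0/0, so NaN stands in for the blank cell.

```python
import math

def remove_out_of_range(y, low=10.0, high=100.0):
    """Return y unchanged if low <= y <= high, otherwise NaN (a 'blank')."""
    if y > high or y < low:
        return math.nan
    return y

values = [5.0, 10.0, 42.0, 100.0, 150.0]
cleaned = [remove_out_of_range(v) for v in values]
print(cleaned)  # [nan, 10.0, 42.0, 100.0, nan]
```

As in the Prism transform, values exactly equal to 10 or 100 are kept, because the comparisons are strict.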

To enter a user-defined transform, go to a data table, click Analyze, and choose Transform. At the top of the dialog, choose User-defined Y transforms. On the new dialog, click Add to create a new transform.

Of course, you could create an X transform and use similar syntax to remove rows where X is too high or too low (or meets some other criterion). 

June 4, 2008
How to make the right and left Y axes look different

When you create a graph with two Y axes, Prism always creates them with the same length and the same color.

To make the lengths appear different:

  1. Use the rectangle tool to draw a rectangle over the part of the axis you don't want to see
  2. Make it white (or whatever your page background is) with a solid fill. Then that axis appears to be shorter.

To give the axes different colors:

  1. Click on one of the axes to select it.
  2. Drop the Change menu, and choose Selected Object.
  3. Change the color.

To give the axis numbering a different color or font:

  1. Click on one of the axes to select it.
  2. Use toolbar buttons to change the color or font used to number that axis.

To have only one axis but put it on the right side of the graph:

The first axis created is called the 'left Y axis', but in fact it does not need to be placed on the left side of the graph. It can be anywhere. To put this  axis on the right side of a graph:

  1. Double-click on an axis to bring up the Format Axis dialog.
  2. Go to the first tab, Frame and Origin.
  3. Set the Origin to the lower right. 

To delete the right Y axis:

  1. Double-click on it to bring up the Format Axis dialog.
  2. Make sure you are on the right Y axis tab.
  3. Drop down the list labeled 'Gaps and direction'.
  4. Choose: No right Y axis. 

 

June 1, 2008

Plotting t, z, F, or chi-square distributions with Prism.

GraphPad Prism can generate probability distributions. This demonstrates Prism's ability to plot user-defined functions, and also shows how info constants can be hooked to analyses.

Download this Prism 5 file to generate and plot the graphs shown below. 
 

Figure: z, t, F, and chi-square distributions

 

In each case, the simulation generates two (or three) data sets. The first data set (A) plots the entire curve. The second (and third) data sets plot only the values where X is greater than (or less than) a specified cutoff value. These data sets are plotted with area fill to shade the tails of the distributions. Remove data set B or C from the graph if you only want to shade one tail.

Change the numbers of degrees of freedom and the cutoff values (for shading) in the Info sheet. This demonstrates how values entered into an info sheet can be 'hooked' to constants used in analyses.
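The structure of those data sets can be sketched in a few lines of Python. This is an illustration of the idea, not the contents of the Prism file. It uses the z (standard normal) density, assumes the lower cutoff is the negative of the upper one, and uses NaN where Prism would leave a blank cell; the cutoff of 1.96 is just an example value.

```python
import math

def z_density(x):
    """Standard normal (z) probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def tail_data_sets(x_values, cutoff):
    """Build three Y columns: A = whole curve, B = upper tail, C = lower tail.

    Points outside a tail are NaN, standing in for blank cells.
    """
    a = [z_density(x) for x in x_values]
    b = [y if x > cutoff else math.nan for x, y in zip(x_values, a)]
    c = [y if x < -cutoff else math.nan for x, y in zip(x_values, a)]
    return a, b, c

# X from -4 to 4 in steps of 0.1; the cutoff 1.96 is an illustrative choice.
xs = [i / 10 for i in range(-40, 41)]
A, B, C = tail_data_sets(xs, cutoff=1.96)

shaded_upper = sum(not math.isnan(y) for y in B)
shaded_lower = sum(not math.isnan(y) for y in C)
print(shaded_upper, shaded_lower)
```

Plotting column A as a line and columns B and C with area fill gives the shaded-tail appearance described above.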

 Graphing a Binomial or Poisson distribution with Prism. 

 Prism can graph a Binomial or Poisson distribution. Download the file that generated this pair of graphs. 

 

To modify this file, change the value of lambda (for Poisson) or the probability, n, and cutoff (for Binomial) in the Info sheet. Enter new values there, and the graph updates. This is a good example of the usefulness of hooking an info constant to an analysis.

If you want to recreate graphs like these, keep in mind these points:

  •  As its name suggests, the analysis 'Create a Family  of Theoretical Curves' is usually used to create curves, not bar graphs. When you choose the range of X values, specify the appropriate number of 'line segments' (points) so that the X interval equals 1.0. The binomial example on the left created 16 'line segments' starting at X=0 and ending at X=15. The Poisson on the right created 13 'segments' with X starting at 0 and ending at 13.
  • The analysis will create a set of line segments (an attempt to create a curve). Click the change type of graph button, or drop the Change menu and choose Graph Type. Then choose the Grouped tab, and then choose interleaved bars. 
  • The binomial example on the left has two data sets. You don't want them plotted interleaved, as selected in the previous step. Double-click to bring up Format Graph, then go to the middle tab, and choose to superimpose the second data set on the first (rather than interleave). Then assign it a different color.
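The bar heights themselves come from the usual probability formulas evaluated at integer X values. Here is a minimal Python sketch; the parameter values are illustrative and are not the ones stored in the Prism file.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Illustrative values, not those stored in the Prism file.
n, p, lam = 15, 0.3, 4.0

# One bar per integer X, so the X interval equals 1.0.
binomial_bars = [(k, binomial_pmf(k, n, p)) for k in range(0, n + 1)]  # X = 0..15
poisson_bars = [(k, poisson_pmf(k, lam)) for k in range(0, 14)]        # X = 0..13

print(len(binomial_bars), "binomial bars, total probability",
      round(sum(y for _, y in binomial_bars), 6))
print(len(poisson_bars), "Poisson bars")
```

The binomial bars cover every possible outcome, so their heights sum to 1. The Poisson bars are truncated at the chosen maximum X, so their total is slightly below 1.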