Robust nonlinear regression


The need for robust regression

Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. This standard method for performing nonlinear (or linear) regression is called least-squares.

Experimental mistakes can produce erroneous values that are far too high or too low. Such values are called outliers. Even a single outlier can dominate the sum-of-squares calculation and lead to misleading results. One way to cope with this problem is to perform a robust fit using a method that is not very sensitive to violations of the Gaussian assumption. Another approach is to use automatic outlier elimination to identify and remove the outliers, and then run least-squares regression. Prism offers both choices.
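To see how one outlier can dominate the sum of squares, here is a minimal Python sketch. The residual values are made up for illustration (they are not from Prism or any real dataset), and the function name sum_of_squares is our own.

```python
def sum_of_squares(residuals):
    """Sum of squared vertical distances between the points and the curve."""
    return sum(r * r for r in residuals)

# Hypothetical residuals: five well-behaved points and one outlier (last value).
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 8.0]

total = sum_of_squares(residuals)

# Fraction of the total sum of squares contributed by the outlier alone.
outlier_share = residuals[-1] ** 2 / total

print(f"sum of squares = {total:.2f}, outlier share = {outlier_share:.1%}")
```

With these invented numbers, the single outlier accounts for more than 99% of the sum of squares, so a least-squares fit would be driven almost entirely by that one point.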

How robust regression works

Following a suggestion in Numerical Recipes (1), we based our robust fitting method on the assumption that variation around the curve follows a Lorentzian distribution, rather than a Gaussian distribution. Both distributions are part of a family of t distributions:

The widest distribution in that figure, the t distribution for df=1, is also known as the Lorentzian distribution or Cauchy distribution. The Lorentzian distribution has wide tails, so outliers are fairly common and therefore have little impact on the fit.
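The difference in tail weight can be checked numerically. The sketch below compares the probability of a value falling more than k units from the center for a standard Gaussian and for a standard Lorentzian (the t distribution with df=1). Both use unit scale, which is our assumption for illustration; the function names are our own.

```python
import math

def gaussian_tail(k):
    """P(|X| > k) for a standard Gaussian (mean 0, SD 1)."""
    return math.erfc(k / math.sqrt(2.0))

def lorentzian_tail(k):
    """P(|X| > k) for a standard Lorentzian/Cauchy (center 0, scale 1)."""
    return 1.0 - (2.0 / math.pi) * math.atan(k)

for k in (2, 3, 5):
    print(f"k={k}: Gaussian {gaussian_tail(k):.4%}, "
          f"Lorentzian {lorentzian_tail(k):.4%}")
```

At k=3, for example, the Gaussian tail probability is roughly 0.3%, while the Lorentzian tail probability is roughly 20%. Under the Lorentzian assumption, a point far from the curve is not surprising, which is why it has little influence on the fit.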

We adapted the Marquardt nonlinear regression algorithm to accommodate the assumption of a Lorentzian (rather than Gaussian) distribution of residuals, and explain the details in reference 2.
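The following Python sketch illustrates the general idea only. It is not Prism's adapted Marquardt algorithm. It assumes a one-parameter exponential decay model, a Lorentzian-style merit function of the form sum of ln(1 + (residual/scale)^2), a fixed scale of 0.05, and a simple grid search instead of an iterative optimizer. All names, data values, and the scale are hypothetical; the exact merit function and scale estimate that Prism uses are described in reference 2.

```python
import math

def model(x, k):
    """Hypothetical one-parameter model: exponential decay from 1."""
    return math.exp(-k * x)

def least_squares_merit(residuals):
    """Ordinary sum of squares (Gaussian assumption)."""
    return sum(r * r for r in residuals)

def lorentzian_merit(residuals, scale):
    """Lorentzian-style merit: large residuals grow only logarithmically."""
    return sum(math.log(1.0 + (r / scale) ** 2) for r in residuals)

def grid_fit(xs, ys, merit):
    """Return the k on a grid (0.01 to 3.00) that minimizes the merit."""
    candidates = [i / 100 for i in range(1, 301)]

    def score(k):
        return merit([y - model(x, k) for x, y in zip(xs, ys)])

    return min(candidates, key=score)

# Made-up data: exact decay with k = 0.5, plus one outlier at x = 4.
TRUE_K = 0.5
xs = list(range(8))
ys = [model(x, TRUE_K) for x in xs]
ys[4] += 0.5

k_ls = grid_fit(xs, ys, least_squares_merit)
k_robust = grid_fit(xs, ys, lambda r: lorentzian_merit(r, scale=0.05))

print(f"least-squares k = {k_ls:.2f}, robust k = {k_robust:.2f}")
```

In this toy example the least-squares estimate is pulled away from the true rate constant by the outlier, while the Lorentzian-style merit function leaves the estimate essentially at the true value. A grid search is used here only to keep the sketch short and dependency-free.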

When does it make sense to choose robust nonlinear regression?

If your goal is just to obtain best-fit values of the parameters, robust regression works well, because outliers have little impact. And if the scatter of the data is Gaussian, robust regression and least-squares regression give almost identical results.

Robust regression (as implemented by Prism) has three drawbacks:

Robust regression cannot generate standard errors or confidence intervals for the parameters.
Robust regression cannot generate confidence or prediction bands.
Robust regression cannot compare the fits of two models or two datasets.

The main use of robust regression in Prism is as a 'baseline' from which to remove outliers. Its inability to compute standard errors or confidence intervals of the parameters greatly limits the usefulness of robust regression. We recommend it only to those who want to better understand the outlier-removal method (which begins with robust regression).

References

1. Press WH, Teukolsky SA, Vetterling WT, Flannery BP: Numerical Recipes in C: The Art of Scientific Computing. New York, NY: Cambridge University Press; 1988.
2. Motulsky HJ, Brown RE: Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate. BMC Bioinformatics 2006, 7:123.

 

 



Copyright (c) 2007 GraphPad Software Inc. All rights reserved.
URL: http://www.graphpad.com/help/Prism5/Prism5Help.html?reg_fit_tab_2.htm