Robust nonlinear regression
The need for robust regression

Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian (normal) distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical (Y-value) distances between the points and the curve. This standard method for performing nonlinear (or linear) regression is called least-squares.

Experimental mistakes can produce erroneous values that are far too high or far too low: outliers. Even a single outlier can dominate the sum-of-squares calculation and lead to misleading results.

One way to cope with this problem is to perform a robust fit, using a method that is not very sensitive to violations of the Gaussian assumption. Another approach is to use automatic outlier elimination to identify and remove the outliers, and then run least-squares regression. Prism offers both choices.

How robust regression works

Following a suggestion in Numerical Recipes (1), we based our robust fitting method on the assumption that variation around the curve follows a Lorentzian distribution, rather than a Gaussian distribution. Both distributions are part of a family of t distributions:
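As a rough illustration of the two ideas above, the sketch below (plain Python, standard library only) first evaluates the Student t density for a few degrees of freedom to show how much heavier the df=1 (Lorentzian) tails are than the Gaussian ones. It then fits a straight line to made-up data containing one outlier, once by least squares and once by minimizing a Lorentzian merit function, sum of ln(1 + (residual/scale)^2), using iteratively reweighted least squares. The data, the fixed scale of 1.0, the straight-line model, and the reweighting scheme are all assumptions chosen for illustration. This is not Prism's implementation, which adapts the Marquardt algorithm as described in reference 2.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)

def gaussian_pdf(t):
    """Standard normal density, the limit of the t family as df grows."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def lorentzian_line_fit(x, y, scale=1.0, iterations=50):
    """Reduce sum(ln(1 + (r/scale)**2)) by iteratively reweighted least squares.

    Each pass gives a point the weight 1 / (1 + (r/scale)**2), so points far
    from the current line contribute very little to the next fit.
    The fixed scale is an assumption made for this sketch.
    """
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from least squares
    for _ in range(iterations):
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        weights = [1.0 / (1.0 + (r / scale) ** 2) for r in residuals]
        a, b = weighted_line_fit(x, y, weights)
    return a, b

# Tail comparison: density four units from the center.
for df in (1, 2, 5, 30):
    print(f"t density at 4, df={df}: {t_pdf(4.0, df):.6f}")
print(f"Gaussian density at 4:   {gaussian_pdf(4.0):.6f}")

# Made-up data: y = 1 + 2x plus small scatter, with one gross outlier.
x = [float(i) for i in range(10)]
scatter = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1]
y = [1 + 2 * xi + si for xi, si in zip(x, scatter)]
y[9] = 50.0  # the underlying line would give 19 here

ls_intercept, ls_slope = weighted_line_fit(x, y, [1.0] * len(x))
rob_intercept, rob_slope = lorentzian_line_fit(x, y)
print(f"least-squares slope: {ls_slope:.3f}")
print(f"Lorentzian slope:    {rob_slope:.3f}")
```

In this example the single outlier pulls the least-squares slope well away from 2, while the Lorentzian fit stays close to it. A real analysis would also need to estimate the scale from the data rather than fix it.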
The widest member of that family, the t distribution for df=1, is also known as the Lorentzian distribution or Cauchy distribution. The Lorentzian distribution has wide tails, so outliers are fairly common and therefore have little impact on the fit.

We adapted the Marquardt nonlinear regression algorithm to accommodate the assumption of a Lorentzian (rather than Gaussian) distribution of residuals, and explain the details in reference 2.

When does it make sense to choose robust nonlinear regression?

If your goal is just to obtain best-fit values of the parameters, robust regression works well, because outliers have little impact. And if all the data are Gaussian, robust regression and least-squares regression give almost identical results.

Robust regression (as implemented by Prism) has three drawbacks:
The main use of robust regression in Prism is as a 'baseline' from which to remove outliers. Its inability to compute standard errors or confidence intervals of the parameters greatly limits the usefulness of robust regression. We recommend it only to those who want to better understand the outlier-removal method (which begins with robust regression).

References