A regression model predicts one variable Y from one or more other variables X. The Y variable is called the dependent variable, the response variable or the outcome variable. The X variables are called independent variables, explanatory variables or predictor variables.
Each X variable can be a value that the experimenter manipulated, a treatment that the experimenter selected or assigned, or a value that the experimenter measured.
Each independent variable can be continuous (e.g., age, blood pressure, weight) or categorical (e.g., sex with levels male and female, or cell line with levels HeLa, HEK 293, CHO, and Jurkat). When categorical variables are used, they must be "encoded" using one of a variety of methods (more on this below). Note that Prism will automatically encode categorical variables when included in a regression model, so there's no need to perform this encoding yourself.
The multiple regression model defines the dependent variable as a function of the independent variables and a set of parameters, also called regression coefficients. Regression methods find the parameter values that make the model predictions come as close as possible to the data. This approach is analogous to linear regression, which determines the values of the slope and intercept (the two parameters or regression coefficients of the model) to make the model predict Y from X as closely as possible.
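To make the idea concrete, here is a minimal sketch (in Python, not Prism's own implementation) of fitting a multiple regression model with two independent variables by ordinary least squares; the variable names and data values are made up for illustration.

```python
# A minimal sketch of multiple linear regression by ordinary least squares.
# All values are hypothetical and only illustrate the idea of finding the
# parameter values that bring the predictions as close as possible to the data.
import numpy as np

# Two independent variables (columns) for five observations
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
Y = np.array([3.1, 3.9, 7.2, 7.8, 10.1])

# A column of ones lets the model estimate an intercept (b0), just as
# simple linear regression estimates an intercept along with the slope
X_design = np.column_stack([np.ones(len(Y)), X])

# Least squares returns the coefficients [b0, b1, b2] that minimize the
# squared differences between the predicted and observed Y values
coefficients, *_ = np.linalg.lstsq(X_design, Y, rcond=None)
print(coefficients)
```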
Simple regression refers to models with a single X variable. Multiple regression, also called multivariable regression, refers to models with two or more X variables.
Although they are beyond the scope of this guide, methods do exist that can analyze several outcomes (Y variables) at once. These are called multivariate methods, and they include factor analysis, cluster analysis, principal components analysis, and multivariate ANOVA (MANOVA). These methods contrast with univariate methods, which deal with only a single Y variable.
Note that the terms multivariate and univariate are used inconsistently. Sometimes multivariate is used to refer to multivariable methods for which there is one outcome and several independent variables (e.g., multiple linear regression and logistic regression). And sometimes univariate is used to refer to simple regression with only one independent variable.
Prism only performs linear multiple regression. This means that the model is linear with respect to each parameter: if you made a graph of how Y changes as you change any one parameter (while holding all the X values and all the other parameters constant), the graph would be a straight line.
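The short Python illustration below (with hypothetical values) shows what "linear in the parameters" means: with the X values and every other coefficient held fixed, varying one coefficient moves the predicted Y along a straight line.

```python
# Illustration only: hold the X values and all other coefficients constant,
# vary one coefficient, and the predicted Y changes along a straight line.
import numpy as np

x1, x2 = 2.0, 5.0          # fixed values of the independent variables
b0, b2 = 1.0, 0.5          # all other parameters held constant

b1_values = np.linspace(-2, 2, 5)
y_values = b0 + b1_values * x1 + b2 * x2

# Equal steps in b1 produce equal steps in Y -- the hallmark of a straight line
print(np.diff(y_values))
```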
It is certainly possible to write models with one Y variable and multiple X variables related to Y via a nonlinear function. But Prism does not (yet) perform multiple nonlinear regression. Let us know, with details, if this would be helpful to you.
Categorical variables are those that take on one of a limited number of possible values (known as "levels"). As an example, "Car Manufacturer" could be a categorical variable with levels "Ford", "Toyota", "Dodge", "Hyundai", etc. However, regression calculations require numeric values. So in order for a categorical variable to be included in a regression model, it must be converted to a variable (more accurately, a set of variables) consisting only of numbers.
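As an illustration only (remember that Prism performs this encoding automatically), here is a minimal sketch in Python/pandas of one common scheme, dummy (indicator) coding, applied to the "Car Manufacturer" example; the data values are made up.

```python
# A minimal sketch of dummy (indicator) coding for a categorical variable.
# Prism does this step for you; this only shows what the conversion looks like.
import pandas as pd

data = pd.DataFrame({
    "Manufacturer": ["Ford", "Toyota", "Dodge", "Hyundai", "Ford"]
})

# drop_first=True uses one level as the reference category, producing
# k-1 indicator (0/1) columns for a variable with k levels
encoded = pd.get_dummies(data["Manufacturer"], prefix="Manufacturer", drop_first=True)
print(encoded)
```

With one level serving as the reference, a categorical variable with k levels becomes k-1 numeric indicator variables in the model, which is why the text above describes the conversion as producing a set of variables rather than a single one.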