

The model for simple logistic regression is written logit[P(Y=1)] = β0 + β1*X. Note that, unlike linear regression, there is no separate error term: the randomness in the model comes from the fact that Y itself is a 0-or-1 (Bernoulli) outcome.

On the right-hand side, this matches the model for simple linear regression (remember the simple linear regression model is Y = intercept + slope*X). The left-hand side includes a “logit” function (long o, soft g) that adjusts for the fact that Y is a variable that can only take on values of 0 and 1. Briefly, the logit is the log of the odds that Y=1, and “P(Y=1)” is the probability that Y is equal to 1. Note that “P” in this case is an abbreviation for probability and has nothing to do with P values.

To understand what “log odds” are, it’s important to first know what is meant by odds. The odds equals the probability that Y=1 divided by the probability that Y=0. For example, if the probability that Y=1 is 0.8 (an 80% chance that Y=1), then the probability that Y=0 is 1-0.8, or 0.2 (remember, Y can only be 0 or 1, so the probability that Y=0 is 1-[probability that Y=1]). Using these numbers, we can calculate the odds as the ratio of the two probabilities:

Odds = P(Y=1)/P(Y=0) = 0.8/0.2 = 4

In this case, the odds is 4. You will often hear people refer to this as 4:1 odds, which you would read as "four to one odds." Now that we know how odds are related to probability, we can take the final step to calculate the log odds. This simply involves using the calculated value for odds and taking the natural logarithm (Ln) of that value:
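The arithmetic above is easy to check in a few lines of Python (a sketch; the variable names are just for illustration):

```python
# Probability that Y = 1
p = 0.8

# Odds = P(Y=1) / P(Y=0) = P(Y=1) / (1 - P(Y=1))
odds = p / (1 - p)
print(odds)  # approximately 4, i.e., 4:1 odds
```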

Log odds = Ln(Odds) = Ln(P(Y=1)/P(Y=0)) = Ln(P(Y=1)/[1-P(Y=1)])

All of the forms of log odds listed above are equivalent. While this math can seem confusing, the reason we go through all of it is that we want to model the probability that Y=1 (or Y=0).
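Continuing the example, the log odds for a probability of 0.8 can be computed directly, and the equivalent forms from the equation above give the same answer (a sketch; `math.log` is the natural logarithm, Ln):

```python
import math

p = 0.8

# Two equivalent forms of the log odds from the equation above
log_odds_a = math.log(p / (1 - p))   # Ln(P(Y=1) / [1 - P(Y=1)])
log_odds_b = math.log(0.8 / 0.2)     # Ln(P(Y=1) / P(Y=0))

print(round(log_odds_a, 3))  # 1.386, i.e., Ln(4)
```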

More to the point, we want to use a linear model (the right hand side of the simple logistic regression equation) to model this probability. Recall that a probability ranges between 0 and 1. The right-hand side of the simple logistic regression model, like the simple linear regression model, can generate (in theory) any value from negative infinity to positive infinity. The logit function is used to serve as a link between these two ranges.

Start with probability: these values can only range from 0 to 1.

First, we take the odds, which transforms the scale from 0-to-1 to a scale from 0 to +infinity (calculate the odds for any probability between 0 and 1 and see for yourself!).

Next, we take the natural logarithm of the odds to get the log odds, which transforms the scale again to one that runs from negative infinity to positive infinity.
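The two-step transformation can be seen numerically by applying it to a handful of probabilities across the 0-to-1 range (a sketch; the particular probabilities are arbitrary):

```python
import math

for p in [0.01, 0.2, 0.5, 0.8, 0.99]:
    odds = p / (1 - p)         # maps probabilities in (0, 1) to odds in (0, +infinity)
    log_odds = math.log(odds)  # maps odds in (0, +infinity) to (-infinity, +infinity)
    print(f"p = {p:<5}  odds = {odds:8.3f}  log odds = {log_odds:7.3f}")
```

Notice that a probability of 0.5 (a "fifty-fifty" chance) gives odds of 1 and log odds of exactly 0, with probabilities below and above 0.5 giving negative and positive log odds, respectively.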

So you can think of the logit function as the mathematical link between the values generated by the right-hand side of the model (which can be any real number) and the bounded range of probability (which must be between 0 and 1).


© 1995-2019 GraphPad Software, LLC. All rights reserved.