both, we are plotting $\hat{E}[Y \mid X = x]$, the estimated mean, which for a binary
response happens to be an estimate of $P[Y = 1 \mid X = x]$.
We immediately see why ordinary linear regression is not a good idea. While it
is estimating the mean, we see that it produces estimates that are less than 0!
(And in other situations could produce estimates greater than 1!) If the mean
is a probability, we don’t want probabilities less than 0 or greater than 1.
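As a quick numeric check of this claim, we could compare the range of fitted values from the two fits. This is only a sketch: it assumes an ordinary linear model, here called fit_lm, was fit to the same binary response used for fit_glm (fit_lm is a placeholder name, not defined in the text).

range(predict(fit_lm))                       # can fall outside [0, 1]
range(predict(fit_glm, type = "response"))   # always strictly between 0 and 1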
Enter logistic regression. Since the output of the inverse logit function is restricted
to be between 0 and 1, our estimates make much more sense as probabilities.
Let’s look at our estimated coefficients. (With a lot of rounding, for simplicity.)
round(coef(fit_glm), 1)
## (Intercept) x
## -2.3 3.7
Our estimated model is then:
$$
\log\left(\frac{\hat{p}(x)}{1 - \hat{p}(x)}\right) = -2.3 + 3.7 x
$$
Because we’re not directly estimating the mean, but instead a function of the
mean, we need to be careful with our interpretation of $\hat{\beta}_1 = 3.7$. This means
that, for a one unit increase in $x$, the log odds change (in this case increase) by
3.7. Also, since $\hat{\beta}_1$ is positive, as we increase $x$ we also increase $\hat{p}(x)$. To see
how much, we have to consider the inverse logistic function.
For example, we have:
$$
\hat{P}[Y = 1 \mid X = -0.5] = \frac{e^{-2.3 + 3.7 \cdot (-0.5)}}{1 + e^{-2.3 + 3.7 \cdot (-0.5)}} \approx 0.016
$$

$$
\hat{P}[Y = 1 \mid X = 0] = \frac{e^{-2.3 + 3.7 \cdot (0)}}{1 + e^{-2.3 + 3.7 \cdot (0)}} \approx 0.09112296
$$

$$
\hat{P}[Y = 1 \mid X = 1] = \frac{e^{-2.3 + 3.7 \cdot (1)}}{1 + e^{-2.3 + 3.7 \cdot (1)}} \approx 0.8021839
$$
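These values can be reproduced in R by plugging the (rounded) estimated coefficients into the inverse logit. This is a minimal sketch: eta and inv_logit are helper names introduced here only for illustration, and plogis() is the built-in equivalent of inv_logit.

eta = function(x) -2.3 + 3.7 * x                 # estimated linear predictor (rounded)
inv_logit = function(z) exp(z) / (1 + exp(z))    # maps log odds to probabilities
round(inv_logit(eta(c(-0.5, 0, 1))), 4)

## [1] 0.0155 0.0911 0.8022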
Now that we know we should use logistic regression, and not ordinary linear
regression, let’s consider another example. This time, let’s consider the model
$$
\log\left(\frac{p(x)}{1 - p(x)}\right) = 1 + (-4) x
$$
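Before fitting this model, it may help to see how data could arise from it. The following is a hypothetical simulation sketch (the function and object names are placeholders, not from the text), where the true coefficients are $\beta_0 = 1$ and $\beta_1 = -4$.

set.seed(42)
sim_from_model = function(sample_size = 25) {
  x = rnorm(n = sample_size)
  eta = 1 + -4 * x                              # true log odds
  p = 1 / (1 + exp(-eta))                       # true P[Y = 1 | X = x]
  y = rbinom(n = sample_size, size = 1, prob = p)
  data.frame(y = y, x = x)
}
example_data = sim_from_model()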