both, we are plotting $\hat{E}[Y \mid X = x]$, the estimated mean, which for a binary response happens to be an estimate of $P[Y = 1 \mid X = x]$.
We immediately see why ordinary linear regression is not a good idea. While it is estimating the mean, we see that it produces estimates that are less than 0! (And in other situations could produce estimates greater than 1!) If the mean is a probability, we don't want probabilities less than 0 or greater than 1.
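A quick way to see this numerically (using fit_lm as a hypothetical name for the ordinary linear fit shown in the plot) would be to inspect the range of its fitted values:

# hypothetical object name for the linear fit; its fitted means can fall outside [0, 1]
range(fitted(fit_lm))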
Enter logistic regression. Since the output of the inverse logit function is restricted to be between 0 and 1, our estimates make much more sense as probabilities. Let's look at our estimated coefficients. (With a lot of rounding, for simplicity.)

                                 round(coef(fit_glm), 1)


                                 ## (Intercept)            x
                                 ##         -2.3         3.7


Our estimated model is then:

$$
\log\left(\frac{\hat{p}(x)}{1 - \hat{p}(x)}\right) = -2.3 + 3.7 x
$$
Because we're not directly estimating the mean, but instead a function of the mean, we need to be careful with our interpretation of $\hat{\beta}_1 = 3.7$. This means that, for a one unit increase in $x$, the log odds change (in this case increase) by 3.7. Also, since $\hat{\beta}_1$ is positive, as we increase $x$ we also increase $\hat{p}(x)$. To see how much, we have to consider the inverse logistic function.
For example, we have:

$$
\hat{P}[Y = 1 \mid X = -0.5] = \frac{e^{-2.3 + 3.7 \cdot (-0.5)}}{1 + e^{-2.3 + 3.7 \cdot (-0.5)}} \approx 0.016
$$

$$
\hat{P}[Y = 1 \mid X = 0] = \frac{e^{-2.3 + 3.7 \cdot (0)}}{1 + e^{-2.3 + 3.7 \cdot (0)}} \approx 0.09112296
$$

$$
\hat{P}[Y = 1 \mid X = 1] = \frac{e^{-2.3 + 3.7 \cdot (1)}}{1 + e^{-2.3 + 3.7 \cdot (1)}} \approx 0.8021839
$$
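These values are easy to verify in R. A minimal check, using the rounded coefficients together with plogis(), base R's inverse logit (the unrounded versions would come from predict(fit_glm, type = "response")):

# inverse logit of the estimated log odds at x = -0.5, 0, and 1
# gives approximately 0.016, 0.091, and 0.802, matching the calculations above
plogis(-2.3 + 3.7 * c(-0.5, 0, 1))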
Now that we know we should use logistic regression, and not ordinary linear regression, let's consider another example. This time, let's consider the model

$$
\log\left(\frac{p(x)}{1 - p(x)}\right) = 1 - 4 x.
$$
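As a sketch of what data from such a model could look like, one might simulate it as follows; the variable names and sample size here are illustrative, not taken from the text:

# simulate binary responses whose log odds follow 1 - 4x (illustrative sketch)
set.seed(42)
n = 50
x = rnorm(n)
p = plogis(1 - 4 * x)              # true P[Y = 1 | X = x]
y = rbinom(n, size = 1, prob = p)  # Bernoulli draws with those probabilities
sim_data = data.frame(y = y, x = x)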