
422                           CHAPTER 17. LOGISTIC REGRESSION



                                                                        
$$
L(\boldsymbol{\beta}) = \prod_{i : y_i = 1} p(\mathbf{x}_i) \prod_{j : y_j = 0} (1 - p(\mathbf{x}_j))
$$

$$
L(\boldsymbol{\beta}) = \prod_{i : y_i = 1} \frac{e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i(p-1)}}}{1 + e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i(p-1)}}} \prod_{j : y_j = 0} \frac{1}{1 + e^{\beta_0 + \beta_1 x_{j1} + \cdots + \beta_{p-1} x_{j(p-1)}}}
$$

                                 Unfortunately, unlike ordinary linear regression, there is no analytical solution
                                 for this maximization problem. Instead, it will need to be solved numerically.
                                 Fortunately, R will take care of this for us using an iteratively reweighted least
                                 squares algorithm. (We’ll leave the details for a machine learning or optimization
                                 course, which would likely also discuss alternative optimization strategies.)
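As a rough illustration of what "solved numerically" means (this is not the book's code, and `glm()` internally uses IRLS rather than a general-purpose optimizer), we can hand the negative log-likelihood directly to `optim()` and check that it lands on essentially the same estimates as `glm()`. The simulated data and object names below are assumptions for the sketch:

```r
# Simulate data from a known logistic model: beta_0 = 1, beta_1 = 2
set.seed(42)
n = 1000
x = rnorm(n)
p = 1 / (1 + exp(-(1 + 2 * x)))
y = rbinom(n, size = 1, prob = p)

# Negative log-likelihood of beta = (beta_0, beta_1), written using
# log L = sum( y * eta - log(1 + exp(eta)) ) with eta = beta_0 + beta_1 * x
neg_loglik = function(beta) {
  eta = beta[1] + beta[2] * x
  -sum(y * eta - log(1 + exp(eta)))
}

# Direct numerical maximization (Nelder-Mead by default) ...
fit_optim = optim(c(0, 0), neg_loglik)
# ... versus R's built-in iteratively reweighted least squares
fit_glm = glm(y ~ x, family = binomial)

fit_optim$par   # numeric MLE
coef(fit_glm)   # agrees with the optim() solution to several decimals
```

The two approaches agree because they are maximizing the same likelihood; `glm()` is simply faster and more reliable for this particular problem.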


                                 17.2.2   Fitting Issues

We should note that, if there exists some $\beta^*$ such that

$$
\mathbf{x}_i^\top \beta^* > 0 \implies y_i = 1
$$

and

$$
\mathbf{x}_i^\top \beta^* < 0 \implies y_i = 0
$$
for all observations, then no finite MLE exists, since scaling $\beta^*$ upward only increases the likelihood. Such data is said to be separable.
This, and similar numeric issues related to estimated probabilities near 0 or 1, will produce a warning in R:

                                 ## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred


When this happens, the model is still "fit," but there are consequences: the estimated coefficients are highly suspect, which is an issue when trying to interpret the model. The model will often still be useful for creating a classifier, which will be discussed later. However, it is still subject to the usual evaluations for classifiers to determine how well it is performing.
                                 For details, see Modern Applied Statistics with S-PLUS, Chapter 7.
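To see the warning in action, a tiny hand-made separable data set suffices (the data here is purely illustrative): every $y$ is determined exactly by the sign of $x$, so the fitting algorithm pushes the slope toward infinity before stopping.

```r
# Illustrative separable data: y = 1 exactly when x > 0
x = c(-3, -2, -1, 1, 2, 3)
y = c( 0,  0,  0, 1, 1, 1)

fit = glm(y ~ x, family = binomial)
# Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# (R may also warn that the algorithm did not converge)

coef(fit)     # enormous slope; these "estimates" are numerical artifacts
fitted(fit)   # essentially 0 or 1 for every observation
```

The reported coefficients depend on where the algorithm happened to stop, which is why they should not be interpreted, even though the resulting classifier may still separate the classes perfectly.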


                                 17.2.3   Simulation Examples