$$
L(\boldsymbol{\beta}) = \prod_{i : y_i = 1} p(\mathbf{x}_i) \prod_{j : y_j = 0} (1 - p(\mathbf{x}_j))
$$

$$
L(\boldsymbol{\beta}) = \prod_{i : y_i = 1} \frac{e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i(p-1)}}}{1 + e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i(p-1)}}} \prod_{j : y_j = 0} \frac{1}{1 + e^{\beta_0 + \beta_1 x_{j1} + \cdots + \beta_{p-1} x_{j(p-1)}}}
$$
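Since the logarithm is monotone, it is equivalent, and numerically more convenient, to maximize the log-likelihood, where the products become a sum:

$$
\log L(\boldsymbol{\beta}) = \sum_{i = 1}^{n} \left[ y_i \left( \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i(p-1)} \right) - \log\left( 1 + e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i(p-1)}} \right) \right]
$$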
Unfortunately, unlike ordinary linear regression, there is no analytical solution
for this maximization problem. Instead, it will need to be solved numerically.
Fortunately, R will take care of this for us using an iteratively reweighted least
squares algorithm. (We’ll leave the details for a machine learning or optimization
course, which would likely also discuss alternative optimization strategies.)
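To make "solved numerically" concrete, here is a minimal sketch (using simulated data and illustrative variable names) that maximizes the log-likelihood above with R's general-purpose optimizer `optim()` and compares the result to `glm()`, which uses iteratively reweighted least squares:

```r
set.seed(42)
n = 100
x = rnorm(n)
eta = -1 + 2 * x  # true linear predictor: beta_0 = -1, beta_1 = 2
y = rbinom(n, size = 1, prob = 1 / (1 + exp(-eta)))

# negative log-likelihood of (beta_0, beta_1), written from the sum above
neg_log_lik = function(beta) {
  lin_pred = beta[1] + beta[2] * x
  -sum(y * lin_pred - log(1 + exp(lin_pred)))
}

# a general-purpose numerical minimizer recovers (approximately)
# the same estimates that glm() obtains via IRLS
optim(c(0, 0), neg_log_lik)$par
coef(glm(y ~ x, family = binomial))
```

The two sets of estimates agree up to the tolerance of the optimizers, which is the point: the MLE exists only as the solution of a numerical search, not as a closed-form expression.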
17.2.2 Fitting Issues
We should note that, if there exists some $\beta^*$ such that

$$
\mathbf{x}_i^\top \beta^* > 0 \implies y_i = 1
$$

and

$$
\mathbf{x}_i^\top \beta^* < 0 \implies y_i = 0
$$
for all observations, then the MLE is not unique. Such data is said to be
separable.
This, and similar numeric issues related to estimated probabilities near 0 or 1,
will return a warning in R:
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
When this happens, the model is still “fit,” but there are consequences, namely,
the estimated coefficients are highly suspect. This is an issue when trying
to interpret the model. When this happens, the model will often still be useful
for creating a classifier, which will be discussed later. However, it is still subject
to the usual evaluations for classifiers to determine how well it is performing.
For details, see Modern Applied Statistics with S-PLUS, Chapter 7.
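As a concrete illustration, here is a minimal sketch with a made-up, perfectly separable dataset; fitting it produces the warning above while still returning coefficient estimates:

```r
# toy data, perfectly separated at x = 4.5
x = c(1, 2, 3, 4, 5, 6, 7, 8)
y = c(0, 0, 0, 0, 1, 1, 1, 1)

# glm() warns, and the estimated coefficients are extreme: the optimizer
# pushes the fitted probabilities toward exactly 0 and 1, so the
# coefficient magnitudes diverge and are not meaningful
fit = glm(y ~ x, family = binomial)
coef(fit)
```

Despite the suspect coefficients, the implied decision boundary near $x = 4.5$ would still classify these observations perfectly, which is why such a model can remain useful as a classifier even when it fails as an interpretable model.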
17.2.3 Simulation Examples

