Page 442 - Applied Statistics with R

442                           CHAPTER 17. LOGISTIC REGRESSION


                                 Instead we’ll have to use estimated probabilities. So to create a classifier that
                                 seeks to minimize misclassifications, we would use,

                                 $$\hat{C}(x) = \underset{k}{\operatorname{argmax}} \ \hat{P}[Y = k \mid X = x].$$

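                                 In the case of a binary response, since $\hat{P}[Y = 0 \mid X = x] = 1 - \hat{p}(x)$
                                 where $\hat{p}(x) = \hat{P}[Y = 1 \mid X = x]$, this becomes

                                 $$\hat{C}(x) = \begin{cases} 1 & \hat{p}(x) > 0.5 \\ 0 & \hat{p}(x) \leq 0.5 \end{cases}$$
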
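To spell out the step from the general rule to the binary one: with only two classes, the argmax compares two estimated probabilities that sum to one. In this sketch, $\hat{p}(x)$ is taken to denote $\hat{P}[Y = 1 \mid X = x]$, and ties are assigned to class 0, matching the rule above.

```latex
\begin{aligned}
\hat{C}(x) = 1
  &\iff \hat{P}[Y = 1 \mid X = x] > \hat{P}[Y = 0 \mid X = x] \\
  &\iff \hat{p}(x) > 1 - \hat{p}(x) \\
  &\iff \hat{p}(x) > 0.5
\end{aligned}
```
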
                                 Using this simple classification rule, we can turn logistic regression into a
                                 classifier. To use logistic regression for classification, we first use logistic
                                 regression to obtain estimated probabilities, $\hat{p}(x)$, then use these in
                                 conjunction with the above classification rule.
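
The two steps just described can be sketched in R. This is a minimal illustration on simulated data, not the spam data; the names `sim_data`, `fit`, `p_hat`, and `y_hat`, and the coefficients used to simulate the response, are assumptions made only for this example.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 2 * x))
sim_data <- data.frame(x = x, y = y)

# Step 1: fit a logistic regression
fit <- glm(y ~ x, data = sim_data, family = binomial)

# Step 2: obtain estimated probabilities of Y = 1
# (type = "response" returns probabilities rather than log-odds)
p_hat <- predict(fit, newdata = sim_data, type = "response")

# Step 3: apply the classification rule with a 0.5 cutoff
# (a probability of exactly 0.5 is classified as 0)
y_hat <- ifelse(p_hat > 0.5, 1, 0)

# Proportion of training observations that are misclassified
mean(y_hat != sim_data$y)
```

Note that `predict()` on a `glm` object returns values on the log-odds scale by default, so `type = "response"` is needed before comparing against 0.5.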
                                 Logistic regression is just one of many ways that these probabilities could be
                                 estimated. In a course focused entirely on machine learning, you’ll learn
                                 many additional ways to do this, as well as methods that make classifications
                                 directly, without first estimating probabilities. But since we have already
                                 introduced logistic regression, it makes sense to discuss it in the context of
                                 classification.


                                 17.4.1   spam Example


                                 To illustrate the use of logistic regression as a classifier, we will use the spam
                                 dataset from the kernlab package.

                                 # install.packages("kernlab")
                                 library(kernlab)
                                 data("spam")
                                 # as_tibble() replaces as.tibble(), deprecated since tibble 2.0.0
                                 tibble::as_tibble(spam)

                                 ## # A tibble: 4,601 x 58
                                 ##     make address    all num3d    our  over remove internet order   mail receive
                                 ##    <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>     <dbl> <dbl> <dbl>    <dbl>
                                 ##  1  0        0.64  0.64      0  0.32  0      0         0     0     0        0
                                 ##  2  0.21     0.28  0.5       0  0.14  0.28   0.21      0.07  0     0.94     0.21
                                 ##  3  0.06     0     0.71      0  1.23  0.19   0.19      0.12  0.64  0.25     0.38
                                 ##  4  0        0     0         0  0.63  0      0.31      0.63  0.31  0.63     0.31