

                      17.4.2    Evaluating Classifiers

The metric we’ll be most interested in for evaluating the overall performance
of a classifier is the misclassification rate. (Sometimes accuracy, the proportion
of correct classifications, is reported instead; the two metrics carry the same
information.)

$$
\text{Misclass}(\hat{C}, \text{Data}) = \frac{1}{n} \sum_{i=1}^{n} I(y_i \neq \hat{C}(\mathbf{x}_i))
$$

where the indicator is

$$
I(y_i \neq \hat{C}(\mathbf{x}_i)) =
\begin{cases}
0 & y_i = \hat{C}(\mathbf{x}_i) \\
1 & y_i \neq \hat{C}(\mathbf{x}_i)
\end{cases}
$$
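Translating this formula into R is direct: the indicator is the logical comparison
actual != predicted, and the average is mean(). A small helper along these lines
(the name misclass is ours, not from the text) makes the connection explicit:

# minimal sketch of the formula above: the mean of the 0/1 indicator
misclass <- function(actual, predicted) {
  mean(actual != predicted)
}

# toy check: 2 of 5 labels disagree, so the rate is 2 / 5 = 0.4
misclass(actual    = c("spam", "spam", "nonspam", "spam", "nonspam"),
         predicted = c("spam", "nonspam", "nonspam", "nonspam", "nonspam"))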
When computed on the training data, this metric has the same issue as RSS
did for ordinary linear regression: as the model grows, it can only go down.
# training misclassification rate
# (predict() returns the link scale by default, so > 0 corresponds to an
#  estimated probability above 0.5)
mean(ifelse(predict(fit_caps) > 0, "spam", "nonspam") != spam_trn$type)
                      ## [1] 0.339


mean(ifelse(predict(fit_selected) > 0, "spam", "nonspam") != spam_trn$type)
## [1] 0.224

mean(ifelse(predict(fit_additive) > 0, "spam", "nonspam") != spam_trn$type)
## [1] 0.066

mean(ifelse(predict(fit_over) > 0, "spam", "nonspam") != spam_trn$type)
## [1] 0.136

Because of this, the training misclassification rate isn’t useful for evaluation,
as it would suggest that we should always use the largest possible model, when
in reality that model is likely overfitting. Recall: a model that is too complex
will overfit, while a model that is too simple will underfit. (We’re looking for
something in the middle.)
To overcome this, we’ll use cross-validation as we did with ordinary linear
regression, but this time we’ll cross-validate the misclassification rate. To do
so, we’ll use the cv.glm() function from the boot library. It takes arguments
for the data, a model fit via glm(), an optional cost function, and K, the
number of folds.
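As a sketch of how a call might look: the 0.5-cutoff cost function below is our
assumption, not taken from the text (it is a common choice for a misclassification
cost and matches the cost(y, pi) form that cv.glm() expects); fit_caps and
spam_trn are the objects fit earlier.

library(boot)

# cost() receives the held-out 0/1 responses and the fitted probabilities;
# a prediction counts as wrong when the probability falls on the wrong
# side of 0.5
cost <- function(y, pi_hat) mean(abs(y - pi_hat) > 0.5)

# 5-fold cross-validated misclassification rate; delta[1] is the raw estimate
cv.glm(spam_trn, fit_caps, cost = cost, K = 5)$delta[1]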