Page 449 - Applied Statistics with R

17.4. CLASSIFICATION


Instead of simply evaluating a classifier based on its misclassification rate (or
accuracy), we'll define two additional metrics: sensitivity and specificity. Note
that these are just two of the many metrics that can be considered; the
Wikipedia page for sensitivity and specificity details a large number of metrics
that can be derived from a confusion matrix.
                      Sensitivity is essentially the true positive rate. So when sensitivity is high, the
                      number of false negatives is low.

$$
\text{Sens} = \text{True Positive Rate} = \frac{\text{TP}}{\text{P}} = \frac{\text{TP}}{\text{TP} + \text{FN}}
$$
                      Here we have an R function to calculate the sensitivity based on the confusion
                      matrix. Note that this function is good for illustrative purposes, but is easily
                      broken. (Think about what happens if there are no “positives” predicted.)

get_sens = function(conf_mat) {
  # TP / (TP + FN): rows are predicted, columns are actual,
  # with the positive class in the second row/column
  conf_mat[2, 2] / sum(conf_mat[, 2])
}
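To make the layout assumption concrete, here is a small sketch using a made-up confusion matrix (the counts are invented for illustration), along with one way the function can break: if a classifier never predicts the positive class, `table()` drops that row, leaving a 1 x 2 table, and indexing `[2, 2]` fails.

```r
# A hypothetical 2 x 2 confusion matrix, laid out the way get_sens()
# expects: rows = predicted class, columns = actual class, "No" first.
conf_mat = matrix(c(50, 10,
                     5, 35),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(predicted = c("No", "Yes"),
                                  actual    = c("No", "Yes")))

get_sens = function(conf_mat) {
  conf_mat[2, 2] / sum(conf_mat[, 2])
}

get_sens(conf_mat)  # 35 / (35 + 10) = 0.7777778

# Failure mode: no "Yes" predictions, so table() produces a 1 x 2 table
# and conf_mat[2, 2] is a "subscript out of bounds" error.
all_no = table(predicted = rep("No", 4),
               actual    = c("No", "No", "Yes", "Yes"))
dim(all_no)  # 1 2
```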


                      Specificity is essentially the true negative rate. So when specificity is high, the
                      number of false positives is low.


$$
\text{Spec} = \text{True Negative Rate} = \frac{\text{TN}}{\text{N}} = \frac{\text{TN}}{\text{TN} + \text{FP}}
$$
get_spec = function(conf_mat) {
  # TN / (TN + FP): the negative class is in the first row/column
  conf_mat[1, 1] / sum(conf_mat[, 1])
}
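Using the same hypothetical confusion matrix as a sketch (again, the counts are invented for illustration), specificity is read from the "No" row and column:

```r
# Hypothetical confusion matrix: rows = predicted, columns = actual.
conf_mat = matrix(c(50, 10,
                     5, 35),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(predicted = c("No", "Yes"),
                                  actual    = c("No", "Yes")))

get_spec = function(conf_mat) {
  conf_mat[1, 1] / sum(conf_mat[, 1])
}

get_spec(conf_mat)  # 50 / (50 + 5) = 0.9090909
```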


We calculate both based on the confusion matrix we had created for our classifier.

                      get_sens(conf_mat_50)


                      ## [1] 0.8892025


                      get_spec(conf_mat_50)


                      ## [1] 0.9418498


                      Recall that we had created this classifier using a probability of 0.5 as a “cutoff”
                      for how observations should be classified. Now we’ll modify this cutoff. We’ll