Page 449 - Applied Statistics with R

17.4. CLASSIFICATION


Instead of simply evaluating a classifier based on its misclassification rate (or
accuracy), we'll define two additional metrics: sensitivity and specificity. Note
that these are just two of the many metrics that can be considered; the
Wikipedia page for sensitivity and specificity details a large number of metrics
that can be derived from a confusion matrix.
                      Sensitivity is essentially the true positive rate. So when sensitivity is high, the
                      number of false negatives is low.

$$
\text{Sens} = \text{True Positive Rate} = \frac{\text{TP}}{\text{P}} = \frac{\text{TP}}{\text{TP} + \text{FN}}
$$
                      Here we have an R function to calculate the sensitivity based on the confusion
                      matrix. Note that this function is good for illustrative purposes, but is easily
                      broken. (Think about what happens if there are no “positives” predicted.)

get_sens = function(conf_mat) {
  # TP / (TP + FN): rows are predicted, columns are actual,
  # with the positive class in the second row/column
  conf_mat[2, 2] / sum(conf_mat[, 2])
}
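To make the layout assumption concrete, here is a small sketch using a made-up confusion matrix (the counts are invented for illustration), along with one way the function can break: if a classifier never predicts the positive class, `table()` drops that row, leaving a 1 x 2 table, and indexing `[2, 2]` fails.

```r
# A hypothetical 2 x 2 confusion matrix, laid out the way get_sens()
# expects: rows = predicted class, columns = actual class, "No" first.
conf_mat = matrix(c(50, 10,
                     5, 35),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(predicted = c("No", "Yes"),
                                  actual    = c("No", "Yes")))

get_sens = function(conf_mat) {
  conf_mat[2, 2] / sum(conf_mat[, 2])
}

get_sens(conf_mat)  # 35 / (35 + 10) = 0.7777778

# Failure mode: no "Yes" predictions, so table() produces a 1 x 2 table
# and conf_mat[2, 2] is a "subscript out of bounds" error.
all_no = table(predicted = rep("No", 4),
               actual    = c("No", "No", "Yes", "Yes"))
dim(all_no)  # 1 2
```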


                      Specificity is essentially the true negative rate. So when specificity is high, the
                      number of false positives is low.


$$
\text{Spec} = \text{True Negative Rate} = \frac{\text{TN}}{\text{N}} = \frac{\text{TN}}{\text{TN} + \text{FP}}
$$
get_spec = function(conf_mat) {
  # TN / (TN + FP): the negative class is in the first row/column
  conf_mat[1, 1] / sum(conf_mat[, 1])
}
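Using the same hypothetical confusion matrix as a sketch (again, the counts are invented for illustration), specificity is read from the "No" row and column:

```r
# Hypothetical confusion matrix: rows = predicted, columns = actual.
conf_mat = matrix(c(50, 10,
                     5, 35),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(predicted = c("No", "Yes"),
                                  actual    = c("No", "Yes")))

get_spec = function(conf_mat) {
  conf_mat[1, 1] / sum(conf_mat[, 1])
}

get_spec(conf_mat)  # 50 / (50 + 5) = 0.9090909
```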


We calculate both based on the confusion matrix we had created for our classifier.

                      get_sens(conf_mat_50)


                      ## [1] 0.8892025


                      get_spec(conf_mat_50)


                      ## [1] 0.9418498


                      Recall that we had created this classifier using a probability of 0.5 as a “cutoff”
                      for how observations should be classified. Now we’ll modify this cutoff. We’ll