Page 199 - Applied Statistics with R
11.1. DUMMY VARIABLES
[Figure: mpg versus hp, points colored by transmission (Automatic, Manual), with the fitted regression line]
We should notice a pattern here. The red, manual observations largely fall above
the line, while the black, automatic observations are mostly below the line. This
means our model underestimates the fuel efficiency of manual transmissions, and
overestimates the fuel efficiency of automatic transmissions. To correct for this,
we will add a predictor to our model, namely, am, as $x_2$.
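One way to check this pattern numerically is sketched below in R, assuming the built-in mtcars data: fit the simple model of mpg on hp, then average its residuals separately for each transmission type. The object names fit_hp and mean_resid are hypothetical, chosen only for illustration. A positive average residual for manual cars means the line tends to sit below them, that is, the model underestimates their fuel efficiency; a negative average for automatic cars means it overestimates theirs.

```r
# Simple model: mpg explained by hp only (mtcars ships with base R's datasets package)
fit_hp <- lm(mpg ~ hp, data = mtcars)

# Average residual (observed minus fitted) within each transmission group;
# in mtcars, am = 0 is automatic and am = 1 is manual
mean_resid <- tapply(resid(fit_hp), mtcars$am, mean)
names(mean_resid) <- c("automatic", "manual")
mean_resid
```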
Our new model is
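$$
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon,
$$

where $x_1$ and $\beta_1$ remain the same, but now

$$
x_2 =
\begin{cases}
1 & \text{manual transmission} \\
0 & \text{automatic transmission}
\end{cases}.
$$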
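A minimal sketch of fitting this model in R, assuming the mtcars data with hp as $x_1$ and am as $x_2$; the object names fit_add, b, int_auto, int_manual, and slope are hypothetical. Because $x_2$ only takes the values 0 and 1, plugging each value into the fitted model gives two parallel lines: automatic cars get intercept $\hat{\beta}_0$, manual cars get intercept $\hat{\beta}_0 + \hat{\beta}_2$, and both share the slope $\hat{\beta}_1$.

```r
# Fit Y = beta_0 + beta_1 x_1 + beta_2 x_2 + epsilon with x_1 = hp and x_2 = am
fit_add <- lm(mpg ~ hp + am, data = mtcars)
b <- coef(fit_add)  # named coefficients: "(Intercept)", "hp", "am"

# Plugging x_2 = 0 and x_2 = 1 into the fitted model gives two parallel lines
int_auto   <- b[["(Intercept)"]]              # x_2 = 0: automatic transmission
int_manual <- b[["(Intercept)"]] + b[["am"]]  # x_2 = 1: manual transmission
slope      <- b[["hp"]]                       # the same slope for both lines

c(int_auto = int_auto, int_manual = int_manual, slope = slope)
```

The only difference between the two lines is the shift $\hat{\beta}_2$, which is exactly what the dummy variable adds.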
In this case, we call $x_2$ a dummy variable. A dummy variable is somewhat
unfortunately named, as it is in no way “dumb”; in fact, it is rather
clever. A dummy variable is a numerical variable used in a regression
analysis to “code” for a binary categorical variable. Let’s see how this works.
First, note that am is already a dummy variable, since it uses the values 0 and
1 to represent automatic and manual transmissions, respectively. Often, a variable
like am would instead store the character values auto and man, and we would either
have to convert these to 0 and 1 or, as we will see later, let R take care of
creating the dummy variables for us.
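To illustrate both routes, the sketch below builds a hypothetical character column trans holding auto and man (mtcars itself stores am as 0 and 1); the names cars, trans, trans01, fit_manual_dummy, and fit_auto_dummy are all illustrative assumptions. Converting trans by hand to 0 and 1 reproduces am exactly. Alternatively, passing the character column straight to lm() lets R create the dummy itself: the character values are treated as a factor whose first level alphabetically, auto, becomes the reference, so the dummy's coefficient appears under the name transman.

```r
cars <- mtcars
# Hypothetical character-coded version of am
cars$trans <- ifelse(cars$am == 1, "man", "auto")

# Route 1: convert the character values to a 0/1 dummy ourselves
cars$trans01 <- as.numeric(cars$trans == "man")
fit_manual_dummy <- lm(mpg ~ hp + trans01, data = cars)

# Route 2: let R build the dummy; "auto" sorts first, so it is the
# reference level and the 0/1 column is named "transman"
fit_auto_dummy <- lm(mpg ~ hp + trans, data = cars)

# The design matrix shows the 0/1 column R created
head(model.matrix(fit_auto_dummy))
coef(fit_auto_dummy)
```

Both fits estimate the same coefficients; only the label on the dummy differs.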

