Page 216 - Applied Statistics with R
216 CHAPTER 11. CATEGORICAL PREDICTORS AND INTERACTIONS
$$
v_3 = \begin{cases} 1 & \text{8 cylinder} \\ 0 & \text{not 8 cylinder} \end{cases}
$$
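To make the indicator definitions concrete, here is a minimal sketch that builds them by hand. It assumes R's built-in `mtcars` data as a stand-in for `autompg` (it also has `mpg`, `disp` and a `cyl` column taking the values 4, 6 and 8). The column names `v1`, `v2` and `v3` are hypothetical and simply mirror the notation above.

```r
# Stand-in data (assumption): built-in mtcars instead of autompg
cars <- mtcars[, c("mpg", "disp", "cyl")]

# Hypothetical indicator columns mirroring v1, v2 and v3
cars$v1 <- ifelse(cars$cyl == 4, 1, 0)  # 1 for a 4 cylinder car, 0 otherwise
cars$v2 <- ifelse(cars$cyl == 6, 1, 0)  # 1 for a 6 cylinder car, 0 otherwise
cars$v3 <- ifelse(cars$cyl == 8, 1, 0)  # 1 for an 8 cylinder car, 0 otherwise

# Each car belongs to exactly one group, so the indicators always sum to 1
table(cars$v1 + cars$v2 + cars$v3)

# Number of cars in each cylinder group
colSums(cars[, c("v1", "v2", "v3")])
```

The fact that the three indicators always sum to 1 is what makes one of them redundant once the model also has an intercept.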
Now, let’s fit an additive model in R, using mpg as the response and disp and
cyl as predictors. This should be a model that uses “three regression lines” to
model mpg, one for each of the possible cyl levels. They will all have the same
slope (since it is an additive model), but each will have its own intercept.
```r
# Additive model: cyl is a factor, so R adds dummy variables for its levels
(mpg_disp_add_cyl = lm(mpg ~ disp + cyl, data = autompg))
##
## Call:
## lm(formula = mpg ~ disp + cyl, data = autompg)
##
## Coefficients:
## (Intercept)         disp         cyl6         cyl8
##    34.99929     -0.05217     -3.63325     -2.03603
```
The question is: what model has R fit here? It has chosen to use the model

$$
Y = \beta_0 + \beta_1 x + \beta_2 v_2 + \beta_3 v_3 + \epsilon,
$$

where

• $Y$ is mpg, the fuel efficiency in miles per gallon,
• $x$ is disp, the displacement in cubic inches,
• $v_2$ and $v_3$ are the dummy variables defined above.
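One way to check which columns R actually put into the model is to inspect the design matrix with `model.matrix()` and the coding of the factor with `contrasts()`. The sketch below again assumes `mtcars` as a stand-in for `autompg`, with `cyl` converted to a factor as the additive model requires; the `cyl6` and `cyl8` columns play the roles of $v_2$ and $v_3$.

```r
# Stand-in data (assumption): mtcars instead of autompg; cyl must be a factor
# so that lm() creates dummy variables rather than a single numeric slope
cars <- mtcars
cars$cyl <- factor(cars$cyl)

fit <- lm(mpg ~ disp + cyl, data = cars)

# The design matrix R used: an intercept column, disp (x), and one 0/1
# column per non-reference level (cyl6 plays v2, cyl8 plays v3)
X <- model.matrix(fit)
colnames(X)

# The treatment coding behind those columns: the reference level "4"
# is the row of all zeros
contrasts(cars$cyl)
```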
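Why doesn’t R use $v_1$? Essentially because it doesn’t need to. To create three
lines, it only needs two dummy variables, since it is using a reference level, which
in this case is a 4 cylinder car. A car with $v_2 = v_3 = 0$ must have 4 cylinders,
so a third dummy variable would add no information. The three “sub models” are then:

• 4 Cylinder: $Y = \beta_0 + \beta_1 x + \epsilon$
• 6 Cylinder: $Y = (\beta_0 + \beta_2) + \beta_1 x + \epsilon$
• 8 Cylinder: $Y = (\beta_0 + \beta_3) + \beta_1 x + \epsilon$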
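The sub model intercepts follow directly from the fitted coefficients. As a quick arithmetic check, the sketch below plugs in the rounded values printed by `lm()` above; working from `coef(mpg_disp_add_cyl)` instead would avoid the rounding.

```r
# Coefficients as printed in the lm() output above (rounded to 5 decimals)
b <- c("(Intercept)" = 34.99929, disp = -0.05217,
       cyl6 = -3.63325, cyl8 = -2.03603)

# Intercept of each sub model: beta_0, beta_0 + beta_2, beta_0 + beta_3
intercepts <- c(
  "4 cyl" = unname(b["(Intercept)"]),
  "6 cyl" = unname(b["(Intercept)"] + b["cyl6"]),
  "8 cyl" = unname(b["(Intercept)"] + b["cyl8"])
)
intercepts  # 4 cyl: 34.99929, 6 cyl: 31.36604, 8 cyl: 32.96326

# All three sub models share the same slope, beta_1
slope <- unname(b["disp"])
slope
```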
Notice that they all have the same slope, $\beta_1$. However, using the two dummy
variables, we achieve three different intercepts.
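To see why R leaves out $v_1$, the sketch below fits the model with an intercept and all three indicators. It again assumes `mtcars` as a stand-in for `autompg`, and the columns `v1`, `v2` and `v3` are hypothetical. Because $v_1 + v_2 + v_3$ equals the intercept column, one coefficient cannot be estimated, and `lm()` reports it as `NA`; two dummy variables already give the same fit as R's own factor coding.

```r
# Stand-in data (assumption): mtcars instead of autompg
cars <- mtcars
cars$v1 <- as.numeric(cars$cyl == 4)
cars$v2 <- as.numeric(cars$cyl == 6)
cars$v3 <- as.numeric(cars$cyl == 8)

# Intercept plus all three indicators: v1 + v2 + v3 = 1 duplicates the
# intercept column, so the design matrix is rank deficient
fit_all <- lm(mpg ~ disp + v1 + v2 + v3, data = cars)
coef(fit_all)  # v3 is aliased with the other columns and reported as NA

# Dropping v1 (4 cylinders becomes the reference level) reproduces the
# fitted values of R's own factor coding of cyl
fit_ref <- lm(mpg ~ disp + v2 + v3, data = cars)
fit_fac <- lm(mpg ~ disp + factor(cyl), data = cars)
all.equal(fitted(fit_ref), fitted(fit_fac))
```

Which level to drop is a convention: R's default treatment coding uses the first factor level as the reference, which is why 4 cylinders plays that role here.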
• $\beta_0$ is the average mpg for a 4 cylinder car with 0 disp.
• $\beta_0 + \beta_2$ is the average mpg for a 6 cylinder car with 0 disp.

