Page 216 - Applied Statistics with R

216 CHAPTER 11. CATEGORICAL PREDICTORS AND INTERACTIONS



$$
v_3 = \begin{cases} 1 & \text{8 cylinder} \\ 0 & \text{not 8 cylinder} \end{cases}
$$
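As a quick illustration of this definition, the sketch below builds the 8 cylinder dummy variable by hand. The vector `cyl_toy` holds made-up cylinder counts for illustration only (it is not the `autompg` data), and the name `v3` simply mirrors the symbol $v_3$.

```r
# Toy cylinder counts (hypothetical values, not taken from autompg)
cyl_toy = c(4, 6, 8, 8, 4, 6)

# v3 is 1 for an 8 cylinder car and 0 otherwise
v3 = as.numeric(cyl_toy == 8)
v3
## [1] 0 0 1 1 0 0
```

The comparison `cyl_toy == 8` produces a logical vector, and `as.numeric()` converts `TRUE` and `FALSE` to 1 and 0.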
                                 Now, let’s fit an additive model in R, using mpg as the response, and disp and
                                 cyl as predictors. This should be a model that uses “three regression lines” to
                                 model mpg, one for each of the possible cyl levels. They will all have the same
                                 slope (since it is an additive model), but each will have its own intercept.


                                 (mpg_disp_add_cyl = lm(mpg ~ disp + cyl, data = autompg))


                                 ##
                                 ## Call:
                                 ## lm(formula = mpg ~ disp + cyl, data = autompg)
                                 ##
                                 ## Coefficients:
                                 ## (Intercept)          disp          cyl6         cyl8
                                 ##    34.99929      -0.05217      -3.63325     -2.03603
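The coefficient names `cyl6` and `cyl8` hint at how R has coded `cyl`. A minimal sketch, assuming `cyl` is stored as a factor with levels 4, 6 and 8, uses `model.matrix()` on a small made-up factor (`cyl_toy` is hypothetical, not the `autompg` data) to show the columns R constructs.

```r
# Toy factor with the same three levels as cyl (hypothetical values)
cyl_toy = factor(c(4, 6, 8, 8, 4, 6))

# The first level is the one R treats as the baseline by default
levels(cyl_toy)[1]

# Columns of the design matrix R would build for this factor
X = model.matrix(~ cyl_toy)
colnames(X)
```

The design matrix has an intercept column plus one column each for the 6 and 8 cylinder levels. There is no separate column for the first level, 4.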


                                 The question is, what is the model that R has fit here? It has chosen to use the
                                 model


$$
Y = \beta_0 + \beta_1 x + \beta_2 v_2 + \beta_3 v_3 + \epsilon,
$$
                                 where
   • $Y$ is mpg, the fuel efficiency in miles per gallon,
   • $x$ is disp, the displacement in cubic inches,
   • $v_2$ and $v_3$ are the dummy variables defined above.
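Why doesn’t R use $v_1$? Essentially because it doesn’t need to. To create three
lines, it only needs two dummy variables since it is using a reference level, which
in this case is a 4 cylinder car. A car that is neither 6 nor 8 cylinder must be 4
cylinder, so $v_1$ would add no new information once the intercept is in the model.
The three “sub models” are then: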

   • 4 Cylinder: $Y = \beta_0 + \beta_1 x + \epsilon$
   • 6 Cylinder: $Y = (\beta_0 + \beta_2) + \beta_1 x + \epsilon$
   • 8 Cylinder: $Y = (\beta_0 + \beta_3) + \beta_1 x + \epsilon$
                                 Notice that they all have the same slope. However, using the two dummy
                                 variables, we achieve the three intercepts.
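To make the three intercepts concrete, the sketch below plugs in the estimates from the printed `lm()` output. The numbers are copied by hand from that output, and the object names (`beta_0_hat` and so on) are only illustrative. With the fitted model object available, `coef(mpg_disp_add_cyl)` would supply the same values.

```r
# Estimates copied from the printed coefficients
beta_0_hat = 34.99929  # (Intercept)
beta_1_hat = -0.05217  # disp, the slope shared by all three lines
beta_2_hat = -3.63325  # cyl6
beta_3_hat = -2.03603  # cyl8

# Estimated intercept for each cylinder level
intercepts = c(
  "4 cylinder" = beta_0_hat,
  "6 cylinder" = beta_0_hat + beta_2_hat,
  "8 cylinder" = beta_0_hat + beta_3_hat
)
intercepts
```

The estimated intercepts are roughly 35.00, 31.37 and 32.96 for 4, 6 and 8 cylinder cars, while the estimated slope is -0.05217 in every case.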
   • $\beta_0$ is the average mpg for a 4 cylinder car with 0 disp.
   • $\beta_0 + \beta_2$ is the average mpg for a 6 cylinder car with 0 disp.