Page 227 - Applied Statistics with R
P. 227

11.5. BUILDING LARGER MODELS                                      227


                      ## hp                -3.545e-01   8.123e-02  -4.364 1.65e-05 ***
                      ## domestic          -1.257e+01   7.064e+00  -1.780    0.0759 .
                      ## disp:hp            1.369e-03   6.727e-04   2.035    0.0426 *
                      ## disp:domestic      4.933e-02   6.400e-02   0.771    0.4414
                      ## hp:domestic        1.852e-01   8.709e-02   2.126    0.0342 *
                      ## disp:hp:domestic -9.163e-04    6.768e-04  -1.354    0.1766
                      ## ---
                      ## Signif. codes:   0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
                      ##
                      ## Residual standard error: 3.88 on 375 degrees of freedom
                      ## Multiple R-squared:     0.76,  Adjusted R-squared:   0.7556
                      ## F-statistic: 169.7 on 7 and 375 DF,    p-value: < 2.2e-16

                      Do we actually need this large of a model? Let’s first test for the necessity of
                      the three-way interaction term. That is,


                                                       ∶    = 0.
                                                     0
                                                         7
                      So,

                         • Full Model:    =    +       +       +       +          +          +          +
                                                                            5 1 3
                                                1 1
                                                                                    6 2 3
                                                             3 3
                                                                   4 1 2
                                            0
                                                      2 2
                                       +   
                            7 1 2 3
                         • Null Model:    =    +       +       +       +          +          +          +   
                                                1 1
                                            0
                                                                           5 1 3
                                                            3 3
                                                                                   6 2 3
                                                                   4 1 2
                                                      2 2
                      We fit the null model in R as two_way_int_mod, then use anova() to perform
                      an   -test as usual.
                      two_way_int_mod = lm(mpg ~ disp * hp + disp * domestic + hp * domestic, data = autompg)
                      #two_way_int_mod = lm(mpg ~ (disp + hp + domestic) ^ 2, data = autompg)
                      anova(two_way_int_mod, big_model)
                      ## Analysis of Variance Table
                      ##
                      ## Model 1: mpg ~ disp * hp + disp * domestic + hp * domestic
                      ## Model 2: mpg ~ disp * hp * domestic
                      ##    Res.Df    RSS Df Sum of Sq       F Pr(>F)
                      ## 1     376 5673.2
                      ## 2     375 5645.6  1     27.599 1.8332 0.1766
                      We see the p-value is somewhat large, so we would fail to reject. We prefer the
                      smaller, less flexible, null model, without the three-way interaction.
                      A quick note here: the full model does still “fit better.” Notice that it has a
                      smaller RMSE than the null model, which means the full model makes smaller
                      (squared) errors on average.
   222   223   224   225   226   227   228   229   230   231   232