Page 290 - Applied Statistics with R

290                             CHAPTER 13. MODEL DIAGNOSTICS


                                 hatvalues(model_1) > 2 * mean(hatvalues(model_1))


                                 ##     1     2     3     4     5     6     7     8     9    10    11
                                 ## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

                                 hatvalues(model_2) > 2 * mean(hatvalues(model_2))


                                 ##     1     2     3     4     5     6     7     8     9    10    11
                                 ## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

                                 hatvalues(model_3) > 2 * mean(hatvalues(model_3))


                                 ##     1     2     3     4     5     6     7     8     9    10    11
                                 ## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

                                 We see that in the second and third plots, the added point is a point of high
                                 leverage. Recall that only in the third plot did that point have an influence on
                                 the regression. To understand why, we’ll need to discuss outliers.
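
The contrast between leverage and influence can be sketched with made-up data. The data and the names `x`, `y`, `fit_base`, `fit_on_trend`, and `fit_off_trend` below are hypothetical and are not the data behind `model_2` and `model_3`. The sketch assumes only that leverage is computed from the predictor values, so the same added `x` value has the same hat value whatever its response.

```r
# Hypothetical data: ten points lying close to the line y = 2 + 3x.
x <- 1:10
y <- 2 + 3 * x + c(0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.5, -0.1)
fit_base <- lm(y ~ x)

# Add an 11th point at x = 20, once on the trend and once far below it.
x_new <- c(x, 20)
y_on  <- c(y, 62)   # 62 = 2 + 3 * 20, consistent with the trend
y_off <- c(y, 20)   # far below the trend
fit_on_trend  <- lm(y_on ~ x_new)
fit_off_trend <- lm(y_off ~ x_new)

# Leverage depends only on x, so the added point has the same hat value in both fits.
hatvalues(fit_on_trend)[11]
hatvalues(fit_off_trend)[11]
hatvalues(fit_on_trend)[11] > 2 * mean(hatvalues(fit_on_trend))

# The fitted slope barely moves in the first case and changes a lot in the second.
coef(fit_base)[2]
coef(fit_on_trend)[2]
coef(fit_off_trend)[2]
```

In this sketch the added point is flagged as high leverage in both fits, yet only the off-trend version pulls the slope noticeably away from the baseline fit.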


                                 13.3.2   Outliers

                                 Outliers are points that do not fit the model well. They may or may not have
                                 a large effect on the model. To identify outliers, we will look for observations
                                 with large residuals.
                                 Note,

                                 \[
                                 e = y - \hat{y} = Iy - Hy = (I - H) y.
                                 \]
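
This identity can be checked numerically. The following is a minimal sketch on hypothetical data (the vectors `x` and `y` and the names `fit`, `X`, `H`, `I_n`, and `e_manual` are illustrative only), assuming the usual hat matrix $H = X (X^\top X)^{-1} X^\top$.

```r
# Hypothetical data, used only to illustrate the identity e = (I - H) y.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9)
fit <- lm(y ~ x)

X <- model.matrix(fit)                   # design matrix, including the intercept column
H <- X %*% solve(t(X) %*% X) %*% t(X)    # hat matrix H = X (X'X)^{-1} X'
I_n <- diag(nrow(X))                     # n x n identity matrix

e_manual <- as.vector((I_n - H) %*% y)   # residuals from the matrix identity
all.equal(e_manual, unname(resid(fit)))  # compare with the residuals reported by lm()
all.equal(unname(diag(H)), unname(hatvalues(fit)))  # diagonal of H gives the leverages
```

Forming `H` explicitly is fine for a small illustration like this one; in practice `resid()` and `hatvalues()` avoid building the full n x n matrix.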
                                 Then, under the assumptions of linear regression,

                                 \[
                                 \text{Var}(e_i) = (1 - h_i) \sigma^2
                                 \]

                                 and thus estimating $\sigma^2$ with $s_e^2$ gives

                                 \[
                                 \text{SE}[e_i] = s_e \sqrt{1 - h_i}.
                                 \]
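
The variance formula follows from the matrix form of the residuals. A brief sketch of the step, assuming $\text{Var}[y] = \sigma^2 I$ (the usual linear regression assumption) and using the fact that $H$, and therefore $I - H$, is symmetric and idempotent:

```latex
\[
\text{Var}[e]
  = (I - H)\,\text{Var}[y]\,(I - H)^\top
  = \sigma^2 (I - H)(I - H)
  = \sigma^2 (I - H).
\]
```

The $i$-th diagonal entry of $\sigma^2 (I - H)$ is $(1 - h_i)\sigma^2$, where $h_i$ is the $i$-th diagonal element of $H$, which is the variance of the individual residual $e_i$.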
                                                                
                                                                     
                                                                             
                                 We can then look at the standardized residual for each observation, $i = 1, 2, \ldots, n$,

                                 \[
                                 r_i = \frac{e_i}{s_e \sqrt{1 - h_i}} \overset{\text{approx}}{\sim} N(\mu = 0, \sigma^2 = 1)
                                 \]
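
As a rough illustration, the standardized residuals can be computed directly from this formula and compared with R's built-in `rstandard()`. The data and the names `fit`, `e`, `h`, `s_e`, and `r_manual` are hypothetical, and the cutoff of 2 in the last line is a common rule of thumb assumed here, not something established in the text above.

```r
# Hypothetical data; the last observation is deliberately far from the trend.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 30.0)
fit <- lm(y ~ x)

e   <- resid(fit)            # residuals e_i
h   <- hatvalues(fit)        # leverages h_i
s_e <- summary(fit)$sigma    # residual standard error, the estimate of sigma

r_manual <- e / (s_e * sqrt(1 - h))   # standardized residuals from the formula
all.equal(r_manual, rstandard(fit))   # rstandard() computes the same quantity

# Assumed rule of thumb: flag observations with |r_i| > 2 for a closer look.
which(abs(r_manual) > 2)
```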
                                                          