Page 290 - Applied Statistics with R
CHAPTER 13. MODEL DIAGNOSTICS
hatvalues(model_1) > 2 * mean(hatvalues(model_1))
##     1     2     3     4     5     6     7     8     9    10    11
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

hatvalues(model_2) > 2 * mean(hatvalues(model_2))
##     1     2     3     4     5     6     7     8     9    10    11
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

hatvalues(model_3) > 2 * mean(hatvalues(model_3))
##     1     2     3     4     5     6     7     8     9    10    11
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
We see that in the second and third plots, the added point is a point of high
leverage. Recall that only in the third plot did that point influence the
regression. To understand why, we’ll need to discuss outliers.
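As a minimal sketch of where these flags come from, the following computes the leverages directly from the diagonal of the hat matrix and then applies the same twice-the-mean heuristic. The simulated data and the names `x`, `y`, `fit`, `X`, `H`, `h`, and `high_lev` are illustrative assumptions, not the models used above.

```r
# Illustrative data (an assumption): ten evenly spaced x values plus one far-out x value.
set.seed(42)
x <- c(1:10, 25)
y <- 2 + 3 * x + rnorm(length(x))
fit <- lm(y ~ x)

# Leverages are the diagonal entries of the hat matrix H = X (X'X)^{-1} X'.
X <- model.matrix(fit)
H <- X %*% solve(t(X) %*% X) %*% t(X)
h <- diag(H)

# These should agree with hatvalues() up to floating point error.
all.equal(unname(h), unname(hatvalues(fit)))

# The mean leverage is p / n, so the heuristic flags observations with h_i > 2 * p / n.
high_lev <- which(hatvalues(fit) > 2 * mean(hatvalues(fit)))
high_lev
```

Because leverage depends only on the predictor values, the flagged observation in this sketch is the one whose `x` value lies far from the others, regardless of its response.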
13.3.2 Outliers
Outliers are points that do not fit the model well. They may or may not have
a large effect on the model. To identify outliers, we will look for observations
with large residuals.
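Note,

$$
e = y - \hat{y} = Iy - Hy = (I - H) y
$$

Then, under the assumptions of linear regression,

$$
\text{Var}(e_i) = (1 - h_i) \sigma^2
$$

and thus estimating $\sigma^2$ with $s_e^2$ gives

$$
\text{SE}[e_i] = s_e \sqrt{(1 - h_i)}.
$$

We can then look at the standardized residual for each observation, $i = 1, 2, \ldots, n$,

$$
r_i = \frac{e_i}{s_e \sqrt{1 - h_i}} \overset{\text{approx}}{\sim} N(\mu = 0, \sigma^2 = 1)
$$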
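The standardized residual formula can be checked numerically. The sketch below computes it by hand and compares the result with `rstandard()`. The simulated data and the names `fit`, `e`, `h`, `s_e`, and `r_manual` are assumptions made for illustration, and the cutoff of 2 is one common rule of thumb rather than something fixed by the derivation.

```r
# Illustrative data (an assumption, not the models used earlier).
set.seed(42)
x <- c(1:10, 25)
y <- 2 + 3 * x + rnorm(length(x))
fit <- lm(y ~ x)

e   <- resid(fit)          # residuals e_i
h   <- hatvalues(fit)      # leverages h_i
s_e <- summary(fit)$sigma  # residual standard error s_e

# Standardized residuals by hand: r_i = e_i / (s_e * sqrt(1 - h_i)).
r_manual <- e / (s_e * sqrt(1 - h))

# rstandard() should return the same values.
all.equal(r_manual, rstandard(fit))

# Rule of thumb (an assumption here): flag |r_i| > 2 as a potential outlier.
which(abs(r_manual) > 2)
```

Dividing by `sqrt(1 - h)` matters most for high-leverage points: their raw residuals have smaller variance, so the same raw residual corresponds to a larger standardized residual.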

