Page 290 - Applied Statistics with R


290 CHAPTER 13. MODEL DIAGNOSTICS

hatvalues(model_1) > 2 * mean(hatvalues(model_1))

##     1     2     3     4     5     6     7     8     9    10    11
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

hatvalues(model_2) > 2 * mean(hatvalues(model_2))

##     1     2     3     4     5     6     7     8     9    10    11
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

hatvalues(model_3) > 2 * mean(hatvalues(model_3))

##     1     2     3     4     5     6     7     8     9    10    11
## FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

We see that in the second and third plots, the added point is a point of high
leverage. Recall that only in the third plot did that point have an influence on
the regression. To understand why, we’ll need to discuss outliers.

13.3.2 Outliers

Outliers are points that do not fit the model well. They may or may not have
a large effect on the model. To identify outliers, we will look for observations
with large residuals.
Note,

$$e = y - \hat{y} = Iy - Hy = (I - H)y$$
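As a quick check, the identity $e = (I - H)y$ can be verified numerically. The sketch below uses simulated data, so `x`, `y`, and `fit` are hypothetical names rather than the models fit earlier in the chapter. It builds the hat matrix $H = X(X^\top X)^{-1}X^\top$ by hand and compares the result with what `resid()` and `hatvalues()` report.

```r
# Simulated data for illustration only; not the data used in the text
set.seed(42)
n <- 20
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

X <- model.matrix(fit)                    # n x 2 design matrix (intercept, x)
H <- X %*% solve(t(X) %*% X) %*% t(X)     # hat matrix
e_manual <- as.vector((diag(n) - H) %*% y)

# Residuals computed as (I - H) y should match resid(), and the diagonal
# of H should match the leverages returned by hatvalues()
print(all.equal(e_manual, unname(resid(fit))))
print(all.equal(unname(diag(H)), unname(hatvalues(fit))))
```

Forming $H$ explicitly is fine for a small example like this one, but it is an $n \times n$ matrix, so in practice we would rely on `hatvalues()` instead.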
Then, under the assumptions of linear regression,

$$\text{Var}(e_i) = (1 - h_i)\sigma^2$$

and thus estimating $\sigma^2$ with $s_e^2$ gives

$$\text{SE}[e_i] = s_e\sqrt{1 - h_i}.$$
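To make the formula concrete, here is a minimal sketch that computes the standard error of each residual, again on simulated data (the names `fit`, `h`, `s_e`, and `se_resid` are our own, not from the text). It takes the leverages $h_i$ from `hatvalues()` and the residual standard error $s_e$ from `summary()`.

```r
# Simulated data for illustration only; not the data used in the text
set.seed(42)
n <- 20
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

h   <- hatvalues(fit)        # leverages h_i
s_e <- summary(fit)$sigma    # residual standard error, sqrt(RSS / (n - p))

# Standard error of each residual: s_e * sqrt(1 - h_i)
se_resid <- s_e * sqrt(1 - h)

# Show leverage and residual standard error side by side
print(head(cbind(leverage = h, se_resid = se_resid)))
```

Notice that the residuals do not share a common standard error: the higher the leverage of an observation, the smaller the standard error of its residual, since the fitted line is pulled toward that point.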

We can then look at the standardized residual for each observation, $i = 1, 2, \ldots, n$,

$$r_i = \frac{e_i}{s_e\sqrt{1 - h_i}} \overset{\text{approx}}{\sim} N(\mu = 0, \sigma^2 = 1)$$
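The standardized residuals can be computed directly from this definition and compared with R's built-in `rstandard()`. The sketch below again uses simulated data, with hypothetical names (`fit`, `r_manual`, `flagged`). The cutoff of 2 in absolute value is a common rule of thumb that we assume here for illustration, not a strict test.

```r
# Simulated data for illustration only; not the data used in the text
set.seed(42)
n <- 20
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

s_e <- summary(fit)$sigma    # residual standard error
h   <- hatvalues(fit)        # leverages h_i

# Standardized residuals from the definition: e_i / (s_e * sqrt(1 - h_i))
r_manual <- resid(fit) / (s_e * sqrt(1 - h))

# rstandard() should agree with the manual computation
print(all.equal(r_manual, rstandard(fit)))

# Assumed rule of thumb: flag observations with |r_i| > 2
flagged <- which(abs(r_manual) > 2)
print(flagged)
```

Because the standardized residuals are roughly standard normal, we would expect about 5% of them to exceed 2 in absolute value even when the model is correct, so a flagged observation deserves a closer look rather than automatic removal.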
