Page 372 - Applied Statistics with R

372                                   CHAPTER 15. COLLINEARITY


                                 vif(hip_model)



                                 ##         Age     Weight    HtShoes          Ht     Seated         Arm       Thigh
                                 ##   1.997931    3.647030 307.429378 333.137832    8.951054    4.496368   2.762886
                                 ##         Leg
                                 ##   6.694291


                                 In practice it is common to say that any VIF greater than 5 is cause for concern.
                                 In this example there is a severe multicollinearity issue: four of the eight
                                 predictors (HtShoes, Ht, Seated, and Leg) have a VIF greater than 5, and the
                                 VIFs for HtShoes and Ht both exceed 300.
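To make the quantity behind these numbers concrete: the VIF of a predictor is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing that predictor on all of the other predictors. The sketch below computes it by hand. It uses simulated stand-in data rather than `seatpos` so that it runs with base R alone; the names `sim_data` and `manual_vif` are hypothetical and introduced only for illustration.

```r
# Simulated stand-in data (an assumption, not the seatpos data):
# x2 is nearly a copy of x1, while x3 is unrelated to both.
set.seed(42)
n  = 100
x1 = rnorm(n)
x2 = x1 + rnorm(n, sd = 0.1)
x3 = rnorm(n)
y  = 1 + x1 + x2 + x3 + rnorm(n)
sim_data = data.frame(y, x1, x2, x3)

# Hypothetical helper: VIF of one predictor, obtained by regressing it
# on every other predictor (the response is excluded).
manual_vif = function(predictor, data, response) {
  others = setdiff(names(data), c(predictor, response))
  fit    = lm(reformulate(others, response = predictor), data = data)
  1 / (1 - summary(fit)$r.squared)
}

vifs = sapply(c("x1", "x2", "x3"), manual_vif,
              data = sim_data, response = "y")
vifs
```

Here `x1` and `x2` should show very large VIFs while `x3` stays close to 1. The same helper could in principle be applied to each predictor in `seatpos` with `response = "hipcenter"`, which should agree with the output of `vif(hip_model)`.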

                                 Let’s further investigate how the presence of collinearity actually affects a model.
                                 If we add a moderate amount of random noise to the response, we see that the
                                 estimates of the coefficients change drastically. This is a rather undesirable
                                 effect. Zero-mean noise that is unrelated to the predictors should have little
                                 effect on the estimated coefficients of a model.

                                 # add normally distributed noise (mean 0, sd 5) to the response and refit
                                 set.seed(1337)
                                 noise = rnorm(n = nrow(seatpos), mean = 0, sd = 5)
                                 hip_model_noise = lm(hipcenter + noise ~ ., data = seatpos)


                                 Adding the noise had such a large effect that the sign of the coefficient for
                                 Ht has changed.

                                 coef(hip_model)



                                 ##  (Intercept)           Age        Weight      HtShoes            Ht        Seated
                                 ## 436.43212823    0.77571620    0.02631308  -2.69240774    0.60134458   0.53375170
                                 ##           Arm        Thigh           Leg
                                 ##  -1.32806864   -1.14311888   -6.43904627


                                 coef(hip_model_noise)


                                 ##  (Intercept)           Age        Weight      HtShoes            Ht        Seated
                                 ## 415.32909380    0.76578240    0.01910958  -2.90377584   -0.12068122   2.03241638
                                 ##           Arm        Thigh           Leg
                                 ##  -1.02127944   -0.89034509   -5.61777220
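The two sets of estimates are easier to compare side by side. The following is a minimal sketch that copies the printed coefficients above into plain vectors, so it runs without the fitted models; the names `coef_orig`, `coef_noise`, `comparison`, and `flipped` are hypothetical.

```r
# Coefficients copied from the printed output of coef(hip_model)
coef_orig = c(`(Intercept)` = 436.43212823, Age = 0.77571620,
              Weight = 0.02631308, HtShoes = -2.69240774,
              Ht = 0.60134458, Seated = 0.53375170,
              Arm = -1.32806864, Thigh = -1.14311888,
              Leg = -6.43904627)

# Coefficients copied from the printed output of coef(hip_model_noise)
coef_noise = c(`(Intercept)` = 415.32909380, Age = 0.76578240,
               Weight = 0.01910958, HtShoes = -2.90377584,
               Ht = -0.12068122, Seated = 2.03241638,
               Arm = -1.02127944, Thigh = -0.89034509,
               Leg = -5.61777220)

# Side-by-side table with the change relative to the original estimate
comparison = data.frame(
  original   = coef_orig,
  noisy      = coef_noise,
  pct_change = 100 * (coef_noise - coef_orig) / abs(coef_orig)
)
comparison

# Names of the coefficients whose sign differs between the two fits
flipped = names(coef_orig)[sign(coef_orig) != sign(coef_noise)]
flipped
```

Only `Ht` changes sign, but several other estimates move by a large fraction of their original size; the coefficient for `Seated`, for example, grows from roughly 0.53 to 2.03. With the fitted models available, the same table could be built from `coef(hip_model)` and `coef(hip_model_noise)` directly instead of from copied values.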


                                 The instability of these estimates tells us that a model with collinearity is
                                 bad at explaining the relationship between the response and the predictors. We
                                 cannot even be confident in the direction of the relationship. However, does
                                 collinearity affect prediction?