Applied Statistics with R

Chapter 16

                      Variable Selection and Model Building

                           “Choose well. Your choice is brief, and yet endless.”
                           — Johann Wolfgang von Goethe

                      After reading this chapter you will be able to:


                         • Understand the trade-off between goodness-of-fit and model complexity.
                         • Use variable selection procedures to find a good model from a set of possible models.
                         • Understand the two uses of models: explanation and prediction.

                      In the last chapter, we saw how correlation between predictor variables can
                      have undesirable effects on models. We used variance inflation factors to
                      assess the severity of the collinearity issues caused by these correlations.
                      We also saw how fitting a smaller model, leaving out some of the correlated
                      predictors, results in a model which no longer suffers from collinearity
                      issues. But how should we choose this smaller model?
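                      As a reminder of how variance inflation factors measure collinearity, here is
                      a minimal sketch in Python with NumPy rather than R. It assumes the usual
                      definition VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R^2 obtained by
                      regressing predictor j on all of the other predictors. The helper `vif()` and
                      the simulated predictors `x1`, `x2`, `x3` are hypothetical, made up only for
                      illustration; `x2` is deliberately built as a noisy copy of `x1`.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of the predictor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from a least-squares
    regression of column j on all remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Design matrix: intercept column followed by the other predictors
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Hypothetical simulated predictors (fixed seed for reproducibility)
rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1: strong collinearity
x3 = rng.normal(size=n)                  # unrelated to x1 and x2

full = np.column_stack([x1, x2, x3])
smaller = np.column_stack([x1, x3])      # leave out the correlated predictor x2

print("full model VIFs:   ", np.round(vif(full), 1))
print("smaller model VIFs:", np.round(vif(smaller), 1))
```

                      With data simulated this way, x1 and x2 should show very large VIFs in the
                      full model, while after leaving out x2 the remaining VIFs should sit close to
                      1, mirroring the effect of fitting a smaller model.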

                      In this chapter, we will discuss several criteria and procedures for choosing
                      a “good” model from among many candidates.


                      16.1     Quality Criterion


                      So far, we have seen criteria such as $R^2$ and RMSE for assessing quality of fit.
                      However, both of these have a fatal flaw. If we increase the size of a model,
                      that is, add predictors, these criteria can at worst stay the same. It is
                      impossible to add a
