Page 385 - Applied Statistics with R


terms will be constant across all models applied to the same data. So, when a model fits well, that is, when it has a low RSS, this likelihood component will be small.
Similarly, we can discuss the penalty component of AIC, which is

\[
2p,
\]

where $p$ is the number of $\beta$ parameters in the model. We call this a penalty because it is large when $p$ is large, but we are seeking to find a small AIC. Thus, a good model, that is, one with a small AIC, will have a good balance between fitting well and using a small number of parameters. For comparing models,


\[
\text{AIC} = n \log\left(\frac{\text{RSS}}{n}\right) + 2p
\]

is a sufficient expression, as $n + n \log(2\pi)$ is the same across all models for any particular dataset.
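As a quick check of this expression, we can compute the "sufficient" AIC by hand and compare it to R's `extractAIC()`, which uses this same form for linear models. This is a sketch assuming the built-in `mtcars` data; the models chosen here are arbitrary illustrations.

```r
# Fit two nested linear models to the built-in mtcars data
fit_small = lm(mpg ~ wt, data = mtcars)
fit_large = lm(mpg ~ wt + hp + qsec, data = mtcars)

n = nrow(mtcars)

# AIC via the "sufficient" expression: n * log(RSS / n) + 2 * p,
# where p is the number of estimated beta parameters
aic_suff = function(fit) {
  rss = sum(resid(fit) ^ 2)
  p   = length(coef(fit))
  n * log(rss / n) + 2 * p
}

aic_suff(fit_small)
aic_suff(fit_large)

# extractAIC() reports the same quantity for lm objects
extractAIC(fit_small)[2]
```

Note that `AIC()` would instead report the full expression, including the $n + n \log(2\pi)$ constant; since that constant is the same for both models, the two versions always agree on which model is preferred.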



16.1.2 Bayesian Information Criterion

The Bayesian Information Criterion, or BIC, is similar to AIC, but has a larger penalty. BIC also quantifies the trade-off between a model which fits well and the number of model parameters; however, for a reasonable sample size, it generally picks a smaller model than AIC. Again, for model selection, use the model with the smallest BIC.



\[
\text{BIC} = -2 \log L(\hat{\beta}, \hat{\sigma}^2) + \log(n) p = n + n \log(2\pi) + n \log\left(\frac{\text{RSS}}{n}\right) + \log(n) p.
\]
Notice that the AIC penalty was

\[
2p,
\]

whereas for BIC, the penalty is

\[
\log(n) p.
\]

So, for any dataset where $\log(n) > 2$, that is, whenever $n > e^2 \approx 7.39$, the BIC penalty will be larger than the AIC penalty, thus BIC will likely prefer a smaller model.
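To see the heavier penalty numerically, we can again work with the built-in `mtcars` data (an illustrative assumption, as before). Dropping the shared constant, the only difference between the two criteria is the multiplier on $p$, and `extractAIC()` with `k = log(n)` computes this BIC-style quantity directly.

```r
# An arbitrary linear model on the built-in mtcars data
fit = lm(mpg ~ wt + hp, data = mtcars)
n   = nrow(mtcars)
rss = sum(resid(fit) ^ 2)
p   = length(coef(fit))

# AIC penalty: 2 * p;  BIC penalty: log(n) * p
aic_val = n * log(rss / n) + 2 * p
bic_val = n * log(rss / n) + log(n) * p

# Here log(32) is about 3.47 > 2, so the BIC value is larger
c(aic = aic_val, bic = bic_val)

# extractAIC() with k = log(n) gives the same BIC-style quantity
extractAIC(fit, k = log(n))[2]
```

Their difference, $(\log(n) - 2)p$, grows with both the sample size and the number of parameters, which is why BIC tends toward smaller models.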
Note that sometimes the penalty is considered a general expression of the form