Page 385 - Applied Statistics with R
terms will be constant across all models applied to the same data. So, when a
model fits well, that is, has a low RSS, then this likelihood component will be
small.
Similarly, we can discuss the penalty component of AIC, which is

\[
2p,
\]

where $p$ is the number of parameters in the model. We call this a penalty because it is large when $p$ is large, but we are seeking to find a small AIC.
Thus, a good model, that is, one with a small AIC, will have a good balance between fitting well and using a small number of parameters. For comparing models,

\[
\text{AIC} = n \log\left(\frac{\text{RSS}}{n}\right) + 2p
\]

is a sufficient expression, as $n + n \log(2\pi)$ is the same across all models for any particular dataset.
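To see why dropping the constant is harmless, the sketch below (in Python, with made-up RSS values for two hypothetical candidate models fit to the same $n = 50$ observations) computes both the full and the reduced AIC expressions; the two differ by the same constant for every model, so they always rank models identically.

```python
import math

def aic_full(rss, n, p):
    # Full Gaussian AIC: -2 log L + 2p = n + n*log(2*pi) + n*log(RSS/n) + 2p
    return n + n * math.log(2 * math.pi) + n * math.log(rss / n) + 2 * p

def aic_short(rss, n, p):
    # Reduced expression: drops the constant n + n*log(2*pi)
    return n * math.log(rss / n) + 2 * p

n = 50  # hypothetical sample size shared by both models
# hypothetical (RSS, p) pairs for two candidate models
diff_a = aic_full(120.0, n, 4) - aic_short(120.0, n, 4)
diff_b = aic_full(95.0, n, 6) - aic_short(95.0, n, 6)

# The gap is the same constant n + n*log(2*pi) for every model,
# so the reduced expression preserves the model ordering.
print(abs(diff_a - diff_b) < 1e-9)
```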
16.1.2 Bayesian Information Criterion
The Bayesian Information Criterion, or BIC, is similar to AIC, but has a larger
penalty. BIC also quantifies the trade-off between a model which fits well and the number of model parameters; however, for a reasonable sample size, it generally picks a smaller model than AIC. Again, for model selection, use the model with the smallest BIC.
\[
\text{BIC} = -2 \log L(\hat{\beta}, \hat{\sigma}^2) + \log(n)\,p = n + n \log(2\pi) + n \log\left(\frac{\text{RSS}}{n}\right) + \log(n)\,p.
\]
Notice that the AIC penalty was

\[
2p,
\]

whereas for BIC, the penalty is

\[
\log(n)\,p.
\]
So, for any dataset where $\log(n) > 2$, that is, $n > e^2 \approx 7.39$, the BIC penalty will be larger than the AIC penalty, thus BIC will likely prefer a smaller model.
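A quick numeric check of the penalty comparison (a Python sketch with an arbitrary choice of $p = 5$ parameters): the BIC penalty $\log(n)\,p$ overtakes the AIC penalty $2p$ exactly once $n \geq 8$.

```python
import math

p = 5  # arbitrary number of parameters for illustration
for n in (7, 8, 100):
    aic_penalty = 2 * p
    bic_penalty = math.log(n) * p
    # log(7) < 2 < log(8), so BIC's penalty exceeds AIC's for n >= 8
    print(n, bic_penalty > aic_penalty)
```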
Note that, sometimes the penalty is considered a general expression of the form

\[
k \cdot p.
\]

Then AIC is the specific case $k = 2$, and BIC the case $k = \log(n)$.
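The general $k \cdot p$ form can be sketched as a single function, shown here in Python with hypothetical fits: a small model ($p = 4$, RSS $= 120$) and a larger model ($p = 8$, RSS $= 100$) on the same $n = 50$ observations. With these made-up numbers, $k = 2$ (AIC) selects the larger model while $k = \log(n)$ (BIC) selects the smaller one, illustrating BIC's preference for smaller models.

```python
import math

def ic(rss, n, p, k):
    # Generic criterion n*log(RSS/n) + k*p:
    # k = 2 recovers AIC, k = log(n) recovers BIC
    # (both up to the shared constant n + n*log(2*pi)).
    return n * math.log(rss / n) + k * p

n = 50  # hypothetical shared sample size
models = {"small": (120.0, 4), "large": (100.0, 8)}  # hypothetical (RSS, p)

for name, k in [("AIC", 2), ("BIC", math.log(n))]:
    scores = {m: ic(rss, n, p, k) for m, (rss, p) in models.items()}
    print(name, "picks", min(scores, key=scores.get))
```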