Page 385 - Applied Statistics with R
terms will be constant across all models applied to the same data. So, when a
model fits well, that is, has a low RSS, then this likelihood component will be
small.
Similarly, we can discuss the penalty component of AIC, which is

\[
2p,
\]

where $p$ is the number of parameters in the model. We call this a penalty because it is large when $p$ is large, but we are seeking to find a small AIC.
Thus, a good model, that is, one with a small AIC, will have a good balance between fitting well and using a small number of parameters. For comparing models,

\[
\text{AIC} = n \log\left(\frac{\text{RSS}}{n}\right) + 2p
\]

is a sufficient expression, as $n + n \log(2\pi)$ is the same across all models for any particular dataset.
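To see why dropping the constant is harmless, the sketch below (in Python, with made-up RSS values for two hypothetical candidate models fit to the same $n = 50$ observations) computes both the full and the reduced AIC expressions; the two differ by the same constant for every model, so they always rank models identically.

```python
import math

def aic_full(rss, n, p):
    # Full Gaussian AIC: -2 log L + 2p = n + n*log(2*pi) + n*log(RSS/n) + 2p
    return n + n * math.log(2 * math.pi) + n * math.log(rss / n) + 2 * p

def aic_short(rss, n, p):
    # Reduced expression: drops the constant n + n*log(2*pi)
    return n * math.log(rss / n) + 2 * p

n = 50  # hypothetical sample size shared by both models
# hypothetical (RSS, p) pairs for two candidate models
diff_a = aic_full(120.0, n, 4) - aic_short(120.0, n, 4)
diff_b = aic_full(95.0, n, 6) - aic_short(95.0, n, 6)

# The gap is the same constant n + n*log(2*pi) for every model,
# so the reduced expression preserves the model ordering.
print(abs(diff_a - diff_b) < 1e-9)
```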
16.1.2 Bayesian Information Criterion
The Bayesian Information Criterion, or BIC, is similar to AIC, but has a larger
penalty. BIC also quantifies the trade-off between a model which fits well and the number of model parameters; however, for a reasonable sample size, it generally picks a smaller model than AIC. Again, for model selection, use the model with the smallest BIC.
\[
\text{BIC} = -2 \log L(\hat{\beta}, \hat{\sigma}^2) + \log(n)\,p = n + n \log(2\pi) + n \log\left(\frac{\text{RSS}}{n}\right) + \log(n)\,p.
\]
Notice that the AIC penalty was

\[
2p,
\]

whereas for BIC, the penalty is

\[
\log(n)\,p.
\]
So, for any dataset where $\log(n) > 2$, that is, $n > e^2 \approx 7.39$, the BIC penalty will be larger than the AIC penalty, thus BIC will likely prefer a smaller model.
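A quick numeric check of the penalty comparison (a Python sketch with an arbitrary choice of $p = 5$ parameters): the BIC penalty $\log(n)\,p$ overtakes the AIC penalty $2p$ exactly once $n \geq 8$.

```python
import math

p = 5  # arbitrary number of parameters for illustration
for n in (7, 8, 100):
    aic_penalty = 2 * p
    bic_penalty = math.log(n) * p
    # log(7) < 2 < log(8), so BIC's penalty exceeds AIC's for n >= 8
    print(n, bic_penalty > aic_penalty)
```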
Note that, sometimes the penalty is considered a general expression of the form

\[
k \cdot p.
\]

Then AIC is the specific case $k = 2$, and BIC the case $k = \log(n)$.
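The general $k \cdot p$ form can be sketched as a single function, shown here in Python with hypothetical fits: a small model ($p = 4$, RSS $= 120$) and a larger model ($p = 8$, RSS $= 100$) on the same $n = 50$ observations. With these made-up numbers, $k = 2$ (AIC) selects the larger model while $k = \log(n)$ (BIC) selects the smaller one, illustrating BIC's preference for smaller models.

```python
import math

def ic(rss, n, p, k):
    # Generic criterion n*log(RSS/n) + k*p:
    # k = 2 recovers AIC, k = log(n) recovers BIC
    # (both up to the shared constant n + n*log(2*pi)).
    return n * math.log(rss / n) + k * p

n = 50  # hypothetical shared sample size
models = {"small": (120.0, 4), "large": (100.0, 8)}  # hypothetical (RSS, p)

for name, k in [("AIC", 2), ("BIC", math.log(n))]:
    scores = {m: ic(rss, n, p, k) for m, (rss, p) in models.items()}
    print(name, "picks", min(scores, key=scores.get))
```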