
Thursday, November 7, 2013

Akaike Information Criterion - Sum of Squares Method

The Akaike Information Criterion (AIC) is a tool for gauging whether the extra predictive power gained by adding parameters to a model is worth the associated increase in complexity. It is usually calculated from maximum likelihood estimation (MLE) fits. Odds are, though, you aren't fitting your model by maximum likelihood; you're probably using a least squares method such as linear regression. Provided the usual parametric assumptions are met (equal variance, normally distributed errors), you can use an equivalent formula based on the residual sum of squares.

AIC = n × ln(RSS/n) + 2K

Here ln is the natural log, RSS is the residual sum of squares, n is the number of observations in the sample, and K is the number of parameters in the model. Most of that should be pretty familiar; the only tricky part is counting the parameters.

A common mistake when counting parameters is failing to include the error term. It is not normally thought of as a parameter because, strictly speaking, you aren't predicting it, but its variance is still estimated from the data. As a result, the number of parameters in a standard linear equation (y = mx + c) is 3 (the slope m, the intercept c, and the error variance) rather than 2.
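To make this concrete, here is a minimal Python sketch (the data are made up purely for illustration) that fits y = mx + c by least squares and computes the AIC from the RSS, counting K = 3 as described above:

```python
import numpy as np

def aic_from_rss(rss, n, k):
    """AIC from the residual sum of squares: AIC = n * ln(RSS / n) + 2K."""
    return n * np.log(rss / n) + 2 * k

# Hypothetical data: y is roughly 2x plus a little noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8])

# Least squares fit of y = mx + c
m, c = np.polyfit(x, y, 1)
rss = np.sum((y - (m * x + c)) ** 2)

# K = 3: the slope, the intercept, and the error variance
print(aic_from_rss(rss, n=len(x), k=3))
```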

It is also recommended to add a correction factor when n < 40K. Since it's not much trouble to calculate, it's best to simply always use the corrected version of AIC (AICc).

AICc = AIC + 2K(K + 1) / (n − K − 1)
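The correction itself is a one-liner; continuing the sketch above (the AIC value below is made up):

```python
def aicc(aic, n, k):
    """Small-sample corrected AIC: AICc = AIC + 2K(K + 1) / (n - K - 1)."""
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical values: AIC = -10.0 from n = 8 observations, K = 3
print(aicc(-10.0, n=8, k=3))  # -10.0 + 24/4 = -4.0
```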

The model with the lowest AICc offers the best trade-off between the complexity of the model and its ability to explain the data. The values themselves are meaningless in isolation and offer no insight into whether the difference between models is significant. However, given two AICc values you can compute the relative likelihood that the model with the higher AICc is as good as the one with the lower:

exp((AICc(low) − AICc(high)) / 2)
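In Python this is a single expression; the AICc values below are again hypothetical:

```python
import math

def relative_likelihood(aicc_low, aicc_high):
    """exp((AICc_low - AICc_high) / 2): the relative likelihood that the
    model with the higher AICc is as good as the one with the lower."""
    return math.exp((aicc_low - aicc_high) / 2)

# A difference of 4 AICc units, e.g. 10.0 vs 14.0
print(relative_likelihood(10.0, 14.0))  # exp(-2) ~ 0.135
```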

A key advantage of the AICc is that you can compare as many models as you like without paying a penalty for the number of models tested. The key downside (as already noted) is that it does not tell you whether there is a significant difference in the variance explained between models. If that is what you are interested in, an F-change statistic is more appropriate.
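For reference, here is a minimal sketch of that nested-model F-change test, assuming scipy is available. Note that the parameter counts here are the regression coefficients including the intercept, not the AIC's K (which also counts the error variance); all numbers are made up:

```python
from scipy import stats

def f_change(rss_reduced, rss_full, p_reduced, p_full, n):
    """F test for whether a full model significantly improves on a
    nested reduced model fit to the same n observations."""
    df1 = p_full - p_reduced           # extra coefficients being tested
    df2 = n - p_full                   # residual df of the full model
    f = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f, stats.f.sf(f, df1, df2)  # F statistic and its p-value

# Hypothetical RSS values from two nested fits of n = 20 observations
f, p = f_change(rss_reduced=30.0, rss_full=18.0,
                p_reduced=2, p_full=3, n=20)
print(f, p)
```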

A general overview of the AIC can be found here:
http://en.wikipedia.org/wiki/Akaike_information_criterion

More specific coverage can be found here:
http://www.mun.ca/biology/quant/ModelSelectionMultimodelInference.pdf
(Burnham, K. P. & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. Springer, New York.)
