When comparing two multiple regression models, which statistic assessing the overall goodness of fit is preferred?
A. R-squared average of the slopes
B. Standard error of slopes
C. VIF
D. R-squared (adj)
The correct answer and explanation are:
The correct answer is D: R-squared (adjusted).
When comparing multiple regression models, adjusted R-squared is generally the preferred statistic for assessing overall goodness of fit. R-squared measures the proportion of the variability in the dependent variable that is explained by the independent variables. For a least-squares model with an intercept, it ranges from 0 to 1, and a higher value indicates a closer fit. However, a key weakness of R-squared is that it never decreases when a predictor is added to the model (and in practice almost always increases), even if that predictor adds no real explanatory power.
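A minimal sketch of this effect, assuming NumPy and synthetic data invented for illustration: `y` depends only on `x1`, and a pure-noise column is then added as a second predictor. The helper `r_squared` is hypothetical, written here with an ordinary least-squares fit via `numpy.linalg.lstsq`. Because least squares could always set the new coefficient to zero, the R² of the larger model cannot come out lower.

```python
import numpy as np

def r_squared(X, y):
    """R² of an OLS fit of y on X (an intercept column is added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    ss_res = residuals @ residuals               # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()         # total sum of squares
    return 1.0 - ss_res / ss_tot

# Synthetic data (illustrative assumption): y depends only on x1.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                       # unrelated to y by construction

r2_small = r_squared(x1.reshape(-1, 1), y)
r2_big = r_squared(np.column_stack([x1, noise]), y)
print(f"R² with x1 only:        {r2_small:.4f}")
print(f"R² with x1 + noise var: {r2_big:.4f}")  # never lower than r2_small
```

The second R² is at least as large as the first even though the extra column is pure noise, which is exactly why raw R² cannot be used to choose between models of different sizes.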
Adjusted R-squared addresses this limitation by accounting for the number of predictors in the model. It penalizes the inclusion of unnecessary variables, so it can decrease when an irrelevant predictor is added even though the ordinary R-squared increases. This makes adjusted R-squared a more reliable measure when comparing models with different numbers of predictors. Its formula, R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p is the number of predictors (not counting the intercept), combines the R-squared value with the model's degrees of freedom, balancing model complexity against explanatory power.
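As a worked sketch of the penalty, the function below implements the standard adjusted R² formula, 1 − (1 − R²)(n − 1)/(n − p − 1). The two models and their R² values are hypothetical numbers chosen for illustration, not results from real data: model B has a slightly higher R² but uses three more predictors.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R² for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical models fitted to the same n = 30 observations.
model_a = adjusted_r2(0.80, n=30, p=2)   # fewer predictors
model_b = adjusted_r2(0.81, n=30, p=5)   # higher R², three extra predictors
print(f"Model A: adj R² = {model_a:.3f}")  # 0.785
print(f"Model B: adj R² = {model_b:.3f}")  # 0.770
```

Ranked by raw R², model B would win; ranked by adjusted R², model A does, because B's small gain in fit does not justify its three extra parameters.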
The other options can be useful in specific contexts, but none of them measures the overall goodness of fit of a model:
- A. "R-squared average of the slopes" is not a standard measure of model fit. Slopes describe the effects of individual predictors, so averaging them says nothing about how well the model as a whole fits the data.
- B. Standard error of slopes measures the precision of individual coefficient estimates but does not directly assess model fit.
- C. VIF (Variance Inflation Factor) evaluates multicollinearity between predictors, not the goodness of fit of the entire model. High VIF values indicate that predictors are highly correlated, which can lead to unreliable coefficient estimates.
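To show what VIF measures instead of fit, here is a minimal sketch assuming NumPy and synthetic predictors invented for illustration. It uses the usual definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors; the helper name `vif` is hypothetical (libraries such as statsmodels provide their own implementation). Note that the response variable never appears, which is why VIF cannot assess goodness of fit.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, k predictors).

    VIF_j = 1 / (1 - R_j²), where R_j² is from an OLS regression of
    column j on the remaining columns plus an intercept.
    """
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2_j = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)

# Synthetic predictors (illustrative assumption).
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1: strong collinearity
x3 = rng.normal(size=n)              # independent of x1 and x2
print(vif(np.column_stack([x1, x2, x3])).round(1))
```

The collinear pair `x1` and `x2` gets very large VIFs while `x3` stays close to 1, flagging unstable coefficient estimates without saying anything about how well the model predicts the response.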
In summary, when comparing regression models with different numbers of predictors, adjusted R-squared is the preferred statistic because it balances model complexity against explanatory power, giving a fairer picture of how well each model fits the data than ordinary R-squared does.