In parametric regression models with categorical covariates, it is well
known that many key quantities of interest are invariant to the choice
of reference subclass. Surprisingly, however, not all quantities are invariant,
and some choices may lead to models with inferior properties
when judged against particular criteria. We propose a set of secondary
criteria upon which the choice of reference subclass may be based. This
secondary set comprises: (a) the precision of the estimates, (b) a measure of
multicollinearity, and (c) subject-matter considerations. The elements of
this set are clearly inter-related. We explore the development and use of
the proposed criteria in generalized linear models (GLMs) with categorical
covariates. Our approach combines theoretical analysis, simulation studies and
a detailed analysis of a real data set. The results show clearly that it is
possible to improve the characteristics of the model by selecting the reference
subclass judiciously. This finding rests on the close relationship
between the measure of precision of the estimates and the measure of multicollinearity,
so it is natural to evaluate any choice based on
subject-matter considerations in terms of the former two criteria.
Our approach is to develop a measure of the precision of the regression
estimates, V_r (the total variance), to adopt a measure of the condition of
V_r, namely K_r, and to study the dependence between the pair (V_r, K_r)
as the reference subclass r varies.
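The following is a minimal sketch of how the pair (V_r, K_r) can change with the reference subclass r while the fitted values stay the same. It assumes a Gaussian linear model (a GLM with identity link) with one categorical covariate under treatment coding. It also assumes that V_r is the trace of the covariance matrix of the coefficient estimates and that K_r is that matrix's spectral condition number (largest over smallest eigenvalue). The abstract does not give the exact definitions. For a general GLM, X^T X would be replaced by X^T W X with the IRLS weights W. The data and the helper names `design_matrix` and `total_variance_and_condition` are hypothetical.

```python
import numpy as np

def design_matrix(groups, levels, ref):
    # Intercept column plus one 0/1 dummy per non-reference level (treatment coding).
    cols = [np.ones(len(groups))]
    cols += [(groups == lev).astype(float) for lev in levels if lev != ref]
    return np.column_stack(cols)

def total_variance_and_condition(X, sigma2=1.0):
    # Covariance of the least-squares estimates: sigma2 * (X^T X)^{-1}.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    eig = np.linalg.eigvalsh(cov)   # eigenvalues of a symmetric matrix, ascending
    V = float(np.trace(cov))        # total variance (assumed definition of V_r)
    K = float(eig[-1] / eig[0])     # spectral condition number (assumed K_r)
    return V, K

# Hypothetical, deliberately unbalanced data: 5, 50 and 200 observations.
levels = ["A", "B", "C"]
groups = np.repeat(np.array(levels), [5, 50, 200])
y = np.sin(np.arange(groups.size, dtype=float)) + (groups == "A")

results = {}
fitted = {}
for r in levels:
    X = design_matrix(groups, levels, r)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted[r] = X @ beta
    results[r] = total_variance_and_condition(X)
    print(f"reference {r}: V_r = {results[r][0]:.3f}, K_r = {results[r][1]:.1f}")

# The fitted values do not depend on the choice of reference subclass.
assert all(np.allclose(fitted["A"], fitted[r]) for r in levels)
```

With sigma^2 = 1 and group sizes 5, 50 and 200, the total variances are 0.625, 0.265 and 0.235. These agree with the closed form (J - 1)/n_r + sum_j 1/n_j for this one-factor design, so here the largest subclass gives the smallest V_r. The fitted values, and hence the residual variance, do not change with r. Fixing sigma^2 = 1 therefore does not affect the ranking of V_r, and K_r is scale-free.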
Publication: 29th International Workshop on Statistical Modelling.