As holistic analysis of life-course sequences using optimal matching (OM)
and other approaches becomes more common, the problem of missing data
becomes more serious. Longitudinal data is prone to missingness in ways
that cross-sectional data is not. Existing solutions (e.g., coding for gaps) are not
satisfactory, and deletion of gappy sequences causes bias. Multiple imputation
seems promising, but standard implementations are not adapted
for sequence data. I propose and demonstrate a Stata implementation of a
chained multiple imputation procedure that “heals” gaps from both ends,
taking account of the longitudinal nature of the measured information, and
also constraining the imputations to respect this longitudinality. Using the
sequence data alone, without auxiliary individual-level information, stable
imputations with good characteristics are generated. Using additional
information about the structure of data collection (which relates to mechanisms
of missingness) gives better prediction models, but imputations that
differ only subtly.
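
To illustrate what "healing" a gap from both ends could look like, here is a minimal Python sketch of the fill order only. It is not the Stata implementation described above: the function names (`internal_gaps`, `healing_order`), the use of `None` for a missing month, and the strict left-then-right alternation are all assumptions made for illustration, and the prediction model that would supply each imputed value is omitted.

```python
def internal_gaps(seq):
    """Return inclusive (start, end) index pairs for runs of None
    that have an observed state on both sides (internal gaps only)."""
    gaps = []
    i, n = 0, len(seq)
    while i < n:
        if seq[i] is None:
            j = i
            while j + 1 < n and seq[j + 1] is None:
                j += 1
            # Keep the run only if it is bounded by observed states.
            if i > 0 and j < n - 1:
                gaps.append((i, j))
            i = j + 1
        else:
            i += 1
    return gaps


def healing_order(start, end):
    """Order in which to fill a gap: alternate between the left and
    right edges, moving inward (an assumed ordering, for illustration)."""
    order = []
    left, right = start, end
    while left <= right:
        order.append(left)
        if left != right:
            order.append(right)
        left += 1
        right -= 1
    return order


# Hypothetical monthly sequence: E = employed, U = unemployed.
seq = ["E", "E", None, None, None, "U", "U"]
for start, end in internal_gaps(seq):
    print(healing_order(start, end))
```

In this ordering each cell is filled while it still has at least one known neighbour (observed or already imputed), which is one way a procedure could use the longitudinal information on both sides of the gap.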
Many sequence analysts proceed by cluster analysis of the matrix of
pairwise OM distances between sequences. As a non-inferential procedure,
this does not benefit from “Rubin’s Rules” for multiple imputation in
averaging across estimations. I explore ways of clustering with multiply imputed
sequences that allow us to assess the variability due to imputation.
I compare the results with an existing approach that codes gaps with
a special missing value that is maximally different from all other states, and
show that imputation performs better.
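
The gap-coding approach used as the comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's code: the state labels, the `*` missing marker, the cost values (1 for a substitution between observed states, 2 for any substitution involving the missing state, 1 per insertion or deletion) and the function names are all hypothetical.

```python
MISSING = "*"  # assumed marker for a gap coded as its own state


def sub_cost(x, y):
    """Substitution cost: the missing state is maximally different
    from every observed state (illustrative values)."""
    if x == y:
        return 0.0
    if MISSING in (x, y):
        return 2.0  # as costly as one deletion plus one insertion
    return 1.0


def om_distance(a, b, sub_cost, indel=1.0):
    """Optimal matching distance by dynamic programming
    (edit distance with state-dependent substitution costs)."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                               # deletion
                d[i][j - 1] + indel,                               # insertion
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),    # substitution
            )
    return d[n][m]


reference = list("EEEUUU")
gap_coded = list("EE*UUU")   # gap kept as a special state
imputed = list("EEUUUU")     # one hypothetical imputation of the same gap

print(om_distance(reference, gap_coded, sub_cost))
print(om_distance(reference, imputed, sub_cost))
```

Under these assumed costs the gap-coded sequence sits farther from the reference than any single imputed value could place it, which shows how coding gaps as a maximally different state can inflate distances for gappy sequences.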
In an example data set drawn from BHPS work-life histories, imputation
of short internal gaps (≤ 12 months) increases the available sample
size by approximately 25 percent. Moreover, the gappy sequences have
a distinctly different distribution, with higher numbers of transitions, so
deletion of gappy sequences distorts the sample badly. For typical longitudinal
data sets, we can expect missingness to be related to the amount
of instability in the career, and to proceed without imputation will cause
serious bias.
Publication
University of Limerick Department of Sociology Working Paper Series; WP2012-01