Multiple imputation for life-course sequence data

HALPIN, BRENDAN

report

posted on 2014-01-28, 13:51 authored by BRENDAN HALPIN

As holistic analysis of life-course sequences becomes more common, using optimal matching (OM) and other approaches, the problem of missing data becomes more serious. Longitudinal data is prone to missingness in ways that cross-sectional data is not. Existing solutions (e.g., coding for gaps) are not satisfactory, and deletion of gappy sequences causes bias. Multiple imputation seems promising, but standard implementations are not adapted for sequence data. I propose and demonstrate a Stata implementation of a chained multiple imputation procedure that “heals” gaps from both ends, taking account of the longitudinal nature of the measured information, and also constraining the imputations to respect this longitudinality. Using the sequence data alone, without auxiliary individual-level information, stable imputations with good characteristics are generated. Using additional information about the structure of data collection (which relates to mechanisms of missingness) gives better prediction models, but imputations that differ only subtly. Many sequence analysts proceed by cluster analysis of the matrix of pairwise OM distances between sequences. As a non-inferential procedure, this does not benefit from “Rubin’s Rules” for multiple imputation in averaging across estimations. I explore ways of clustering with multiply imputed sequences that allow us to assess the variability due to imputation. I compare the results with an existing approach that codes gaps with a special missing value that is maximally different from all other states, and show that imputation performs better. In an example data set drawn from BHPS work-life histories, imputation of short internal gaps (up to 12 months) increases the available sample size by approximately 25 percent. Moreover, the gappy sequences have a distinctly different distribution, with higher numbers of transitions, so deletion of gappy sequences distorts the sample badly.
For typical longitudinal data sets, we can expect missingness to be related to the amount of instability in the career, and proceeding without imputation will cause serious bias.
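
The idea of “healing” short internal gaps from both ends can be illustrated with a minimal Python sketch. It is not the paper's Stata implementation. The function names `internal_gaps` and `healing_order`, the use of `None` for a missing month, and the strict left/right alternation are assumptions made for illustration only. The imputation step itself, in which each missing state would be predicted from its observed or already-imputed neighbours, is deliberately omitted.

```python
from typing import List, Optional, Tuple


def internal_gaps(seq: List[Optional[str]], max_gap: int = 12) -> List[Tuple[int, int]]:
    """Find runs of missing values (None) that are bounded by observed
    states on both sides and are at most max_gap elements long.
    Returns inclusive (start, end) index pairs."""
    gaps = []
    i, n = 0, len(seq)
    while i < n:
        if seq[i] is None:
            j = i
            while j < n and seq[j] is None:
                j += 1
            # Internal gap: something observed before (i > 0) and after (j < n).
            if i > 0 and j < n and (j - i) <= max_gap:
                gaps.append((i, j - 1))
            i = j
        else:
            i += 1
    return gaps


def healing_order(start: int, end: int) -> List[int]:
    """Order in which the positions of one gap would be filled when
    working inwards from both edges (assumed here: left edge first,
    then alternating)."""
    order = []
    left, right = start, end
    from_left = True
    while left <= right:
        if from_left:
            order.append(left)
            left += 1
        else:
            order.append(right)
            right -= 1
        from_left = not from_left
    return order


# Hypothetical monthly sequence: E = employed, U = unemployed, None = missing.
seq = ["E", "E", None, None, None, "U", "E", None, "E", None]
gaps = internal_gaps(seq)                        # trailing gap at index 9 is not internal
orders = [healing_order(s, e) for s, e in gaps]
print(gaps)    # [(2, 4), (7, 7)]
print(orders)  # [[2, 4, 3], [7]]
```

Filling from the edges inwards means that every element is imputed next to at least one observed or already-imputed neighbour, which is one way to respect the longitudinal structure the abstract describes.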

History

Publication

University of Limerick Department of Sociology Working Paper Series; WP2012-01

Publisher

Department of Sociology, University of Limerick

Note

non-peer-reviewed

Language

English

External identifier

http://www.ul.ie/sociology/pubs/

Keywords

longitudinal data, sequence analysts, optimal matching

Licence

CC BY-NC-SA 1.0
