Predictive performance of a hybrid technique for the multiple imputation of survey data
Missing information complicates the analysis of data in scientific investigations and makes it biased and less efficient. In recent decades, considerable effort has gone into developing statistical methods to handle missing data. In many survey-based studies, the logistic regression model is used to investigate the effect of various background characteristics (e.g. demographics, age, education, motherhood and recent births) on a binary outcome variable such as breastfeeding practices. This model can be difficult to apply when values of the confounding variables are missing.
A popular multiple imputation (MI) approach, Multivariate Imputation by Chained Equations (MICE), sometimes fails on large-scale survey data because of its computational cost, the complex dependency structure among categorical variables and a high percentage of missing information.
We develop a Hybrid Multiple Imputation (HMI) approach for handling missing data in this setting. The proposed imputation approach has three stages. The relationship between the binary response (ever breastfed) and the explanatory variables is modelled using a generalized linear model (GLM). The accuracy of the predictive model is assessed by the area under the receiver operating characteristic (ROC) curve (AUROC), and the results obtained under the proposed and existing MI methods are compared across a wide spectrum of data characteristics.
In simulation studies, the proposed approach partly achieves better predictive performance with less computational time than the existing methods.
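As an illustration of the evaluation metric named in the abstract, the AUROC can be read as the probability that a randomly chosen positive case (e.g. ever breastfed) receives a higher predicted score than a randomly chosen negative case. The following is a minimal sketch in plain Python; the function name `auroc` and the toy labels and scores are assumptions made for illustration and are not taken from the study or its implementation.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic).

    labels: iterable of 0/1 outcomes (hypothetical binary response).
    scores: iterable of predicted probabilities, e.g. from a fitted GLM.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.2]
result = auroc(labels, scores)
```

In an MI setting, one would typically compute such a score on each imputed data set and then summarise across imputations; how the study itself pools the AUROC is not stated in the abstract.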
Reference:
STS07-004
Session:
Multisource statistics
Presenter/s:
Humera Razzak
Presentation type:
Oral presentation
Room:
MANS
Chair:
Matyas Meszaros, Eurostat, Luxembourg
Date:
Wednesday, 13 March
Time:
16:00 - 17:00
Session times:
16:00 - 17:00