11:30 - 12:30
Contributed Paper Session
Room: JENK
Chair:
Roeland Beerten, Statistics Flanders, Belgium
Discussant:
Faiz Alsuhail, Statistics Finland, Finland
Integration of inconsistent data sources using Hidden Markov Models
Paulina Pankowska (1), Bart Bakker (1, 2), Daniel Oberski (3), Dimitris Pavlopoulos (1)
1 Vrije Universiteit Amsterdam, Amsterdam
2 Statistics Netherlands, The Hague
3 Utrecht University, Utrecht
National Statistical Institutes (NSIs) increasingly obtain information on the same phenomena from different sources. Despite official statisticians’ best efforts, however, these sources often provide inconsistent estimates, primarily as a result of measurement error. An attractive solution to this problem, in the context of categorical data, is latent class modelling (LCM). In this method, data reconciliation and measurement error are addressed simultaneously by linking two or more sources and modelling them as conditionally independent measures of an underlying true value. Hidden Markov Models (HMMs) are a group of latent class models designed specifically for longitudinal data. While HMMs offer an attractive solution to discrepancies across longitudinal data sources, several issues need to be considered before they can be used in the production of official statistics. First, applying and estimating HMMs is complicated, time-consuming and expensive, so it cannot be done regularly. It is therefore desirable to re-use HMM estimates from previous time points with more recent data. This procedure may produce accurate estimates only if measurement error is time invariant. Second, because the method requires data linkage, it may introduce linkage error, a new potential source of bias. The sensitivity of HMM estimates to linkage error therefore also needs to be examined. In our research, we test the feasibility of using HMMs to reconcile different sources that measure the same phenomenon, given the challenges outlined above. To do so, we apply an extended, two-indicator HMM to Dutch data on transitions from temporary to permanent employment drawn from the Labour Force Survey (LFS) and the Employment Register (ER). Our results cast a positive light on the feasibility of using HMMs in official statistics production.
Namely, we show that it is possible to re-use parameter estimates on more recent data, provided that the error parameters are time invariant. We also demonstrate that the method's sensitivity to linkage error is rather low. Finally, we illustrate how HMMs can be used to evaluate the effectiveness of various data collection techniques in producing accurate statistics.
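The core idea of a two-indicator HMM can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' extended model or their estimation procedure: it assumes two latent states (temporary and permanent employment), two indicators A (standing in for the LFS) and B (standing in for the ER) that are conditionally independent given the true state, and made-up parameter values rather than estimates. It evaluates the likelihood of one hypothetical person's observed sequences with the forward algorithm. The same error matrices are used at every time point, which is the time-invariance assumption that re-using estimates relies on. Parameter estimation (for example by EM) is not shown, and the function name `forward` is hypothetical.

```python
"""Forward algorithm for a two-indicator hidden Markov model (illustrative sketch).

All state labels and parameter values below are hypothetical, not estimates
from the LFS/ER study.
"""

STATES = ("temporary", "permanent")  # hypothetical latent (true) contract types

# Hypothetical initial distribution P(X_1 = s).
init = {"temporary": 0.6, "permanent": 0.4}

# Hypothetical transition probabilities P(X_t = r | X_{t-1} = s), keyed trans[s][r].
trans = {
    "temporary": {"temporary": 0.8, "permanent": 0.2},
    "permanent": {"temporary": 0.05, "permanent": 0.95},
}

# Hypothetical, time-invariant measurement error P(observed = o | X_t = s), keyed err[s][o].
err_a = {  # indicator A, standing in for the survey (LFS)
    "temporary": {"temporary": 0.9, "permanent": 0.1},
    "permanent": {"temporary": 0.05, "permanent": 0.95},
}
err_b = {  # indicator B, standing in for the register (ER)
    "temporary": {"temporary": 0.85, "permanent": 0.15},
    "permanent": {"temporary": 0.1, "permanent": 0.9},
}


def forward(obs_a, obs_b):
    """Return the forward probabilities and the likelihood of both sequences."""
    alphas = []
    for t, (a, b) in enumerate(zip(obs_a, obs_b)):
        alpha = {}
        for r in STATES:
            # Conditional independence: P(A_t, B_t | X_t) = P(A_t | X_t) * P(B_t | X_t).
            emit = err_a[r][a] * err_b[r][b]
            if t == 0:
                prior = init[r]
            else:
                # Sum over the previous latent state (first-order Markov chain).
                prior = sum(alphas[-1][s] * trans[s][r] for s in STATES)
            alpha[r] = prior * emit
        alphas.append(alpha)
    return alphas, sum(alphas[-1].values())


# One hypothetical person: the two sources disagree at the second time point.
obs_a = ["temporary", "permanent", "permanent"]
obs_b = ["temporary", "temporary", "permanent"]

alphas, likelihood = forward(obs_a, obs_b)
# Filtered probability of the true state at the last time point, given both sources.
posterior_last = {r: alphas[-1][r] / likelihood for r in STATES}
print(likelihood, posterior_last)
```

Because `err_a` and `err_b` do not depend on `t`, error parameters estimated on an earlier panel could be plugged into `forward()` for newer sequences; if the real error process changed over time, that re-use would bias the results, which is exactly the condition on time invariance discussed above.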


Reference:
CPS05-004
Session:
Data linking and statistical matching
Presenter/s:
Dimitris Pavlopoulos
Presentation type:
Oral presentation
Date:
Wednesday, 13 March
Time:
11:30 - 12:30