13:45 - 14:45
We-IPS06
Chair:
Bernardus Bakker (CBS, Netherlands), Christine Mhundwa (Journalist, Zimbabwe), Heli Lehtimäki (EC-Eurostat, Luxembourg)
Selective Data Editing of Continuous Variables with Random Forests in Official Statistics
Sarah Bohnensteffen
Complutense University Madrid
Federal Statistical Office Germany

Technological advances and new demands arising from economic and socio-cultural change regularly challenge National Statistical Institutes to adapt to an evolving environment. In this context, machine learning methods are discussed as important and promising tools for official statistics. Selective statistical data editing is an approach that detects influential units and selects them for manual follow-up in order to make data processing more efficient. In this thesis, a simple and a two-step approach are developed to apply random forests to the selective editing of continuous variables in short-term business survey data.
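
To illustrate the selection step, the following is a minimal sketch of a commonly used score function for selective editing, not the method developed in the thesis. It assumes that a predicted value for each unit is already available (for example from a random forest) and scores each unit by its weighted absolute deviation from that prediction, relative to the predicted weighted total. The function names, the threshold, and all data values are hypothetical.

```python
from typing import List, Sequence


def selective_editing_scores(
    reported: Sequence[float],
    predicted: Sequence[float],
    weights: Sequence[float],
) -> List[float]:
    """Score each unit by w_i * |reported_i - predicted_i| / |predicted total|.

    `predicted` is assumed to come from some model, e.g. a random forest;
    this sketch does not fit the model itself.
    """
    total = sum(w * p for w, p in zip(weights, predicted))
    if total == 0:
        raise ValueError("predicted weighted total must be non-zero")
    return [
        w * abs(r - p) / abs(total)
        for r, p, w in zip(reported, predicted, weights)
    ]


def select_for_follow_up(scores: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of units whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical monthly turnover for four businesses (made-up values).
reported = [120.0, 95.0, 4000.0, 210.0]
predicted = [118.0, 100.0, 400.0, 205.0]
weights = [10.0, 10.0, 10.0, 10.0]  # hypothetical survey weights

scores = selective_editing_scores(reported, predicted, weights)
flagged = select_for_follow_up(scores, threshold=0.05)  # threshold is an assumption
print(flagged)
```

In this made-up example only the third unit, whose reported value is ten times its prediction, is selected for manual follow-up; the remaining units would pass without manual review.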