12:30 - 13:30
Poster Session
Room: Lunches Space
Supervised Learning as a Method to Reduce Clerical Effort
Joerg Feuerhake
Federal Statistical Office Germany, Wiesbaden
As computing power becomes ever more available, machine learning methods are becoming more relevant to the production of statistics. One important field of application is the classification of statistical units using models trained on units whose classification is already known. This paper presents an approach to classifying units in a business database on the basis of prior clerical review. The goal is to substantially reduce clerical effort in the production process of the statistic. Consider the case where roughly 2% of a population of about 600,000 units are not relevant for the results of a certain annual statistic. A unit can become irrelevant for several reasons, and these reasons depend on items such as size, economic activity and other ratio-scaled and nominal variables. Assume further that, in recent periods, each unit's relevance was determined by clerical review, so that every year all units entering the population or changing in an important variable had to be checked manually. On average, 40,000 units a year previously required such review. The staff capacity tied up by these reviews was therefore considerable, not to mention the training needed to enable staff members to review cases correctly and the time needed to carry out the reviews. In the presented project, supervised learning methods are applied to achieve the goal stated above. Random forests and support vector machines (SVMs) are trained in a combined approach on the populations of prior years, yielding models that can predict the relevance of units that enter the population or change in important variables.
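The abstract does not specify how the random forest and the SVM are combined. One simple option, assumed here purely for illustration and not necessarily the author's method, is an agreement rule: a unit's relevance is decided automatically only where both models agree, and every disagreement is routed to clerical review. The sketch below shows only this routing step. `route_units`, `Routing`, `rf_toy` and `svm_toy` are hypothetical names. The two toy predictors stand in for trained models (for example scikit-learn's `RandomForestClassifier` and `SVC`), and the `"size"` feature and its thresholds are invented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical unit layout: a plain dict of features, e.g. {"size": 12}.
Unit = Dict[str, Any]
# A classifier returns True if it predicts the unit is relevant for the statistic.
Classifier = Callable[[Unit], bool]


@dataclass
class Routing:
    auto_relevant: List[Unit]
    auto_irrelevant: List[Unit]
    clerical_review: List[Unit]

    def review_share(self) -> float:
        """Share of candidate units that still need manual review."""
        total = (len(self.auto_relevant) + len(self.auto_irrelevant)
                 + len(self.clerical_review))
        return len(self.clerical_review) / total if total else 0.0


def route_units(units: Sequence[Unit], rf_predict: Classifier,
                svm_predict: Classifier) -> Routing:
    """Assumed combination rule: accept the automatic decision only where
    both models agree; send every disagreement to clerical review."""
    routing = Routing([], [], [])
    for unit in units:
        rf_says, svm_says = rf_predict(unit), svm_predict(unit)
        if rf_says and svm_says:
            routing.auto_relevant.append(unit)
        elif not rf_says and not svm_says:
            routing.auto_irrelevant.append(unit)
        else:
            routing.clerical_review.append(unit)
    return routing


# Toy stand-ins for trained models (hypothetical thresholds on an invented feature).
def rf_toy(unit: Unit) -> bool:
    return unit["size"] >= 5


def svm_toy(unit: Unit) -> bool:
    return unit["size"] >= 3


candidates = [{"size": s} for s in range(10)]
result = route_units(candidates, rf_toy, svm_toy)
print(len(result.clerical_review), result.review_share())  # 2 0.2
```

Under this rule, clerical effort shrinks to the units on which the two models disagree. Whether the share is acceptable would depend on how often the agreed decisions are actually correct, which the sketch does not measure.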


Reference:
POST01-017
Session:
Big data analytics (poster)
Presenter/s:
Joerg Feuerhake
Presentation type:
Poster presentation
Date:
Tuesday, 12 March