Aggregating flags – a standardised and rational approach
A flag is an attribute of a cell in a data set that provides additional qualitative information about the statistical value in that cell. Flags can indicate a wide range of information, for example that a given value is estimated, is confidential or represents a break in the time series.
Different sets of flags are currently in use in the European Statistical System (ESS). Some statistical domains use the SDMX code lists for observation status and confidentiality status, the OECD uses a simplified version of the SDMX code lists, and Eurostat uses a short list of flags for dissemination that combines the observation and confidentiality status.
While in most cases it is well defined how a flag is assigned to an individual value, it is not straightforward to decide which flag should be propagated to an aggregated value such as a sum, an average or a quantile.
This topic is important for Eurostat because the European aggregates are derived from national data points, so the information contained in the individual flags needs to be summarised in a single flag for the aggregate. The issue is not unique to Eurostat and can occur for any aggregated data. For example, a national statistical institute may derive the national aggregate from regional data sets. The dissemination process adds a further constraint: to keep the output easily understandable for users, only a limited set of flags can be applied, compared to the set of flags used in the production process.
In the scientific community there is a wide range of research on the consequences of data aggregation, but it concentrates on the information loss that occurs during aggregation, and there is no scientific guidance on how to aggregate flags. This paper attempts to give a picture of the current situation and to provide some systematic guidance on how to aggregate flags in a coherent way.
Eurostat is testing various approaches with a view to balancing the transparency and the clarity of the information made available to users in a flag. Out of several options, three methods (hierarchical, frequency and weighted frequency) are implemented in an R package that assigns a flag to an aggregate based on the underlying flags and values. Since the topic is also relevant outside Eurostat, it was decided to publish the code together with its documentation, in order to foster re-use within the European Statistical System and to stimulate discussion, including with the user community.
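To make the three method names concrete, the following is a minimal Python sketch of one plausible reading of each rule. It is not the R package's code or API: the function names, the flag letters, the priority order in `FLAG_ORDER`, the use of the underlying values as weights and the 0.5 threshold are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical priority order, most important flag first (an assumption,
# not an official Eurostat or SDMX ordering).
FLAG_ORDER = ["c", "b", "e", "p"]


def hierarchical_flag(flags, order=FLAG_ORDER):
    """Return the highest-priority flag present, or "" if no flag is set."""
    present = {f for f in flags if f}
    for flag in order:
        if flag in present:
            return flag
    return ""


def frequency_flag(flags):
    """Return the most frequent non-empty flag (ties: first one encountered)."""
    counts = Counter(f for f in flags if f)
    if not counts:
        return ""
    return counts.most_common(1)[0][0]


def weighted_frequency_flag(flags, weights, threshold=0.5):
    """Return the flag with the largest weight share, if it exceeds the threshold.

    Unflagged values contribute to the total weight but not to any flag,
    so an aggregate dominated by unflagged values stays unflagged.
    """
    total = sum(weights)
    if total <= 0:
        return ""
    share = defaultdict(float)
    for flag, weight in zip(flags, weights):
        if flag:
            share[flag] += weight
    if not share:
        return ""
    best = max(share, key=share.get)
    return best if share[best] / total > threshold else ""


# Illustrative national data points feeding one aggregate (made-up numbers).
flags = ["e", "", "p", "e", "b"]
values = [10.0, 50.0, 200.0, 5.0, 1.0]

print(hierarchical_flag(flags))                # highest-priority flag present
print(frequency_flag(flags))                   # most common flag
print(weighted_frequency_flag(flags, values))  # flag carrying most weight
```

On this made-up input the three rules disagree: the hierarchical rule propagates the flag of the smallest contributor, the frequency rule follows the most common flag, and the weighted rule follows the flag attached to the largest value. That disagreement is the kind of trade-off between transparency and clarity the methods are meant to expose.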
Reference: POST02-009
Session: Advanced estimation techniques
Presenter/s: Matyas Meszaros
Presentation type: Poster presentation
Room: Lunches Space
Date: Wednesday, 13 March
Time: 12:30 - 13:30
Session times: 12:30 - 13:30