Research into using alternative data sources in the production of consumer price indices, ONS
Alternative data sources such as web-scraped and point-of-sale scanner price datasets are becoming more commonly available, providing large sources of price data from which measures of consumer inflation could potentially be calculated. ONS has been researching these data sources since 2014. It has recently acquired a robust source of web-scraped data from a third-party supplier and is continuing to pursue scanner data.
Given this progress in acquiring alternative data sources, ONS has started a new stage of research to sketch out a proposed end-to-end pipeline comprising the individual modules required to process the data, for example ‘classification’. For each module, we have looked at the different methods that could be used and how they may differ between data sources. In practice, this means that we need a pipeline that takes the raw input data, processes it, and outputs the item-level indices required as inputs into a final production platform.
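To make the shape of such a pipeline concrete, the following is a minimal sketch only, not the ONS implementation. The keyword-based `classify` function, the `ITEM_KEYWORDS` mapping, the product names and the choice of a Jevons index (geometric mean of price relatives over matched products) are all assumptions made for illustration; the abstract does not specify which methods each module uses.

```python
from collections import defaultdict
from math import exp, log

# Hypothetical keyword-to-item mapping standing in for a 'classification' module.
ITEM_KEYWORDS = {"jeans": "mens_jeans", "kettle": "kettle"}


def classify(product_name):
    """Assign a raw product name to an item category, or None if unmatched."""
    name = product_name.lower()
    for keyword, item in ITEM_KEYWORDS.items():
        if keyword in name:
            return item
    return None


def item_level_indices(base_prices, current_prices, names):
    """Return an assumed Jevons index (base period = 100) for each item.

    base_prices, current_prices: dict mapping product id -> price
    names: dict mapping product id -> raw product name
    Only products priced in both periods contribute to the index.
    """
    log_relatives = defaultdict(list)
    for pid in base_prices.keys() & current_prices.keys():
        item = classify(names[pid])
        if item is not None:
            log_relatives[item].append(log(current_prices[pid] / base_prices[pid]))
    # Geometric mean of price relatives, computed as the exponential of the mean log.
    return {item: 100 * exp(sum(v) / len(v)) for item, v in log_relatives.items()}


# Illustrative (made-up) input data for two periods.
base = {"a": 20.0, "b": 40.0, "c": 30.0}
current = {"a": 22.0, "b": 44.0, "c": 30.0}
names = {"a": "Slim jeans", "b": "Bootcut Jeans", "c": "Electric kettle"}
indices = item_level_indices(base, current, names)
```

Here both jeans products rise by 10% and the kettle price is unchanged, so the jeans index moves above 100 while the kettle index stays at the base level.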
One of the major obstacles for this pipeline is product churn (the volume of products entering and leaving the sample). Methods to define suitable clusters of homogeneous products are seen as a way of solving this problem; however, they remain an open question in international research.
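Product churn as described above can be quantified by comparing the product identifiers observed in two periods. The sketch below is illustrative only: the function name `product_churn`, the `matched_share` measure and the sample identifiers are assumptions, not definitions taken from the ONS research.

```python
def product_churn(previous_ids, current_ids):
    """Count products entering, leaving and remaining between two periods."""
    previous, current = set(previous_ids), set(current_ids)
    entered = current - previous   # new products in the current period
    left = previous - current      # products no longer observed
    matched = previous & current   # products priced in both periods
    return {
        "entered": len(entered),
        "left": len(left),
        "matched": len(matched),
        # Share of the previous period's products that can still be price-matched.
        "matched_share": len(matched) / len(previous) if previous else 0.0,
    }


# Made-up product identifiers for two consecutive periods.
churn = product_churn(["a", "b", "c", "d"], ["c", "d", "e"])
```

A low matched share means that few products can be followed over time, which is the situation that clustering of homogeneous products is intended to address.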
This presentation will touch on the modules required to create item-level indices from big datasets, before focusing on the clustering large datasets into price indices (CLIP) approach developed by ONS as a way of addressing the issues associated with high product churn.
Reference:
IPS02-004
Session:
Big Data and Consumer Price statistics
Presenter/s:
Tanya Flower
Presentation type:
Oral presentation
Room:
GASP
Chair:
DJ Hoogerdijk, ESTAT, Luxembourg
Date:
Tuesday, 12 March
Time:
16:00 - 17:00
Session times:
16:00 - 17:00