12:30 - 13:30
Poster Session
Room: Lunches Space
Using Big Data for Official Statistics: Web Scraping as a Data Source for Statistical Business Registers (SBRs)
Donato Summa, Gianpiero Bianchi, Monica Consalvi, Barbara Gentili, Flavio Pancella, Francesco Scalfati
Istat, Rome
The Italian National Institute of Statistics (Istat) has responded to the growing complexity of both phenomena and data by adopting new strategies to integrate data from traditional surveys, administrative bodies and innovative sources such as Big Data. The aim is to reduce the statistical burden on respondents while enriching the range, quality and timeliness of the information produced, keeping in mind that statisticians working in a national statistical institute (NSI) are both researchers and producers and must always guarantee the quality of official statistics. Accordingly, a project was launched to enlarge the information content of the Statistical Business Register (SBR) and so provide concrete support for statistical production. The project takes advantage of the opening of the new Istat Laboratory for Innovation (LabInn), which offers strategic research projects useful infrastructure in a dedicated physical and technological space. To proceed in a structured and integrated manner, a register-based approach to Big Data was chosen, placing the register at the centre. The main idea was to use Big Data as an additional source in the SBR updating process, applying web scraping and text mining technologies to integrate the ‘structured’ business data with the ‘unstructured’ data coming from web pages. Furthermore, the new information on enterprises will be used to start a more detailed statistical analysis, identifying new classifications and taxonomies that support a better interpretation of emerging economic phenomena.


Reference:
POST01-009
Session:
Big data analytics (poster)
Presenter/s:
Donato Summa
Presentation type:
Poster presentation
Room:
Lunches Space
Date:
Tuesday, 12 March
Time:
12:30 - 13:30