14:30 - 15:30
Special Topic Session
Room: MANS
Chair:
August Goetzfried, Eurostat, Luxembourg
Organiser:
Dimitar Nenkov, European Commission DG ESTAT, Luxembourg
Estimating Enterprise Characteristics from Web Data: Achievements and Future Developments
Monica Scannapieco 1, Peter Struijs 2, Galya Stateva 3
1 ISTAT, Rome
2 CBS, The Hague
3 BNSI, Sofia
The Internet is one of the most interesting Big Data sources for Official Statistics. While other sources, such as mobile phone data or smart meters, require partnerships with their providers, Internet data are publicly accessible. Internet as a Data Source (IaD) data can be used as a substitute for, or in combination with, data collected by means of traditional survey-based instruments. In the case of substitution, the aim is to reduce respondent burden; in the case of combination, the main goal is to increase the accuracy of the estimates. Among the possible uses of IaD, data from enterprise websites are particularly relevant for Official Statistics. During the last few years, the vast majority of enterprises have acquired an Internet domain in order to set up an official website, thus making available (almost) for free a good deal of information that was previously available only via traditional collection systems. Hence, National Statistical Institutes have an opportunity to collect and mine the publicly available information on these websites to describe a wide range of phenomena in near real-time. Given this context, the ESSnet Big Data Pilots I project was launched by Eurostat in early 2016 and concluded in July 2018. Within that project, the purpose of the work package “Web Scraping of Enterprise Web Sites” was to investigate whether web scraping, text mining and inference techniques could be used to collect, process and improve general information about enterprises. The project will have a follow-up, ESSnet Big Data Pilots II, which again includes a specific work package, “Enterprise Characteristics”, aimed at bringing the piloting activities carried out within the first ESSnet project to an implementation stage. In this paper, we first summarize the results achieved within the first project, and then highlight the main developments foreseen for future project activities.


Reference:
STS01-001
Session:
New data sources for MultiNational Enterprises
Presenter/s:
Peter Struijs
Presentation type:
Oral presentation
Date:
Tuesday, 12 March
Time:
14:30 - 15:30