Post-truth has emerged as a popular term for a particular way of presenting information to the public. According to the Oxford dictionary definition, it refers to a situation in which objective facts are set aside in favour of emotionally shaped information. The role of official statistics in such circumstances is under threat, as previously pointed out by Baldacci et al. [1]. Their paper contrasts official statistics with fake news, describes the possible relations between the two, and outlines the future actions that need to be taken.
Such practices of online media are harmful to data dissemination, as they threaten to jeopardise trust in official statistical sources. To help prevent these harmful effects, this paper presents a clickbait-detection model built on the headlines of all articles containing press release information issued by the Bulgarian NSI, collected from 21 media websites for 2017. Two models for clickbait detection are compared. The first uses a bag-of-words representation for natural language processing. The second follows the method applied by Wei et al. [3], which uses type labels to frame the main features of clickbait; for the purposes of this paper, those labels have been converted into parts of speech. The former approach was chosen because it is simple and easy to implement, and the latter because it draws on the most common features of clickbait (its dynamics, pathos and expression), which can be detected from the parts of speech used. As the dataset is rather small and unbalanced in its share of clickbait vs. non-clickbait headlines (the former are fewer), a support vector machine (SVM) classifier was used. The results show the superiority of the parts-of-speech model, which was accurate in 92% of the cases tested, compared with the bag-of-words model, which correctly predicted 67%.
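To make the difference between the two feature representations concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a hypothetical toy lexicon (`TOY_POS_LEXICON`) in place of a real part-of-speech tagger, uses invented English headlines rather than the Bulgarian data, and all function names are illustrative. It shows only how a headline could be turned into a bag-of-words vector or a parts-of-speech count vector; either vector could then be passed to an SVM classifier.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption): a real pipeline would use a trained
# part-of-speech tagger for the language of the headlines.
TOY_POS_LEXICON = {
    "shocking": "ADJ", "average": "ADJ",
    "never": "ADV",
    "nsi": "NOUN", "wages": "NOUN", "prices": "NOUN",
    "you": "PRON", "what": "PRON",
    "reports": "VERB", "rose": "VERB", "see": "VERB", "soar": "VERB",
}
POS_TAGS = ["ADJ", "ADV", "NOUN", "PRON", "VERB", "OTHER"]
PUNCTUATION = ".,!?:;\"'"


def tokenize(headline):
    """Lowercase the headline and strip surrounding punctuation from each token."""
    tokens = (tok.strip(PUNCTUATION).lower() for tok in headline.split())
    return [tok for tok in tokens if tok]


def build_vocabulary(headlines):
    """Collect the sorted set of all distinct tokens across the headlines."""
    return sorted({tok for h in headlines for tok in tokenize(h)})


def bag_of_words_vector(headline, vocabulary):
    """Count how often each vocabulary word occurs in the headline."""
    counts = Counter(tokenize(headline))
    return [counts[word] for word in vocabulary]


def pos_vector(headline):
    """Count how often each part-of-speech tag occurs in the headline."""
    tags = Counter(TOY_POS_LEXICON.get(tok, "OTHER") for tok in tokenize(headline))
    return [tags[tag] for tag in POS_TAGS]


# Invented example headlines (assumption), not taken from the paper's dataset.
headlines = [
    "NSI reports average wages rose",
    "Shocking! You never see what prices soar",
]
vocabulary = build_vocabulary(headlines)
bow_features = [bag_of_words_vector(h, vocabulary) for h in headlines]
pos_features = [pos_vector(h) for h in headlines]
```

In this sketch the bag-of-words vector grows with the vocabulary, while the parts-of-speech vector has a fixed length equal to the number of tags, so the second representation stays compact however many distinct words the headlines contain.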