Notice the noise: detecting misclassifications in register data
Label noise is present in many registers and databases. The current study proposes a method that uses auxiliary information to identify which units in a large, noisy data set are misclassified. We illustrate the method with the NACE classification of enterprises in the general business register (GBR). We conducted experiments to determine under which circumstances the proposed method can and cannot find the misclassified units. In the experiments, we took into account non-random label noise and the possible interchangeability between classes. Experimental results are promising: the proposed method identifies the misclassified units accurately.