Individual internet search history predicts openness, interest, knowledge and intelligence
Tue-B22-Talk V-05
Presented by: Markus J. Hofmann
Here we tested whether individual text corpora can predict big-5 personality traits, and compared the resulting virtual openness with survey-based openness in the prediction of interests, knowledge and intelligence. From internet search engine histories, we generated individual corpora for 94 participants, with an average of three million word tokens. We then computed an individual semantic structure for each participant and examined the similarity of this structure to label words, which were adjectives from a well-established lexical approach to big-5 personality traits. A simple linear regression analysis showed, for instance, that the similarity of the individual semantic structures to the label word “academic” (“gelehrt”) approaches the meta-analytically reproducible explained variance for survey-based openness in computational social science. We also used a nonlinear neural model based on the 30 best label words to predict the diagnostic assessment of personality. The virtually estimated big-5 traits predicted the corresponding diagnostic trait far better than previous computational approaches, while each dimension hardly accounted for variance in the other dimensions. The neural models also generalized well when run 1000 times with 10-fold cross-validation. Virtual openness predicted intellectual interests and level of education about as well as survey-based openness. For fluid intelligence, for two diagnostic assessments of crystallized intelligence, and particularly for knowledge in the humanities, virtual openness provided even slightly better predictions than survey-based openness. In sum, our machine learning approach answers Cattell’s challenge of freeing adult tests from the assumption of uniform knowledge across participants.
Keywords: Machine learning, word2vec, natural language processing, language models, big-5, intelligence-as-process, Cattell-Horn-Carroll theory
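The simple linear regression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each participant's semantic structure yields one embedding vector per word, that similarity to a label word is measured as cosine similarity, and that one similarity score per participant is regressed on a survey-based openness score. The function names and all toy numbers below are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def simple_linear_regression(x, y):
    """Ordinary least squares with one predictor.

    Returns (slope, intercept, r_squared), where r_squared is the
    proportion of variance in y explained by x.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("x and y must each vary across participants")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical toy data: one 3-dimensional vector per participant
    # (assumed summary of that participant's corpus) and one vector
    # for the label word. Real embeddings would be far larger.
    label_vector = [0.9, 0.1, 0.3]
    participant_vectors = [
        [0.8, 0.2, 0.1],
        [0.1, 0.9, 0.4],
        [0.7, 0.1, 0.5],
        [0.3, 0.6, 0.2],
    ]
    # Hypothetical survey-based openness scores, same participant order.
    survey_openness = [4.2, 2.9, 4.0, 3.1]

    similarities = [cosine_similarity(v, label_vector)
                    for v in participant_vectors]
    slope, intercept, r_squared = simple_linear_regression(
        similarities, survey_openness)
    print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

In this sketch the explained variance is the squared Pearson correlation between the similarity scores and the survey scores, which is what a regression with a single predictor reports as R².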