15:15 - 16:00
Room:
Chair/s: Tanja Burgard
Assessing the performance of machine learning algorithms for systematic reviews and meta-analyses: A benchmarking and evaluation project
Tue-03
Presented by: Diego Campos
Diego Campos*, Tim Fütterer, Rosa Lavelle-Hill, Thomas Gfrörer, Lars König, Steffen Zitzmann, Martin Hecht, Kou Murayama, Ronny Scherer
Natural language processing (NLP) algorithms, a form of artificial intelligence (AI), have the potential to make abstract screening in academic systematic reviews more efficient and reliable. Previous simulation studies of AI-based screening tools suggest that they can save significant time and reduce the screening fatigue that leads to false positives and false negatives. However, the performance of such tools varies across research areas, and it remains uncertain how many records must be screened to find all relevant publications for a systematic review in educational research. In this study, we evaluate the performance of different NLP algorithms and heuristic stopping criteria for retrieving all relevant records from systematic reviews in educational research. We collected 30 systematic reviews in education and educational psychology and simulated the abstract screening process using the ASReview software. The simulation was run for each dataset using all combinations of classifiers and feature extraction strategies supported in ASReview, which resulted in 12 different models. We then calculated performance metrics such as recall, work saved over sampling, and estimated time savings to compare the algorithms and the stopping criteria. The results will give educational researchers a clearer understanding of how AI-based NLP algorithms and heuristic stopping criteria perform in their field. We hope this study contributes to the development of procedures for incorporating AI tools into the abstract screening process of systematic reviews in education.
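To make the metrics named in the abstract concrete, the following is a minimal sketch of how recall, work saved over sampling (WSS), and a simple heuristic stopping rule could be computed from one simulated screening run. It does not use the ASReview API and is not the authors' code. The function names, the toy label sequence, and the choice of "stop after k consecutive irrelevant records" as the heuristic are all assumptions made for illustration. The input is a list of 0/1 relevance labels in the order in which the model presented the records.

```python
from math import ceil


def recall_at(labels, n_screened):
    """Share of all relevant records found within the first n_screened records."""
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("dataset contains no relevant records")
    return sum(labels[:n_screened]) / total_relevant


def wss_at_recall(labels, target_recall=0.95):
    """Work saved over sampling at a target recall level.

    Uses the common definition: the fraction of records left unscreened
    when the target recall is reached, minus (1 - target_recall).
    """
    n = len(labels)
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("dataset contains no relevant records")
    # Small tolerance guards against floating-point error before rounding up.
    needed = ceil(target_recall * total_relevant - 1e-9)
    found = 0
    for screened, label in enumerate(labels, start=1):
        found += label
        if found >= needed:
            return (n - screened) / n - (1 - target_recall)
    raise RuntimeError("target recall was never reached")


def stop_after_consecutive_irrelevant(labels, k):
    """Hypothetical heuristic: stop once k irrelevant records appear in a row.

    Returns the number of records screened at the stopping point, or the
    full dataset size if the rule never triggers.
    """
    run = 0
    for screened, label in enumerate(labels, start=1):
        run = 0 if label else run + 1
        if run >= k:
            return screened
    return len(labels)


# Toy screening order (assumed data): 10 records, 2 of them relevant.
order = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
stop_point = stop_after_consecutive_irrelevant(order, k=3)
print("WSS@100:", wss_at_recall(order, target_recall=1.0))
print("stopped after", stop_point, "records, recall:", recall_at(order, stop_point))
```

In a benchmark like the one described, these quantities would be computed per dataset and per model, so that stopping rules can be compared by the recall they reach against the screening effort they save.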