The Metacognitive Prompting Effect of 'Dr. Bayes': A Feasibility Study on the Implementation and Logic of a Constrained AI Tutor for Guided Reasoning in a STEM Class

Submission 227

Presented by: Claudia Mirenghi

Claudia Mirenghi ¹, Antonella Lopez ², Andrea Bosco ¹

¹ Department of Educational Sciences, Psychology, Communication, University of Bari, Italy

² Department of Humanities, Social Sciences, and Education, University of Molise, Italy

The growing integration of Generative Artificial Intelligence (GenAI) systems within higher education has increased interest in conversational agents designed to support reflective and structured reasoning processes rather than simple answer delivery. Within Artificial Intelligence in Education (AIED), metacognitive prompting has emerged as a promising strategy for guiding learners through intermediate reasoning steps during complex problem-solving activities. This approach may be particularly relevant in STEM-related domains embedded within non-STEM curricula, such as psychometrics education. Aims: The present study aimed to develop and evaluate the feasibility of “Dr. Bayes,” a domain-specific GenAI cognitive tutor designed to support Bayesian diagnostic reasoning through an error-tolerant metacognitive prompting architecture. The study focused specifically on implementation feasibility, usability, user experience, and AI-specific satisfaction within a university learning environment. Methods: A feasibility-oriented study with a within-subjects design was conducted among master’s students enrolled in a psychometrics course within a non-STEM degree program. Participants completed two chatbot-assisted training sessions involving Bayesian diagnostic reasoning tasks derived from psychometric assessment scenarios. The tutoring system was implemented as a constrained conversational environment in which learners were guided through structured prompts, reflective questioning, and stepwise reasoning support. Outcomes focused exclusively on feasibility and implementation domains, including recruitment and retention rates, task completion, system usability, user experience, and satisfaction with the AI-based tutor. Results: Findings supported the operational feasibility of the intervention within the educational context.
Recruitment reached 60%, adherence to the two-session protocol was high, and questionnaire completion rates reached 100%, suggesting that the implementation procedures were manageable and well tolerated. System evaluation measures indicated consistently high usability, with System Usability Scale (SUS) scores exceeding benchmarks for excellent usability across both sessions. User experience indicators also showed stable and positive evaluations over time, particularly regarding clarity and ease of interaction. AI-specific satisfaction findings suggested that participants primarily valued the system for its usefulness and information quality, while lower anthropomorphic ratings indicated that the chatbot was perceived mainly as a functional cognitive

The increasing use of AI-based chatbots in higher education highlights their potential as cognitive tutors when designed to support metacognitive processes rather than merely provide correct answers. Concurrently, cognitive psychology suggests that learning is enhanced through engagement with errors, although most AI systems are optimized to minimize them. Aim: To design and evaluate an error-oriented cognitive tutor for metacognitive scaffolding in Bayesian reasoning. Methods: A feasibility study with a within-subjects design was conducted among master’s students (N = 13) attending a STEM class in a non-STEM course. Participants completed two chatbot-assisted sessions involving complex psychometric tasks. Primary outcomes focused on implementation feasibility (recruitment, completion rates) and system evaluation (usability, UX, and satisfaction). Secondary outcomes assessed learning effectiveness through error rate reduction (log-file analysis) and an exploratory between-groups comparison of final academic examination scores against a non-user control group. Results: Implementation was highly feasible (60% recruitment, 100% questionnaire completion).
System evaluation indicated "excellent" usability (SUS > 82) and positive, stable user experience (UEQ) across sessions. Attitudes remained stable. Conclusion: The findings suggest that an error-oriented cognitive tutor is feasible and acceptable.