15:30 - 17:00
Mon—Casino_1.811—Poster1—21
Mon-Poster1
Room: Casino_1.811
Is Optional Stopping Really No Problem for Bayesians?
Mon—Casino_1.811—Poster1—2101
Presented by: Frieder Göppert
Frieder Göppert*, Linus Szillat, Sascha Meyen, Volker H. Franz
Department of Computer Science, University of Tübingen, Tübingen, Germany
Optional stopping is the practice of stopping data collection depending on properties of the data already collected. Such a practice can be especially beneficial when samples are costly. However, standard frequentist tests lose their error rate guarantees under optional stopping, which can therefore constitute a form of p-hacking. Bayesians, on the other hand, often argue that optional stopping is no problem for them (Rouder, 2014, Psychonomic Bulletin & Review). These arguments are typically based on simulations that sample effects from the specified prior. We instead investigated a real-world scenario in which there is one fixed true effect: using a standard Bayesian t-test (which assumes a zero effect under H0 and a Cauchy-distributed effect under H1) with optional stopping at pre-specified Bayes Factor thresholds (e.g. 1/3 and 3), we simulated data for true effects of different, fixed sizes. We then evaluated error rates: how often Bayesian optional stopping decides for H0 although the data were simulated from a non-null effect, and how often it decides for H1 although the data were simulated from a null effect. Notably, we find that for small non-null true effects the error rates of the Bayesian t-test are considerably higher than suggested by the critical Bayes Factor thresholds used for optional stopping. We therefore argue that Bayesians (if they care about error rates) should be very cautious when applying optional stopping in combination with default priors, as used in the Bayesian t-test. The reason is that optional stopping could provide an exploitable opportunity to “hack” not only frequentist but also certain Bayesian tests.
Keywords: Optional Stopping, Sequential Testing, Bayesian, Bayes Factor, Hypothesis Testing, Bayesian t-test, Error rates
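The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a one-sample design with unit-variance normal data, the JZS Bayes factor in the integral form given by Rouder et al. (2009), and a Cauchy prior scale r = 1. The function names (jzs_bf10, run_sequential, error_rate), the minimum and maximum sample sizes, and the number of simulations are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy import integrate


def jzs_bf10(t, n, r=1.0):
    """One-sample JZS Bayes factor BF10 for a t statistic from n observations.

    Assumption: Cauchy prior on effect size with scale r, written as a
    normal prior whose variance g follows an inverse-gamma(1/2, r^2/2).
    """
    nu = n - 1
    # Marginal likelihood under H0 (up to a constant shared with H1).
    null_like = (1 + t**2 / nu) ** (-(nu + 1) / 2)

    def integrand(g):
        prior_g = r / np.sqrt(2 * np.pi) * g ** -1.5 * np.exp(-r**2 / (2 * g))
        like_g = (1 + n * g) ** -0.5 * (
            1 + t**2 / ((1 + n * g) * nu)
        ) ** (-(nu + 1) / 2)
        return like_g * prior_g

    alt_like, _ = integrate.quad(integrand, 0, np.inf)
    return alt_like / null_like


def run_sequential(delta, rng, n_min=5, n_max=200, upper=3.0, lower=1 / 3, r=1.0):
    """Add one observation at a time until BF10 crosses a threshold or n_max."""
    data = list(rng.normal(delta, 1.0, size=n_min))
    while True:
        x = np.asarray(data)
        n = len(x)
        t = x.mean() / (x.std(ddof=1) / np.sqrt(n))
        bf = jzs_bf10(t, n, r)
        if bf >= upper:
            return "H1", n
        if bf <= lower:
            return "H0", n
        if n >= n_max:
            return "undecided", n
        data.append(rng.normal(delta, 1.0))


def error_rate(delta, n_sims=200, seed=0, **kwargs):
    """Share of runs that stop at the wrong hypothesis for a fixed true effect."""
    rng = np.random.default_rng(seed)
    wrong = "H1" if delta == 0 else "H0"
    decisions = [run_sequential(delta, rng, **kwargs)[0] for _ in range(n_sims)]
    return decisions.count(wrong) / n_sims
```

Calling error_rate(0.0) estimates how often the procedure decides for H1 under a true null effect, and error_rate(0.1) estimates how often it decides for H0 under a small fixed non-null effect. Runs that reach n_max without crossing a threshold are counted as undecided rather than as errors, which is one of several possible conventions.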