Relating Frequentist and Bayesian Hypothesis Tests Using Regions of Support

Tue-Main hall - Z3-Poster 2-5913

Presented by: Frieder Göppert

Frieder Göppert, Sascha Meyen, Volker Franz

Department of Computer Science, University of Tübingen, Tübingen, Germany

Hypothesis testing is a widely used tool in inferential statistics. Yet hypothesis tests have their problems and can lead to severe misinterpretations. We argue that one reason for these persistent problems is the following discrepancy: while hypothesis tests are explicit about which parameter values are theoretically contained in each hypothesis, they are usually not explicit about which parameter values would, in a practical setting, most likely lead to which test outcome. For example, a standard t-test explicitly tests against a zero effect under the null hypothesis. However, in practical settings the test will also be unlikely to reject small 'true' effects (depending on sample size). To make these test characteristics explicit, we introduce the concept of Regions of Support (ROS), which indicate which 'true' effects are likely to result in which outcome of a hypothesis test, given the sample size (or, more generally, the precision). ROS can serve both as a check of researchers’ expectations and as a means of comparing different tests. We evaluate standard Bayesian and frequentist point-null tests as well as interval (equivalence) tests in a simple setting with two independent samples. Interestingly, for interval tests our ROS analysis finds that Bayes factors (previously considered the best choice) suffer from an undesirable bias towards accepting the null hypothesis. We argue that other methods, such as the Bayesian highest density interval (HDI) with an explicitly specified region of practical equivalence (ROPE), or its frequentist analogue (a confidence interval with ROPE), do not show this bias and might be preferable.
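The ROS idea can be illustrated with a small Monte Carlo simulation: for each assumed 'true' effect, draw many pairs of independent samples, apply a test, and record how often each outcome occurs. The sketch below is not the authors' implementation and rests on assumptions not stated in the abstract: a known common standard deviation of 1 (so a two-sample z-test stands in for the t-test, and effects are in standardized units), equal group sizes, alpha = 0.05, and a 95% confidence interval for the CI+ROPE rule. All function names are hypothetical. Bayes factors and the HDI+ROPE rule are omitted because they additionally require a prior specification.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, used for p-values and CI quantiles


def _std_error(n: int, m: int, sigma: float) -> float:
    # Standard error of the difference of two independent means, sigma assumed known.
    return sigma * (1.0 / n + 1.0 / m) ** 0.5


def z_test_two_sample(x, y, sigma=1.0, alpha=0.05):
    """Point-null test of H0: mean(x) == mean(y), assuming a known common sigma."""
    z = (fmean(x) - fmean(y)) / _std_error(len(x), len(y), sigma)
    p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))  # two-sided p-value
    return "reject H0" if p < alpha else "no rejection"


def ci_rope_decision(x, y, rope=0.2, sigma=1.0, level=0.95):
    """Interval decision: compare a CI for the mean difference with [-rope, rope]."""
    diff = fmean(x) - fmean(y)
    half = STD_NORMAL.inv_cdf(0.5 + level / 2.0) * _std_error(len(x), len(y), sigma)
    lo, hi = diff - half, diff + half
    if -rope <= lo and hi <= rope:
        return "accept equivalence"  # CI lies entirely inside the ROPE
    if hi < -rope or lo > rope:
        return "reject equivalence"  # CI lies entirely outside the ROPE
    return "undecided"  # CI overlaps a ROPE boundary


def region_of_support(test, effects, n, reps=2000, seed=1):
    """For each true effect d, estimate the probability of each test outcome.

    Group x ~ N(d, 1) and group y ~ N(0, 1), each of size n, so d is a
    standardized effect size.
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    table = {}
    for d in effects:
        counts = {}
        for _ in range(reps):
            x = [rng.gauss(d, 1.0) for _ in range(n)]
            y = [rng.gauss(0.0, 1.0) for _ in range(n)]
            outcome = test(x, y)
            counts[outcome] = counts.get(outcome, 0) + 1
        table[d] = {k: v / reps for k, v in counts.items()}
    return table


if __name__ == "__main__":
    # Outcome proportions of the point-null test for a few true effects, n = 20 per group.
    for d, props in region_of_support(z_test_two_sample, [0.0, 0.2, 0.5, 0.8], n=20).items():
        print(f"d = {d:.1f}: {props}")
```

Each row of the resulting table gives the estimated outcome probabilities for one true effect; reading across effects traces out the ROS. Under these assumptions, with 20 observations per group a true effect of d = 0.2 mostly yields "no rejection", which matches the practical behaviour described above. Passing `functools.partial(ci_rope_decision, rope=0.5)` as `test` gives an ROS for the interval rule instead; note that when the confidence interval is wider than the ROPE, "accept equivalence" cannot occur at all, so the sample size needed for that outcome is itself visible in the ROS.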

Keywords: Hypothesis testing, Equivalence testing, Bayesian, Frequentist, Bayes factor, HDI+ROPE, Effect sizes