Comparing Frequentist and Bayesian Hypothesis Tests Using Regions of Decision

Submission 292

MixedTopicTalk-04

Presented by: Frieder Göppert

Frieder Göppert, Sascha Meyen, Volker H. Franz

Department of Computer Science, University of Tübingen, Germany

Statistical hypothesis testing is heavily used in inferential statistics. Yet hypothesis tests, whether frequentist or Bayesian, have their problems and can lead to severe misinterpretations. We argue that one reason for these persistent problems is the following discrepancy: while hypothesis tests are explicit about which effect sizes are theoretically contained in each hypothesis, they usually do not state explicitly which 'true' effect sizes would most likely lead to which test outcome in a practical setting with a given sample size, measurement error, and between-subjects variability. We make these characteristics explicit by introducing Regions of Decision (RODs). RODs indicate which 'true' effects most likely result in which outcome of a test under specified conditions and are derived from the probabilities of all outcomes of a test. Thus, RODs generalize statistical power calculations to tests with more than two outcomes. RODs make it possible to quickly gauge which true effects most likely lead to accepting H0, accepting H1, or an indecisive outcome. Moreover, RODs provide a basis for evaluating and comparing tests. Using RODs, we (a) demonstrate a problem of frequentist as well as Bayesian point-null tests that is known as Meehl's paradox, (b) show that interval Bayes factors, which have previously been championed, suffer from an undesirable bias towards the equivalence hypothesis, and (c) argue that other tests, such as the Bayesian highest density interval with region of practical equivalence (HDI-ROPE) or its frequentist analogue based on confidence intervals (CI-ROPE), might be preferable because they do not show this bias.
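
To illustrate how a ROD can be derived from the probabilities of all test outcomes, the following minimal sketch computes those probabilities for a simplified CI-ROPE test and labels each 'true' effect with its most likely outcome. It is not the authors' implementation. It assumes a one-sample design with known standard deviation, a symmetric ROPE around zero, and a common formulation of the decision rule: accept H0 if the confidence interval lies entirely inside the ROPE, accept H1 if it lies entirely outside, and remain undecided otherwise. The function names, the ROPE half-width of 0.2, and the sample size are hypothetical choices for illustration only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def outcome_probabilities(delta, n, sigma=1.0, rope=0.2, level=0.95):
    """Probabilities of the three outcomes of a simplified CI-ROPE test.

    Assumptions (illustrative, not taken from the abstract): the sample
    mean is normal with mean `delta` and known standard error
    sigma / sqrt(n); the ROPE is the interval [-rope, +rope].
    """
    se = sigma / n ** 0.5
    half = STD_NORMAL.inv_cdf(0.5 + level / 2) * se  # CI half-width

    def cdf(x):
        # P(sample mean <= x) when the true effect is `delta`
        return STD_NORMAL.cdf((x - delta) / se)

    # CI entirely inside the ROPE: -rope + half < mean < rope - half
    p_h0 = max(0.0, cdf(rope - half) - cdf(-rope + half))
    # CI entirely outside the ROPE: mean > rope + half or mean < -rope - half
    p_h1 = (1.0 - cdf(rope + half)) + cdf(-rope - half)
    # Everything else: the CI overlaps a ROPE boundary
    p_undecided = max(0.0, 1.0 - p_h0 - p_h1)
    return {"accept H0": p_h0, "accept H1": p_h1, "undecided": p_undecided}


def region_of_decision(delta, **kwargs):
    """Most likely outcome for a given true effect `delta`."""
    probs = outcome_probabilities(delta, **kwargs)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    for delta in (0.0, 0.1, 0.2, 0.3, 0.5, 0.8):
        probs = outcome_probabilities(delta, n=400)
        summary = ", ".join(f"{k}: {v:.3f}" for k, v in probs.items())
        print(f"delta={delta:.1f} -> {region_of_decision(delta, n=400)} ({summary})")
```

Scanning a fine grid of true effects and recording where the most likely outcome changes yields the boundaries of the regions under these assumptions. With only two outcomes, the same calculation reduces to an ordinary power curve.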