16:50 - 18:30
P5-S125
Room: 1A.08
Chair/s:
Perry Jess Carter
Discussant/s:
Christian Arnold
Testing LLMs for Causal Extraction in Political Text
P5-S125-2
Presented by: Paulina Garcia-Corral
Paulina Garcia-Corral 1, Hannah Béchara 1, Slava Jankin 2
1 Hertie School
2 University of Birmingham
In this paper, we evaluate the capabilities of Large Language Models (LLMs) for causal extraction in political text. Using a political corpus annotated for cause and effect (PolitiCause), we analyze the performance of LLMs on two subtasks: Causal Sequence Classification (CSC) and Causal Span Detection (CSD). We find that, while LLMs achieve good overall results when classifying and extracting domain-specific data, smaller models, when properly fine-tuned, still outperform their LLM counterparts on causal tasks. Additionally, we find that fine-tuning some LLMs can hinder performance. We also experiment with sentence sets that test the role of training data in performance and find that GPT-4o understands the task as a linguistic one, while Llama-3.1 struggles with the distinction between linguistic causality and real-world causality. Our experimental results substantiate the efficacy of LLMs in extracting causal relationships, while generating a deeper understanding of causal reasoning in language models and the downstream tasks they enable.
Keywords: causal attribution, causal language, LLM, text-as-data
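For readers unfamiliar with the task setup, the sketch below shows how zero-shot Causal Sequence Classification (CSC) might be posed to an LLM. The prompt wording, the CAUSAL/NOT_CAUSAL label set, and the use of the OpenAI Python client are illustrative assumptions, not the PolitiCause protocol or prompts used in the paper.

```python
# Illustrative sketch only: zero-shot Causal Sequence Classification with an LLM.
# Prompt wording, labels, and client usage are assumptions for demonstration,
# not the evaluation setup reported in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Decide whether the following sentence expresses a causal relation "
    "(one span causing another). Answer with exactly one word: CAUSAL or NOT_CAUSAL.\n\n"
    "Sentence: {sentence}"
)

def classify_causal(sentence: str, model: str = "gpt-4o") -> str:
    """Return the model's CAUSAL / NOT_CAUSAL label for a single sentence."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": PROMPT.format(sentence=sentence)}],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(classify_causal("The new tariff led to a sharp rise in consumer prices."))
```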
