Discourse Segmentation of German Text with Pretrained Language Models

Authors

Frenzel, S., Krupop, M., & Stede, M.

DOI:

https://doi.org/10.21248/jlcl.39.2026.306

Keywords:

discourse, segmentation, German, Large Language Models

Abstract

Segmenting text into so-called "elementary discourse units" (EDUs) is a task relevant to several NLP applications, including discourse parsing and argument mining. In recent years, EDU segmentation has been addressed as part of a shared task on multilingual discourse parsing ("DISRPT"), where BERT-based encoder models proved particularly successful. German has been represented in DISRPT by the Potsdam Commentary Corpus, but more German data with EDU segmentation has recently been published. In this paper, we conduct detailed tests on the currently available German-language datasets. We test a multilingual off-the-shelf model, several BERT-based encoders, and the current generation of LLMs. The results are analyzed both qualitatively and quantitatively and compared to the multilingual state of the art. We make the best-performing model available as a tool for the community.
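To make the task concrete: in the DISRPT setting, EDU segmentation is commonly cast as token-level classification, where a model predicts for each token whether it opens a new discourse unit. The Python sketch below illustrates only this framing; it is not the tool released with the paper, and the example sentence and boundary labels are invented for illustration.

# Minimal sketch of EDU segmentation as token-level boundary classification.
# A segmenter (e.g. a BERT-based encoder, as in DISRPT) would predict a
# label per token: "B" for a token that begins a new EDU, "O" otherwise.
# Here the labels are hand-written to show how EDUs are recovered from them.

def tokens_to_edus(tokens, labels):
    """Group tokens into EDUs, opening a new unit at each 'B' label."""
    edus = []
    for token, label in zip(tokens, labels):
        if label == "B" or not edus:
            edus.append([token])    # boundary: start a new EDU
        else:
            edus[-1].append(token)  # continuation: extend the current EDU
    return [" ".join(edu) for edu in edus]

# "Although it rained, we went for a walk." -- two clauses, two EDUs.
tokens = ["Obwohl", "es", "regnete", ",", "gingen", "wir", "spazieren", "."]
labels = ["B", "O", "O", "O", "B", "O", "O", "O"]
print(tokens_to_edus(tokens, labels))
# ['Obwohl es regnete ,', 'gingen wir spazieren .']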

Published

2026-02-02

How to Cite

Frenzel, S., Krupop, M., & Stede, M. (2026). Discourse Segmentation of German Text with Pretrained Language Models. Journal for Language Technology and Computational Linguistics, 39(1), 1–31. https://doi.org/10.21248/jlcl.39.2026.306

Issue

Vol. 39 No. 1 (2026)

Section

Research articles