Discourse Segmentation of German Text with Pretrained Language Models
DOI: https://doi.org/10.21248/jlcl.39.2026.306

Keywords: discourse, segmentation, German, Large Language Models

Abstract
Segmenting text into so-called "elementary discourse units" (EDUs) is relevant for several NLP applications, including discourse parsing and argument mining. In recent years, EDU segmentation has been addressed as part of a shared task on multilingual discourse parsing ("DISRPT"), where BERT-based encoder models proved particularly successful. German has been represented in DISRPT by the Potsdam Commentary Corpus, but more German data with EDU segmentation has recently been published. In this paper, we conduct detailed tests on the German-language datasets that are currently available. We test a multilingual off-the-shelf model, several BERT-based encoders, and the current generation of LLMs. The results are analyzed both qualitatively and quantitatively and compared to the multilingual state of the art. We make the best-performing model available as a tool for the community.
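The abstract describes EDU segmentation with BERT-based encoders in the DISRPT setting, where the task is commonly cast as token-level boundary tagging. The sketch below illustrates that framing; it is a minimal, hedged example, not the paper's released tool. The checkpoint name (bert-base-german-cased) and the two-label scheme (O vs. B-EDU, marking tokens that open a new unit) are assumptions chosen for illustration, and the untrained classification head would need fine-tuning on EDU-annotated data before its predictions are meaningful.

```python
# Sketch: EDU segmentation as token-level boundary classification with a
# BERT-style encoder (DISRPT-style formulation). Illustrative only; the
# checkpoint and label set are assumptions, not details from the paper.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "bert-base-german-cased"  # assumption: any German BERT encoder

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Two labels: 0 = "O" (inside a unit), 1 = "B-EDU" (token begins a new EDU).
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

sentence = "Die Regierung plant Reformen, weil der Druck wächst."
words = sentence.split()

# Tokenize with word alignment so subword predictions map back to words.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, seq_len, 2)

pred = logits.argmax(-1)[0].tolist()
word_ids = enc.word_ids(0)

# Keep one prediction per word (first subword piece), as in BIO-style tagging.
seen = set()
for idx, wid in enumerate(word_ids):
    if wid is not None and wid not in seen:
        seen.add(wid)
        label = "B-EDU" if pred[idx] == 1 else "O"
        print(f"{words[wid]}\t{label}")
# Note: without fine-tuning, the head is randomly initialized and the
# printed labels are arbitrary; training on EDU boundaries is required.
```

After fine-tuning, each word predicted as B-EDU opens a new segment, so in the example above a trained model would ideally mark "Die" and the subordinating "weil" as unit-initial, splitting the sentence into two EDUs.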
License
Copyright (c) 2026 Steffen Frenzel, Maximilian Krupop, Manfred Stede

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.