Large language models for terminology work: A question of the right prompt?
DOI: https://doi.org/10.21248/jlcl.38.2025.280

Keywords: prompt engineering, definition generation, university terminology

Abstract
Text-generative large language models (LLMs) offer promising possibilities for terminology work, including term extraction, definition creation and the assessment of concept relations. This study examines the performance of ChatGPT, Perplexity and Microsoft Copilot for terminology work in the field of the Austrian and British higher education systems using strategic prompting frameworks. Despite efforts to refine prompts by specifying language variety and system context, the LLM outputs failed to reliably differentiate between the Austrian and German systems and also fabricated terms. Factors such as the distribution of German-language training data, potential pivot translation via English and the lack of transparency in LLM training further complicated evaluation. Additionally, output variability across identical prompts highlights the unpredictability of LLM-generated terminology. The study underscores the importance of human expertise in evaluating LLM outputs, as inconsistencies may undermine the reliability of terminology derived from such models. Without domain-specific knowledge (encompassing both subject-matter expertise and familiarity with terminology principles) as well as LLM literacy, users are unable to critically assess the quality of LLM outputs in terminological contexts. Rather than indiscriminately applying LLMs to all aspects of terminology work, it is crucial to assess their suitability for specific tasks.
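The prompt refinements mentioned in the abstract, specifying the language variety (Austrian Standard German rather than the German used in Germany) and the higher education system in question, can be illustrated with a minimal sketch. The snippet below is not taken from the study; it assumes the OpenAI Python client, and the model name and prompt wording are hypothetical. It merely shows the kind of context-constrained prompt the abstract refers to.

```python
# Illustrative sketch only: a context-constrained prompt of the kind described
# in the abstract (language variety + higher education system context).
# Assumes the OpenAI Python client; model name and prompt text are hypothetical,
# not the prompts used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_msg = (
    "You are a terminologist working on the Austrian higher education system. "
    "Use Austrian Standard German only; do not substitute terms from the higher "
    "education system of Germany."
)
user_msg = (
    "Name the Austrian German term for the committee that conducts a doctoral "
    "defence, provide a short intensional definition, and state what kind of "
    "authoritative source a terminologist should verify it against."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ],
)
print(response.choices[0].message.content)
```

Even with such constraints, the study reports that outputs still conflated the Austrian and German systems and fabricated terms, so any generated term or definition would still need to be verified against authoritative sources by someone with domain and terminology expertise.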
License
Copyright (c) 2025 Barbara Heinisch

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.