Large language models for terminology work: A question of the right prompt?
DOI: https://doi.org/10.21248/jlcl.38.2025.280

Keywords: prompt engineering, definition generation, university terminology

Abstract
Text-generative large language models (LLMs) offer promising possibilities for terminology work, including term extraction, definition creation and the assessment of concept relations. This study examines the performance of ChatGPT, Perplexity and Microsoft Copilot for terminology work in the field of the Austrian and British higher education systems using strategic prompting frameworks. Despite efforts to refine prompts by specifying language variety and system context, the LLM outputs failed to reliably differentiate between the Austrian and German systems and also fabricated terms. Factors such as the distribution of German-language training data, potential pivot translation via English and the lack of transparency in LLM training further complicated evaluation. Additionally, output variability across identical prompts highlights the unpredictability of LLM-generated terminology. The study underscores the importance of human expertise in evaluating LLM outputs, as inconsistencies may undermine the reliability of terminology derived from such models. Without domain-specific knowledge (encompassing both subject-matter expertise and familiarity with terminology principles) as well as LLM literacy, users are unable to critically assess the quality of LLM outputs in terminological contexts. Rather than indiscriminately applying LLMs to all aspects of terminology work, it is crucial to assess their suitability for specific tasks.
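The prompt refinements mentioned in the abstract, specifying the language variety (Austrian Standard German rather than the German used in Germany) and the higher education system in question, can be illustrated with a minimal sketch. The snippet below is not taken from the study; it assumes the OpenAI Python client, and the model name and prompt wording are hypothetical. It merely shows the kind of context-constrained prompt the abstract refers to.

```python
# Illustrative sketch only: a context-constrained prompt of the kind described
# in the abstract (language variety + higher education system context).
# Assumes the OpenAI Python client; model name and prompt text are hypothetical,
# not the prompts used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

system_msg = (
    "You are a terminologist working on the Austrian higher education system. "
    "Use Austrian Standard German only; do not substitute terms from the higher "
    "education system of Germany."
)
user_msg = (
    "Name the Austrian German term for the committee that conducts a doctoral "
    "defence, provide a short intensional definition, and state what kind of "
    "authoritative source a terminologist should verify it against."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ],
)
print(response.choices[0].message.content)
```

Even with such constraints, the study reports that outputs still conflated the Austrian and German systems and fabricated terms, so any generated term or definition would still need to be verified against authoritative sources by someone with domain and terminology expertise.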
License
Copyright (c) 2025 Barbara Heinisch

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.