Abstract:
Aim/Purpose. The most crucial aspects of teaching a foreign language to more advanced learners are building an awareness of discourse modes, of how discourse is regulated, and of the pragmatic properties of discourse components. However, different languages ensure the connections and structure of discourse by different linguistic means, which complicates matters for the learner.
Background. By uncovering regularities in a foreign language and comparing them with patterns in one’s own tongue, the corpus research method offers the student unique opportunities to acquire linguistic knowledge about discourse markers. This paper reports on an investigation of the functions of multi-word discourse markers.
Methodology. In our research, we combine the alignment model of phrase-based statistical machine translation with manual treatment of the data in order to examine English multi-word discourse markers, their equivalents in Lithuanian and Hebrew translations, and how they change in translation. After establishing the full list of multi-word discourse markers in our generated parallel corpus, we examine how these markers are treated in translation.
Contribution. Creating a parallel research corpus to identify multi-word expressions used as discourse markers, analyzing how they are translated into Lithuanian and Hebrew, and attempting to determine why the translators made their choices add value to corpus-driven research and to the understanding of how to manage discourse.
Findings. Our research indicates a possible context-based influence that guides translators to integrate a particle or other lexical item into translated Lithuanian or Hebrew discourse markers in order to express the rhetorical domain; this could be related to the so-called phenomenon of “over-specification.”
Recommendations for Practitioners. The comparative examination of discourse markers provides language instructors and translators with more specific information about the roles of discourse markers.
Recommendations for Researchers. Understanding the multifunctionality of discourse markers provides new avenues for discourse marker application in translation research.
Impact on Society. The approach of the current study may be useful for strengthening students’ language awareness and analytic skills, and it is particularly relevant for students specializing in English philology or translation. Beyond the empirical research, an extensive parallel data resource has been created for open use.
Future Research. The observed phenomenon of “over-specification” could be analyzed further in future research.

When is Wall a Pared and when a Muro? -- Extracting Rules Governing Lexical Selection

From Submit to Submitted via Submission: On Lexical Rules in Large-Scale Lexicon Acquisition

Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical Analysis

Linguistic Features Distinguishing Students' Writing Ability Aligned with CEFR Levels

Word Sense Disambiguation in Native Spanish: A Comprehensive Lexical Evaluation Resource

Rule-Based Spanish Morphological Analyzer Built From Spell Checking Lexicon

Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers

Learning class-to-class selectional preferences

Presence or Absence: Are Unknown Word Usages in Dictionaries?

THE STORAGE AND PROCESSING OF MORPHOLOGICALLY COMPLEX WORDS IN L2 SPANISH

A Method for Studying Semantic Construal in Grammatical Constructions with Interpretable Contextual Embedding Spaces

Locally Measuring Cross-lingual Lexical Alignment: A Domain and Word Level Perspective

Lexical Disambiguation in Verb Learning: Evidence from the Conjoined-Subject Intransitive Frame in English and Mandarin Chinese

Understanding and Improving Lexical Choice in Non-Autoregressive Translation

The Scenario Refiner: Grounding subjects in images at the morphological level

Corpus Processing of Multi-Word Discourse Markers for Advanced Learners

Towards Using Machine Translation Techniques to Induce Multilingual Lexica of Discourse Markers

Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks

Shades of meaning: Uncovering the geometry of ambiguous word representations through contextualised language models

XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

Using Language Models to Disambiguate Lexical Choices in Translation