Abstract: Aim/Purpose. The most crucial aspects of teaching a foreign language to more advanced learners are building an awareness of discourse modes, of how discourse is regulated, and of the pragmatic properties of discourse components. However, different languages ensure the connections and structure of discourse by different linguistic means, which complicates matters for the learner.
Background. By uncovering regularities in a foreign language and comparing them with patterns in one’s own tongue, the corpus research method offers the student unique opportunities to acquire linguistic knowledge about discourse markers. This paper reports on an investigation of the functions of multi-word discourse markers.
Methodology. We combine the alignment model of phrase-based statistical machine translation with manual treatment of the data to examine English multi-word discourse markers and their equivalents in Lithuanian and Hebrew translations. After establishing the full list of multi-word discourse markers in our generated parallel corpus, we examine how these markers change in translation.
Contribution. The study adds value to corpus-driven research and to the understanding of how discourse is managed by creating a parallel research corpus for identifying multi-word expressions used as discourse markers, analyzing how they are translated into Lithuanian and Hebrew, and attempting to determine why the translators made the choices they did.
Findings. Our research suggests a possible context-based influence that guides translators to integrate a particle or other lexical item into the Lithuanian or Hebrew translation of a discourse marker in order to express the rhetorical domain; this could be related to the so-called phenomenon of “over-specification.”
Recommendations for Practitioners. The comparative examination of discourse markers provides language instructors and translators with more specific information about the roles of discourse markers.
Recommendations for Researchers. Understanding the multifunctionality of discourse markers provides new avenues for applying discourse markers in translation research.
Impact on Society. The method used in the current study may be useful for strengthening students’ language awareness and analytic skills, and it is particularly relevant for students specializing in English philology or translation. Beyond the empirical research, an extensive parallel data resource has been created for open use.
Future Research. The observed phenomenon of “over-specification” could be analyzed further in future research.
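As an illustration of the alignment step described under Methodology, the following minimal sketch shows how word alignments might be used to look up the target-language equivalent of an English multi-word discourse marker. It is not the authors' pipeline: the function names, the "i-j" alignment notation (source index, target index), and the toy English-Lithuanian sentence pair are all assumptions made for this example, and the sentence pair is not taken from the study's corpus.

```python
def parse_alignment(line):
    """Parse 'i-j i-j ...' into (source_index, target_index) pairs.
    The 'i-j' notation is an assumed input format for this sketch."""
    return [tuple(int(x) for x in pair.split("-")) for pair in line.split()]


def find_marker(tokens, marker):
    """Return start positions where the marker tokens occur contiguously
    (case-insensitive) in the token list."""
    low = [t.lower() for t in tokens]
    m = [t.lower() for t in marker]
    n = len(m)
    return [i for i in range(len(low) - n + 1) if low[i:i + n] == m]


def aligned_equivalent(src_tokens, tgt_tokens, alignment, marker):
    """For each occurrence of the marker in the source sentence, collect
    the target tokens that the alignment links to it, in target order."""
    results = []
    for start in find_marker(src_tokens, marker):
        src_span = set(range(start, start + len(marker)))
        tgt_idx = sorted({j for i, j in alignment if i in src_span})
        results.append(" ".join(tgt_tokens[j] for j in tgt_idx))
    return results


# Toy, hand-made example (hypothetical; not from the paper's data).
src = "In other words , it works".split()
tgt = "Kitaip tariant , tai veikia".split()
links = parse_alignment("0-0 1-0 2-1 3-2 4-3 5-4")

print(aligned_equivalent(src, tgt, links, ["in", "other", "words"]))
```

In a workflow of this kind, the automatically extracted equivalents would still need the manual treatment the abstract mentions, since alignment errors and unaligned markers are common for multi-word expressions.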

The Multilingual Student Translation corpus: a resource for translation teaching and research

The undergraduate learner translator corpus: a new resource for translation studies and computational linguistics

The UPF learner translation corpus as a resource for translator training

Saudi Learner Translation Corpus: The design and compilation of an English-Arabic learner translation corpus

On the need for a new research agenda for corpus-based translation studies: a multi-methodological, multifactorial and interdisciplinary approach

The translation teaching platform based on multilingual corpora of Xi Jinping: The Governance of China: Design, resources and applications

The Construction of Computer-aided Translation Teaching Platform Based on Corpus

Practical Research on Corpus-based Translation Teaching Modes of College English

The Pedagogy of Corpus-aided English-Chinese Translation from a Critical & Creative Perspective

A Parallel Corpus of Translationese

Corpus-based Translation Studies: Examining Media Language through a Linguistic Lens

Corpus Methods for Descriptive Translation Studies

Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates

Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts

Parallel Corpus for Indigenous Language Translation: Spanish-Mazatec and Spanish-Mixtec

Corpus Processing of Multi-Word Discourse Markers for Advanced Learners

SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations

An Online Repository of Python Resource for Teaching Machine Translation to Translation Students

Beyond English-Centric Multilingual Machine Translation

Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation