A frame semantics based approach to comparative study of digitized corpus

Abdelaziz Lakhfif, Mohamed Tayeb Laskri
DOI: https://doi.org/10.48550/arXiv.2006.00113
2020-05-30
Abstract: In this paper, we present a corpus-linguistics-based approach to analyzing digitized classical multilingual novels and narrative texts from a semantic point of view. Digitized novels such as "The Hobbit" (Tolkien J. R. R., 1937) and "The Hound of the Baskervilles" (Doyle A. C., 1901-1902), which have been translated into dozens of languages, provide rich material for analyzing language differences from several perspectives and within a number of disciplines, such as linguistics, philosophy, and cognitive science. Taking the conceptualization of motion events as a case study, this paper focuses on the morphological, syntactic, and semantic annotation of an English-Arabic aligned corpus created from digitized novels, in order to re-examine the linguistic encoding of motion events in English and Arabic in terms of Frame Semantics. The study argues that differences in the conceptualization of motion events across languages can be described with frame structure and frame-to-frame relations.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how motion events are expressed in digitized multilingual novels and narrative texts, analyzed from a semantic perspective and with a focus on English and Arabic. Specifically, the authors examine how the two languages differ in conceptualizing motion events, and argue that these differences can be described and explained through Frame Semantics.

### Main Research Questions:

1. **Description of Language Differences**: How can the differences in the conceptualization of motion events in different languages (such as English and Arabic) be described through frame structures and frame-to-frame relations?
2. **Comparison of Semantic Encoding**: How can Frame Semantics be used to conduct a comparative analysis of the semantic encoding of motion events in English and Arabic?
3. **Theoretical Validation**: How can Frame Semantics be used to reassess Talmy's linguistic typology and Slobin's "thinking for speaking" hypothesis?

### Research Methods:

- **Corpus Construction**: Create an English-Arabic aligned corpus from selected chapters of the classic novels "The Hobbit" and "The Hound of the Baskervilles."
- **Multi-layer Annotation**: Annotate the corpus texts at the lexical, syntactic, and semantic levels, using FrameNet principles for semantic role annotation.
- **Comparative Analysis**: Compare the expression of motion events in English and Arabic on the basis of Frame Semantics, examining differences in conceptualization and semantic encoding.

### Research Significance:

- **Cross-linguistic Research**: Gain a deeper understanding of the commonalities and differences in how languages express motion events, using Frame Semantics as the method of description.
- **Cognitive Linguistics**: Validate and extend Talmy's and Slobin's theories, providing new evidence on the relationship between language and thought.
- **Machine Translation**: Provide theoretical support for improving cross-linguistic machine translation systems, especially in handling translation divergences in the expression of motion events.

In summary, the paper carries out a detailed comparative analysis of the expression of motion events in English and Arabic using Frame Semantics, in order to validate and extend existing theories of linguistic typology and to offer new perspectives for cross-linguistic research and machine translation.
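
The annotation and comparison steps described above can be pictured with a small data structure. The following Python snippet is a minimal sketch, not the authors' tooling: the `FrameAnnotation` and `AlignedPair` classes, the `frame_divergence` helper, and the example sentence pair (with transliterated Arabic) are all hypothetical and invented for illustration. The frame and frame element labels only imitate FrameNet's naming style and are not taken from the paper's corpus.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FrameAnnotation:
    """One frame-evoking target and its frame elements (hypothetical schema)."""
    frame: str                  # frame label, e.g. "Self_motion"
    target: str                 # word that evokes the frame
    elements: Dict[str, str] = field(default_factory=dict)  # frame element -> text span


@dataclass
class AlignedPair:
    """Frame annotations for one aligned English-Arabic sentence pair."""
    english: FrameAnnotation
    arabic: FrameAnnotation


def frame_divergence(pair: AlignedPair) -> dict:
    """Summarise how the two sides of an aligned pair differ in frame structure."""
    en_fes = set(pair.english.elements)
    ar_fes = set(pair.arabic.elements)
    return {
        "same_frame": pair.english.frame == pair.arabic.frame,
        "shared_elements": sorted(en_fes & ar_fes),
        "english_only": sorted(en_fes - ar_fes),
        "arabic_only": sorted(ar_fes - en_fes),
    }


# Invented example pair, NOT drawn from the paper's corpus:
# English "he ran out of the house" vs. a transliterated Arabic rendering
# glossed as "he exited from the house running".
pair = AlignedPair(
    english=FrameAnnotation(
        frame="Self_motion",
        target="ran",
        elements={"Self_mover": "he", "Source": "out of the house"},
    ),
    arabic=FrameAnnotation(
        frame="Departing",
        target="kharaja",
        elements={"Theme": "huwa", "Source": "min al-bayt", "Manner": "rakidan"},
    ),
)

report = frame_divergence(pair)
print(report)
```

In this invented pair the English verb carries the manner of motion while the Arabic side evokes a different frame and expresses manner as a separate element. A report of this shape, collected over many aligned pairs, is one way the frame-level divergences discussed in the summary could be tallied.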