Research on Cross-cultural Text Reconstruction of Urban Publicity Translation Based on Computer Corpus

Zhenli Li, Jian Tang
DOI: https://doi.org/10.1155/2022/5076637
2022-02-02
Scientific Programming
Abstract: Urban publicity translation, as a cross-cultural communication activity, should aim for communication, employ various translation strategies, adapt to the target language’s expression habits, overcome cultural differences, and make the translation easy to accept for target readers. In order to achieve the goal of external promotion, publicity texts should respect and conform to the target culture’s language expression as well as the psychology of the audience during the initial stage of urban publicity translation. This paper analyzes the causes of cultural vacancies in the translation of urban publicity materials, starting with the classification and sorting of cultural vacancies in the translation of publicity materials. This paper focuses on using a computer corpus to reconstruct cross-cultural text for urban publicity translation. An automatic corpus expansion method combined with the EM (expectation-maximization) algorithm is proposed to solve this problem. The model is iteratively trained after the generated single corpus is combined with the original data set to create a parallel corpus. In addition, as another important feature of words, the word cooccurrence degree is incorporated into the interword relationship extraction model to create a new word translation evaluation index. Finally, the experiment demonstrates that the EIWR (extraction of interword relations) has higher accuracy than the VSM (vector space model).
computer science, software engineering
What problem does this paper attempt to address?
This paper addresses the challenges of cross-cultural text reconstruction in urban publicity translation. Specifically, it aims to optimize the translation of urban publicity materials using computer corpora, overcoming the obstacles posed by cultural and linguistic differences so that the translated text is more acceptable to target readers.

### Main problems and solutions

1. **Cross-cultural differences**: As a cross-cultural communication activity, urban publicity translation needs to adapt to the expression habits of the target language and overcome cultural differences, so that the translated content is more easily understood and accepted by target readers. To this end, the article analyzes the causes of cultural gaps in the translation of publicity materials and classifies and sorts these gaps.
2. **Scarcity of corpus resources**: Parallel corpora (i.e., corpora in which the source-language and target-language texts are in one-to-one correspondence) support high translation accuracy, but they are costly to build, scarce, and unlikely to cover all research fields. The article therefore proposes an automatic corpus expansion method combined with the EM (Expectation-Maximization) algorithm.
3. **Lexical co-occurrence and inter-word relationships**: To further improve translation quality, the article incorporates lexical co-occurrence (i.e., the frequency with which words co-occur in the same context) into the inter-word relationship extraction model and creates a new word translation evaluation index.
4. **Experimental verification**: The experimental results show that the method based on Extracting Inter-Word Relationships (EIWR) is more accurate than the traditional Vector Space Model (VSM), especially when dealing with high-frequency words.
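
The lexical co-occurrence idea in item 3 can be illustrated with a minimal Python sketch. It assumes that "same context" means "same sentence" and that tokens are split on whitespace. Both are simplifications chosen for illustration, and the function name `cooccurrence_counts` and the toy corpus are hypothetical. This is not the paper's EIWR model, only the raw counting step that such a model could build on.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each unordered word pair appears in the same sentence.

    Assumption: a "context" is one sentence, tokenized on whitespace.
    """
    counts = Counter()
    for sentence in sentences:
        # A sorted set gives each pair a canonical (alphabetical) order
        # and counts it at most once per sentence.
        words = sorted(set(sentence.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Hypothetical toy corpus, not data from the paper.
corpus = [
    "the city welcomes visitors",
    "the city hosts a festival",
    "visitors enjoy the festival",
]
counts = cooccurrence_counts(corpus)
print(counts[("city", "the")])           # pair seen in two sentences
print(counts[("festival", "visitors")])  # pair seen in one sentence
```

Pairs are stored in alphabetical order, so lookups must use that order, for example `("city", "the")` rather than `("the", "city")`.
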
### Formula representation

- **Objective function of the EM algorithm**:

  \[ \theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \sum_{j=1}^{|t_i|} \log p(t_{i,j} \mid t_{i,0:j-1}, s_i) \]

  where \(N\) is the size of the training corpus, and \(|t_i|\) is the length of the target-language sentence \(t_i\).

- **Objective function of model iterative training**:

  \[ L^{*}(\theta_{S \rightarrow T}) = \sum_{i=1}^{N} \log p(t_i \mid s_i) + \sum_{i=1}^{P} \log p(t_i) \]

  The first term is the conditional probability of generating the target language \(T\) from the source language \(S\); the second term is the language model of the target language \(T\).

- **Edit distance calculation formula**:

  \[ D(i,j) = \begin{cases} 0, & \text{if } i = j = 0,\\ D(0,j-1) + w(t_j), & \text{if } i = 0,\ j \neq 0,\\ D(i-1,0) + w(s_i), & \text{if } i \neq 0,\ j = 0,\\ \min\{D(i-1,j) + w(s_i),\ D(i,j-1) + w(t_j),\ M(i,j)\}, & \text{otherwise} \end{cases} \]

  where \(w(t_j)\) is the cost of inserting \(t_j\), \(w(s_i)\) is the cost of deleting \(s_i\), and \(M(i,j) = \max(w(s_i), w(t_j))\) is the cost of replacing \(s_i\) with \(t_j\).

Through the above methods, this paper addresses the problem of cross-cultural text reconstruction in urban publicity translation and aims to improve the accuracy and readability of translation.
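
The edit distance recurrence can be sketched as a short dynamic program. The sketch below rests on two assumptions that the summary does not state: the substitution branch is taken to be \(D(i-1,j-1) + M(i,j)\), as in the standard weighted edit distance, and \(M(i,j)\) is taken to be zero when \(s_i\) and \(t_j\) are identical. The function name `weighted_edit_distance` and the default unit weight are hypothetical choices for illustration.

```python
def weighted_edit_distance(source, target, weight=lambda token: 1.0):
    """Weighted edit distance between two token sequences.

    weight(token) plays the role of w(.) in the recurrence: the cost of
    inserting or deleting that token.
    Assumptions (not confirmed by the source): substitution extends the
    diagonal cell D(i-1, j-1), and matching tokens cost nothing.
    """
    n, m = len(source), len(target)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]

    # First row: build the target prefix by insertions only.
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + weight(target[j - 1])
    # First column: remove the source prefix by deletions only.
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + weight(source[i - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s, t = source[i - 1], target[j - 1]
            # M(i, j) = max(w(s_i), w(t_j)), or 0 for identical tokens.
            substitution = 0.0 if s == t else max(weight(s), weight(t))
            D[i][j] = min(
                D[i - 1][j] + weight(s),         # delete s_i
                D[i][j - 1] + weight(t),         # insert t_j
                D[i - 1][j - 1] + substitution,  # replace s_i with t_j
            )
    return D[n][m]


# With unit weights this reduces to the ordinary Levenshtein distance.
print(weighted_edit_distance("kitten", "sitting"))
print(weighted_edit_distance(["the", "old", "city"], ["the", "city"]))
```

Passing a custom `weight`, for example one that charges more for content words than for function words, changes which alignments are cheapest without altering the recurrence itself.
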