A Survey on Zero Pronoun Translation

Longyue Wang, Siyou Liu, Mingzhou Xu, Linfeng Song, Shuming Shi, Zhaopeng Tu
2023-05-17
Abstract: Zero pronouns (ZPs) are frequently omitted in pro-drop languages (e.g. Chinese, Hungarian, and Hindi), but should be recalled in non-pro-drop languages (e.g. English). This phenomenon has been studied extensively in machine translation (MT), as it poses a significant challenge for MT systems due to the difficulty in determining the correct antecedent for the pronoun. This survey paper highlights the major works that have been undertaken in zero pronoun translation (ZPT) after the neural revolution, so that researchers can recognise the current state and future directions of this field. We provide an organisation of the literature based on evolution, dataset, method and evaluation. In addition, we compare and analyze competing models and evaluation metrics on different benchmarks. We uncover a number of insightful findings such as: 1) ZPT is in line with the development trend of large language models; 2) data limitation causes learning bias in languages and domains; 3) performance improvements are often reported on single benchmarks, but advanced methods are still far from real-world use; 4) general-purpose metrics are not reliable on the nuances and complexities of ZPT, emphasizing the necessity of targeted metrics; 5) apart from commonly-cited errors, ZPs will cause risks of gender bias.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the challenge of zero pronoun translation (ZPT) in machine translation. Zero pronouns (ZPs) occur frequently in pro-drop languages (such as Chinese, Hungarian, and Hindi), but must be restored when translating into non-pro-drop languages (such as English). This poses a significant challenge for machine translation systems because the correct antecedent of the omitted pronoun is difficult to determine. The paper surveys the major work on zero pronoun translation since the neural revolution, so that researchers can understand the current state of the field and its future directions. It organizes the literature by evolution, dataset, method, and evaluation, and compares competing models and evaluation metrics on different benchmarks. Its main findings are:
1. Zero pronoun translation is in line with the development trend of large language models.
2. Data limitations cause learning bias across languages and domains.
3. Performance improvements are often reported on single benchmarks, but advanced methods are still far from real-world use.
4. General-purpose evaluation metrics are unreliable on the nuances and complexities of zero pronoun translation, which underlines the need for targeted metrics.
5. Apart from commonly cited errors, zero pronouns also carry a risk of gender bias.
Based on these findings, the paper offers directions and suggestions for future research on zero pronoun translation.