Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM

Rachel Bawden, François Yvon
2023-05-09
Abstract: The NLP community recently saw the release of a new large open-access multilingual language model, BLOOM (BigScience et al., 2022), covering 46 languages. We focus on BLOOM's multilingual ability by evaluating its machine translation performance across several datasets (WMT, Flores-101 and DiaBLa) and language pairs (high- and low-resourced). Our results show that 0-shot performance suffers from overgeneration and generating in the wrong language, but this is greatly improved in the few-shot setting, with very good results for a number of language pairs. We study several aspects including prompt design, model sizes, cross-lingual transfer and the use of discursive context.
Computation and Language
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper evaluates the machine translation (MT) performance of the multilingual language model BLOOM. Specifically, it examines:

1. **Zero-shot and Few-shot Performance**: How well BLOOM translates in zero-shot (0-shot) and few-shot settings.
2. **Impact of Prompt Design**: How different prompt designs affect translation quality.
3. **Diverse Language Pairs**: How performance varies across language pairs, including high-resource and low-resource languages.
4. **Use of Linguistic Context**: Whether BLOOM can exploit surrounding discourse context when translating.

The main conclusions of the paper are:

- Zero-shot performance suffers from overgeneration and from output in the wrong language.
- In the few-shot setting these problems are greatly reduced, and performance approaches state-of-the-art levels for some language pairs.
- There is a notable cross-lingual transfer effect, with good performance even for languages not formally present in the training data.
- Although adding linguistic context did not significantly improve scores, there is evidence that BLOOM's translations are influenced by it.
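
To make the zero-shot versus few-shot distinction concrete, here is a minimal sketch of how a translation prompt might be assembled and how an overgenerated continuation might be cut back to a single sentence. The template (`"<source language>: <text> = <target language>:"`), the function names and the truncate-at-first-newline rule are assumptions made for illustration; they are not confirmed by the summary above as the paper's exact setup.

```python
from typing import Sequence, Tuple


def build_prompt(
    src_lang: str,
    tgt_lang: str,
    source: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a 0-shot prompt (no examples) or a few-shot prompt.

    The template is hypothetical: one "<src_lang>: <text> = <tgt_lang>: <text>"
    line per example, followed by the same pattern with the target left open.
    """
    lines = [f"{src_lang}: {s} = {tgt_lang}: {t}" for s, t in examples]
    lines.append(f"{src_lang}: {source} = {tgt_lang}:")
    return "\n".join(lines)


def truncate_overgeneration(continuation: str) -> str:
    """Keep only the first line of the model's continuation.

    Assumption: anything after the first newline is overgeneration
    (for example, the model inventing further source/target pairs).
    """
    stripped = continuation.strip()
    if not stripped:
        return ""
    return stripped.split("\n", 1)[0].strip()


# 0-shot: the model sees only the sentence to translate.
zero_shot = build_prompt("French", "English", "Bonjour.")

# 1-shot: one translated example precedes the sentence to translate.
one_shot = build_prompt(
    "French", "English", "Bonjour.", examples=[("Merci.", "Thank you.")]
)

# A made-up continuation that keeps going after the translation.
raw = " Hello.\nFrench: Au revoir. = English: Goodbye."
clean = truncate_overgeneration(raw)
```

In this sketch the in-context examples do two jobs at once: they show the model the expected output language and the expected length of the answer, which is one plausible reading of why the few-shot setting reduces both failure modes listed above.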