Bangla AI: A Framework for Machine Translation Utilizing Large Language Models for Ethnic Media

MD Ashraful Goni, Fahad Mostafa, Kerk F. Kee
2024-02-22
Abstract: Ethnic media, which caters to diaspora communities in host nations, serves as a vital platform for these communities to both produce content and access information. Rather than using the language of the host nation, ethnic media delivers news in the language of the immigrant community. For instance, in the USA, Bangla ethnic media presents news in Bangla rather than English. This research explores the prospective integration of large language models (LLM) and multilingual machine translation (MMT) within the ethnic media industry. It centers on the transformative potential of using LLM in MMT for news translation, searching, and categorization. The paper outlines a theoretical framework describing how LLM and MMT can be integrated into the news searching and translation processes of ethnic media. Additionally, it briefly addresses the potential ethical challenges associated with incorporating LLM and MMT into news translation procedures.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper explores how Large Language Models (LLM) and Multilingual Machine Translation (MMT) can improve content production and information dissemination in media that serve specific ethnic communities in the United States ("ethnic media"), with a particular focus on the Bangladeshi community in New York City. Its main objectives are:

1. **Proposing an Algorithmic Framework**: The paper proposes an algorithmic framework that uses LLM and MMT to improve content translation and news search for ethnic media journalists, specifically targeting Bangla ethnic media.
2. **Addressing Language Barriers**: By improving translation quality and accuracy, the framework aims to overcome the limitations of traditional translation tools (such as Google Translate) in handling large volumes of text and low-resource languages, thereby reducing the barriers to information access caused by language differences.
3. **Enhancing Information Accessibility**: The goal is to present content from mainstream English-language media in the native languages of target ethnic groups, improving these groups' access to critical information and fostering connections between mainstream and ethnic media.
4. **Ethical Considerations**: The paper also briefly discusses the ethical challenges that may arise when applying LLM and MMT to news translation.

Through these measures, the study aims to provide technical support for ethnic media serving marginalized communities, not only the Bangladeshi community. The authors suggest the approach could also apply to other ethnic media, such as outlets serving Hispanic, Indian, and Chinese communities.
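The framework is described here only at a conceptual level. As a purely illustrative sketch (not the authors' algorithm), the Python below shows one way news search and translation could fit together, assuming English source articles: articles are first filtered by keyword, and long bodies are then split into sentence-aligned chunks so that each translation request stays under a size limit, one simple response to the "large volumes of text" limitation noted in objective 2. The names `Article`, `search_articles`, `chunk_text`, `translate_article`, and `max_chars` are hypothetical, and `translate` is a placeholder for whatever MT system or LLM a newsroom would actually call.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical pipeline (illustration only): filter English news by keyword,
# split long articles into sentence-aligned chunks, translate chunk by chunk.


@dataclass
class Article:
    title: str
    body: str


def search_articles(articles: Iterable[Article], keywords: list[str]) -> list[Article]:
    """Keep articles whose title or body contains any keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    return [
        a for a in articles
        if any(k in f"{a.title} {a.body}".lower() for k in lowered)
    ]


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    Sentences are split after '.', '!' or '?' (English source text); a single
    sentence longer than max_chars becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_article(article: Article, translate: Callable[[str], str],
                      max_chars: int = 1000) -> Article:
    """Translate the title and each body chunk separately, then rejoin the chunks.

    `translate` is a placeholder for any English-to-Bangla MT or LLM call;
    whitespace between chunks is normalized to a single space.
    """
    body = " ".join(translate(chunk) for chunk in chunk_text(article.body, max_chars))
    return Article(translate(article.title), body)


def fake_translate(text: str) -> str:
    """Stand-in 'translator' that only tags its input, for demonstration."""
    return f"[bn] {text}"


if __name__ == "__main__":
    # Toy input data, not real news.
    feed = [
        Article("Bus fares change",
                "Fares for city buses will change next month. Riders can buy passes online."),
        Article("Bakery opens", "A new bakery opened downtown."),
    ]
    for art in search_articles(feed, ["fare"]):
        print(translate_article(art, fake_translate, max_chars=50))
```

Keeping chunks aligned to sentence boundaries avoids handing a translator half a sentence, at the cost of occasionally producing an oversized chunk when a single sentence exceeds the limit. A real system would likely replace the substring search with semantic retrieval and the placeholder with an actual model call.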