Pashto poetry generation: deep learning with pre-trained transformers for low-resource languages

Imran Ullah, Khalil Ullah, Hamad Khan, Khursheed Aurangzeb, Muhammad Shahid Anwar, Ikram Syed
DOI: https://doi.org/10.7717/peerj-cs.2163
2024-08-30
Abstract: Generating poetry with machine learning and deep learning techniques has been a challenging and active research topic in recent years, with significance for natural language processing and computational linguistics. This study introduces an approach to generating high-quality Pashto poetry that leverages two pre-trained transformer models, LaMini-Cerebras-590M and bloomz-560m. The models were trained on a new, extensive, high-quality Pashto poetry dataset to learn its underlying complex patterns and structures. The trained models then generate new Pashto poetry from a seed text or prompt. To assess the quality of the generated poetry, we conducted both subjective and objective evaluations, including human evaluation. The experimental results demonstrate that the proposed approach can generate Pashto poetry comparable in quality to human-written poetry. The study contributes to research on the Pashto language and poetry generation and has potential applications in natural language processing and computational linguistics.