A hybrid approach for Bengali sentence validation
Juel Sikder,Prosenjit Chakraborty,Utpol Kanti Das,Krity Dhar
DOI: https://doi.org/10.1007/s10462-024-10795-2
IF: 9.588
2024-10-09
Artificial Intelligence Review
Abstract:Bengali is the official language of Bangladesh and is widely used in Bangladesh and West Bengal in India. Due to the growing accessibility of the internet and smart devices, the use of digital text material and documents in Bengali is growing with time. An automated Bengali Sentence Validation System is proposed in this study to effectively determine the correctness of sentences in such extensively available Bengali content. As far as we know, no substantial work has been done in the field of Bengali Sentence Validation utilizing deep learning approaches. Due to the lack of linguistic resources, sophisticated Natural Language Processing tools, and benchmark datasets, developing an automated Sentence Validation System for a limited-resource language like Bengali is challenging. Additionally, Bengali Sentences come in two morphological varieties (Sadhu-bhasha and Cholito-bhasha), making the validation process more challenging. The proposed automated Bengali Sentence Validation system contains the CNN-BiLSTM hybrid classifier model. As of now, there is no standard dataset for Bengali sentence validation. Due to the lack of a standard dataset, we collected Bengali sentences from different sources in Bangladesh and developed a Bengali Sentence Validation (BSV) Dataset with around 5000 labelled sentences arranged into two categories such as correct and incorrect. Experimental results demonstrate that the proposed system outperformed other classifier models and existing approaches for Bengali Sentence Validation and is able to categorize a wide range of Bengali sentences based on their correctness. The system's F1 score for the Bengali Sentence Validation is 98%.
computer science, artificial intelligence