Abstract: Extractive text summarization is the process of selecting the most representative parts of a larger text without losing any key information. Recent attempts at extractive text summarization in Bengali either relied on statistical techniques like TF-IDF or used naive sentence similarity measures like the word averaging technique. All of these strategies struggle to express semantic relationships correctly. Here, we propose a novel Word pair-based Gaussian Sentence Similarity (WGSS) algorithm for calculating the semantic relation between two sentences. WGSS takes the geometric mean of the individual Gaussian similarity values of word embedding vectors to get the semantic relationship between sentences. It compares two sentences on a word-to-word basis, which rectifies the sentence representation problem faced by the word averaging method. The summarization process extracts key sentences by grouping semantically similar sentences into clusters using the Spectral Clustering algorithm. After clustering, we use TF-IDF ranking to pick the best sentence from each cluster. The proposed method is validated using four different datasets, and it outperformed other recent models by 43.2% on average ROUGE scores (ranging from 2.5% to 95.4%). It was also tested on other low-resource languages, i.e., Turkish, Marathi, and Hindi, where we find that the proposed method performs similarly to how it performs on Bengali. In addition, a new high-quality Bengali dataset is curated, which contains 250 articles and a pair of summaries for each of them. We believe this research is a crucial addition to Bengali Natural Language Processing (NLP) research and that it can easily be extended to other low-resource languages.
We made the implementation of the proposed model and the data public at https://github.com/FMOpee/WGSS.
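
To make the idea of a word pair-based Gaussian similarity concrete, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions that the abstract does not state: each word is paired with its nearest word in the other sentence, both directions are pooled so that the score is symmetric, and the kernel width `sigma` is a free parameter. The function name `wgss_like_similarity` and the toy vectors are hypothetical.

```python
import numpy as np


def wgss_like_similarity(sent_a, sent_b, sigma=1.0):
    """Geometric mean of Gaussian similarities between paired word vectors.

    sent_a, sent_b: arrays of shape (n_words, dim) holding word embeddings.
    Assumption: each word is paired with its closest word in the other
    sentence; the abstract does not specify the pairing rule.
    """
    a = np.asarray(sent_a, dtype=float)
    b = np.asarray(sent_b, dtype=float)

    # Squared Euclidean distance between every word in a and every word in b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

    # Log of the Gaussian similarity exp(-d2 / (2 * sigma^2)) for the best
    # partner of each word, taken in both directions to keep the score symmetric.
    log_a_to_b = -d2.min(axis=1) / (2.0 * sigma ** 2)
    log_b_to_a = -d2.min(axis=0) / (2.0 * sigma ** 2)
    logs = np.concatenate([log_a_to_b, log_b_to_a])

    # The geometric mean of the similarities is exp of the mean of their logs.
    return float(np.exp(logs.mean()))


# Toy two-dimensional "embeddings" (hypothetical values, for illustration only).
s1 = np.array([[0.0, 0.0], [1.0, 0.0]])
s2 = np.array([[0.0, 1.0]])
score = wgss_like_similarity(s1, s2)
```

Working in the log domain avoids underflow when many small similarity values are multiplied. A symmetric score of this kind could serve as a precomputed affinity matrix for spectral clustering, although the exact construction used in the paper should be taken from the linked repository.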

BeliN: A Novel Corpus for Bengali Religious News Headline Generation using Contextual Feature Fusion

Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation

XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags

Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network

Preparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of Indian Languages

Enhancing image caption generation through context-aware attention mechanism

Tackling Fake News in Bengali: Unraveling the Impact of Summarization vs. Augmentation on Pre-trained Language Models

MONOVAB : An Annotated Corpus for Bangla Multi-label Emotion Detection

A Novel Word Pair-based Gaussian Sentence Similarity Algorithm For Bengali Extractive Text Summarization

Enhancing Sentiment Analysis in Bengali Texts: A Hybrid Approach Using Lexicon-Based Algorithm and Pretrained Language Model Bangla-BERT

Enhancing Bangla Fake News Detection Using Bidirectional Gated Recurrent Units and Deep Learning Techniques

Bengali & Banglish: A monolingual dataset for emotion detection in linguistically diverse contexts

Dhoroni: Exploring Bengali Climate Change and Environmental Views with a Multi-Perspective News Dataset and Natural Language Processing

BnSentMix: A Diverse Bengali-English Code-Mixed Dataset for Sentiment Analysis

BanglaEmbed: Efficient Sentence Embedding Models for a Low-Resource Language Using Cross-Lingual Distillation Techniques

Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach

Improved hybrid text summarization system using deep contextualized embeddings and statistical features

A Novel Approach to Enhance the Performance of Semantic Search in Bengali using Neural Net and other Classification Techniques

Sentiment analysis in Bengali via transfer learning using multi-lingual BERT

BanMANI: A Dataset to Identify Manipulated Social Media News in Bangla

A novel Data and Model Centric artificial intelligence based approach in developing high-performance Named Entity Recognition for Bengali Language