Abstract: Many automatic text summarization (ATS) methods have been proposed for resource-rich languages such as English and Chinese. However, resource-limited languages like Hindi have received very little attention from researchers, and the lack of resources keeps ATS for Hindi a challenging and open problem. Capturing semantic features and hidden relationships among the text units are the two main characteristics of an informative summary. In the current work, we propose an ATS model based on the document vector method to explore the semantic relations existing in the document. Moreover, we suggest two algorithms, sentence ranking and summary generation, based on three main characteristics (redundancy, diversity, and compression rate) to create a clear and coherent summary. The proposed model is language-independent apart from some language-specific preprocessing. We evaluate our model on datasets in two languages: literary novels in Hindi and DUC 2007 news articles in English. We apply the ROUGE metric to measure the quality of the generated summaries, and we compare the proposed model against four baseline methods: TextRank, LexRank, Latent Semantic Analysis (LSA), and the model of Mudasir et al. The overall macro-average F-score (18.5% for Hindi, 26% for English) for very short summaries at 5% and 15% compression rates produced by our model is higher than that of the baseline approaches. For very long summaries at a 50% compression rate, our model achieves the highest macro-average values, 18% for the Hindi novels and 25% for the English news articles, among all compared methods. The experimental results indicate that the proposed model outperforms all the baselines and produces diverse, minimally redundant, semantically rich, and compressed text summaries.
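The abstract does not spell out the ranking and generation algorithms, so the following is only a minimal sketch of the general idea it names: rank sentences against a document-level vector, then build the summary under a compression-rate budget while skipping redundant sentences. Several parts are assumptions made for illustration and are not taken from the paper: plain term-count vectors stand in for the document vector embeddings, cosine similarity is used for both ranking and redundancy checks, and the names `summarize`, `compression_rate`, and `redundancy_threshold` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: simple regex tokenization. Real Hindi text would need the
    # language-specific preprocessing the abstract mentions.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, compression_rate=0.5, redundancy_threshold=0.8):
    # Stand-in for sentence embeddings: one term-count vector per sentence.
    vectors = [Counter(tokenize(s)) for s in sentences]

    # Stand-in for the document vector: the sum of all sentence vectors.
    doc_vector = Counter()
    for v in vectors:
        doc_vector.update(v)

    # Sentence ranking: most similar to the document vector first.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], doc_vector),
        reverse=True,
    )

    # Summary generation: fill the budget, skipping near-duplicates of
    # sentences that were already selected.
    budget = max(1, math.ceil(compression_rate * len(sentences)))
    chosen = []
    for i in ranked:
        if len(chosen) == budget:
            break
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)

    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, compression_rate=0.05)` would correspond to the very short summaries discussed above, and `compression_rate=0.5` to the long ones; swapping the `Counter` vectors for learned embeddings would leave the ranking and selection loop unchanged.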

Document vector embedding based extractive text summarization system for Hindi and English text
