Abstract: The wide adoption of media and social media has increased the amount of digital content enormously. Natural language processing (NLP) techniques make it possible to extract and explore meaningful information from large amounts of text. Urdu is one of the most widely used languages worldwide for spoken and written communication. Because of this wide adoption, digital content in Urdu is growing rapidly, especially on social media and in online news feeds. Government agencies and advertisers must filter and understand this content to analyze the trends and cohorts relevant to their interests and to national priorities. Clustering is considered a baseline task and one of the first steps in natural language understanding. Many state-of-the-art clustering techniques exist for English, French, and Arabic, but no significant research has been conducted for Urdu. Clustering short text segments is particularly challenging because they offer limited features and lack meaningful discourse and nuance. Many rule-based NLP techniques have been adopted to overcome these issues, but they rely on human-designed features and rules and therefore do not promise remarkable results. Deep learning techniques, by contrast, capture contextual information efficiently and with less noise than traditional methods. Taking on this challenge, we develop, for the first time, a deep learning-based technique for Urdu short text clustering that requires no human-designed features. In this paper, we propose a short text clustering method that uses a deep neural network to learn feature representations and cluster assignments simultaneously. The method learns the clustering objective by mapping the high-dimensional feature space to a low-dimensional feature space. Our experiments on an Urdu news headlines dataset show remarkable results compared with state-of-the-art methods.
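
To make the idea of learning representations and cluster assignments together more concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a Deep Embedded Clustering (DEC) style objective, which the abstract does not specify: low-dimensional embeddings are softly assigned to centroids with a Student's t kernel, and a sharpened target distribution provides a KL-divergence loss that a network could minimize. The toy embeddings, the centroids, and all function names are hypothetical.

```python
import math


def soft_assign(embeddings, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of embedding i to centroid j (DEC-style assumption)."""
    q = []
    for z in embeddings:
        sims = []
        for mu in centroids:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            sims.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(sims)
        q.append([s / total for s in sims])
    return q


def target_distribution(q):
    """Sharpened targets p[i][j], proportional to q[i][j]**2 divided by the soft cluster frequency."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over all samples; this plays the role of the clustering loss."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
    )


# Hypothetical 2-D embeddings standing in for encoded headlines, plus two centroids.
embeddings = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
labels = [row.index(max(row)) for row in q]
```

In a full model of this kind, the embeddings would come from an encoder network, and the loss would be backpropagated to update both the encoder weights and the centroids, so features and assignments are refined together.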
