Abstract:Urdu text is a cursive script and belongs to a non-Latin family of other cursive scripts like Arabic, Chinese, and Hindi. Urdu text poses a challenge for detection/localization from natural scene images, and consequently recognition of individual ligatures in scene images. In this paper, a methodology is proposed that covers detection, orientation prediction, and recognition of Urdu ligatures in outdoor images. As a first step, the custom FasterRCNN algorithm has been used in conjunction with well-known CNNs like Squeezenet, Googlenet, Resnet18, and Resnet50 for detection and localization purposes for images of size $320times 240$ pixels. For ligature Orientation prediction, a custom Regression Residual Neural Network (RRNN) is trained/tested on datasets containing randomly oriented ligatures. Recognition of ligatures was done using Two Stream Deep Neural Network (TSDNN). In our experiments, five-set of datasets, containing 4.2K and 51K Urdu-text-embedded synthetic images were generated using the CLE annotation text to evaluate different tasks of detection, orientation prediction, and recognition of ligatures. These synthetic images contain 132, and 1600 unique ligatures corresponding to 4.2K and 51K images respectively, with 32 variations of each ligature (4-backgrounds and font 8-color variations). Also, 1094 real-world images containing more than 12k Urdu characters were used for TSDNN's evaluation. Finally, all four detectors were evaluated and used to compare them for their ability to detect/localize Urdu-text using average-precision (AP). Resnet50 features based FasterRCNN was found to be the winner detector with AP of.98. While Squeeznet, Googlenet, Resnet18 based detectors had testing AP of.65,.88, and.87 respectively. RRNN achieved and accuracy of 79% and 99% for 4k and 51K images respectively. Similarly, for characters classif-cation in ligatures, TSDNN attained a partial sequence recognition rate of 94.90% and 95.20% for 4k and 51K images respectively. Similarly, a partial sequence recognition rate of 76.60% attained for real world-images.

Efficient Urdu Caption Generation using Attention based LSTM

Generative image captioning in Urdu using deep learning

Attention Based Sequence-to-sequence Framework for Auto Image Caption Generation

A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism

UTSA: Urdu Text Sentiment Analysis Using Deep Learning Methods

A Hindi Image Caption Generation Framework Using Deep Learning

Generating Image Captions using Bahdanau Attention Mechanism and Transfer Learning

A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation

Urdu-Text Detection and Recognition in Natural Scene Images Using Deep Learning

Large Scale Font Independent Urdu Text Recognition System

Stateful Human-Centered Visual Captioning System to Aid Video Surveillance

Bench-Marking And Improving Arabic Automatic Image Captioning Through The Use Of Multi-Task Learning Paradigm

An encoder-decoder based framework for hindi image caption generation

An Intelligent Image Caption Generator Model using Deep Learning

Machine Translation System Using Deep Learning for English to Urdu

Improved Bengali Image Captioning via Deep Convolutional Neural Network Based Encoder-Decoder Model

A deep learning approach for Named Entity Recognition in Urdu language

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Enhancing image caption generation through context-aware attention mechanism

A Novel Approach for Emotion Detection and Sentiment Analysis for Low Resource Urdu Language Based on CNN-LSTM